© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tooling Up For Efficiency
DIY Solutions @ Netflix
Andrew Park, Financial Planning & Analysis
Sebastien de Larquier, Science & Analytics
ABD319
December 1, 2017
Top three booth questions…
1. Can I get a free Netflix?
2. Why did show X leave the service?
3. What is Netflix doing here?
Contents
1. The Netflix Challenge
2. Our Team - Cloud Capacity Analytics
3. The Efficiency Hierarchy of Needs
4. The Future
Q&A
CONTEXT AND CONCEPTS
The Netflix Challenge
“Freedom & Responsibility”
noun | /ef är/
Regions: eu-west-1, us-east-1, us-west-2
Availability zones: eu-west-1a, eu-west-1b, eu-west-1c, us-east-1c, us-east-1d, us-east-1e, us-west-2a, us-west-2b, us-west-2c
Instance families and sizes:
• m4: .large, .xlarge, .2xlarge, .4xlarge, .16xlarge
• m3: .medium, .large, .xlarge, .2xlarge
• r4: .large, .xlarge, .2xlarge, .4xlarge, .8xlarge, .16xlarge
• r3: .large, .xlarge, .2xlarge, .4xlarge, .8xlarge
• i2: .xlarge, .2xlarge, .4xlarge, .8xlarge
• i3: .large, .xlarge, .2xlarge, .4xlarge, .8xlarge, .16xlarge
• d2: .xlarge, .2xlarge, .4xlarge, .8xlarge
Three regions, nine availability zones
7 instance families, 35 instance types
1500+ configs!
Innovation Reliability
Security Efficiency
CHARTER AND SUCCESS CRITERIA
2. Our Team
Cloud Capacity Analytics
Charter
Support the data-related needs of the
Cloud Capacity Planning function.
Investigate trends, patterns and
anomalies in core metrics.
Suggest new data-driven approaches
to existing workflows and goals.
Success Criteria [quant]
✘ super-linear growth
✔ sub-linear growth
Use a business-aware metric, tracked over time: $[cloud] / streams
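One way to make the sub-linear criterion concrete is sketched below. It is not taken from the talk: the monthly figures are hypothetical, and the log-log slope test is just one plausible way to decide whether cloud cost grows more slowly than streams.

```python
import math

def cost_per_stream(cloud_cost, streams):
    """Business-aware metric from the slide: $[cloud] / streams, per period."""
    return [cost / count for cost, count in zip(cloud_cost, streams)]

def growth_exponent(cloud_cost, streams):
    """Least-squares slope of log(cost) against log(streams).

    A slope below 1 means cost grows more slowly than streams (sub-linear).
    """
    xs = [math.log(count) for count in streams]
    ys = [math.log(cost) for cost in cloud_cost]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Hypothetical monthly figures: streams double each month, cost grows 1.5x.
streams = [100.0, 200.0, 400.0, 800.0]
cloud_cost = [10.0, 15.0, 22.5, 33.75]

metric = cost_per_stream(cloud_cost, streams)
exponent = growth_exponent(cloud_cost, streams)
is_sub_linear = exponent < 1.0
```

With these made-up numbers the cost per stream falls every month, which is the behavior the check mark on the slide asks for.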
Success Criteria [qual]
Feedback from Engineering teams:
• Regular use of our tools and insights
• Raised awareness of their impact on efficiency
• Pro-active engagement on efficiency projects
FROM FUNDAMENTALS TO AUTOMATION
3. The Efficiency Hierarchy of Needs
THE EFFICIENCY HIERARCHY OF NEEDS (from base to top)
• Transparency: intuitive and interactive dashboards
• Deep Dives: exploratory analyses and case studies
• Actionable Insights: targeted alerts, summary emails and personalized dashboards
• Automation: optimization and machine learning
Transparency
What do you need to know before you can even ask about efficiency?
• Cost and usage: AWS DBR or CUR file(s)
• System data: S3 Inventory, AWS CloudTrail, …
• Metadata: AWS tags, org structure, …
• Undocumented facts (a.k.a. tribal knowledge)
“If you cannot measure it, you cannot improve it.”
– Lord Kelvin
“That which is measured improves.
That which is measured and reported improves exponentially.”
– Karl Pearson (or Thomas Monson)
Transparency through dashboards, with a few important rules:
1. Tailor views to specific use cases
2. Add business context
3. When possible, co-locate with existing tools / workflows
Picsou [piksu]
1. Scrooge McDuck in French.
2. Netflix’s comprehensive cloud capacity analytics tool.
• Data: Billing + Tribal Knowledge
• Tech: Scala app + Spark + React.js
Cloud Cost Dashboard
1. Enrich cost and usage data with internal metadata (org, platforms…)
2. Add business context
3. Tailor views to users
• Data: Billing + Metadata
• Tech: Spark + Tableau
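The enrichment step can be pictured as a join between billing line items and an internal ownership table. The sketch below is illustrative only: the field names, app names and org names are hypothetical, and the tool in the talk runs on Spark and Tableau rather than plain Python.

```python
from collections import defaultdict

# Hypothetical billing line items (field names are assumptions, not the DBR/CUR schema).
line_items = [
    {"app_tag": "app-1", "service": "AmazonEC2", "cost": 120.0},
    {"app_tag": "app-2", "service": "AmazonEC2", "cost": 80.0},
    {"app_tag": "app-1", "service": "AmazonS3", "cost": 30.0},
    {"app_tag": None, "service": "AmazonS3", "cost": 10.0},  # untagged resource
]

# Hypothetical internal metadata: which org owns which app.
app_owner = {"app-1": "org-a", "app-2": "org-b"}

def enrich(items, owners):
    """Attach an owning org to every line item; untagged or unknown apps stay visible."""
    return [{**item, "org": owners.get(item["app_tag"], "unattributed")} for item in items]

def cost_by_org(items):
    """Aggregate enriched line items into one total per org."""
    totals = defaultdict(float)
    for item in items:
        totals[item["org"]] += item["cost"]
    return dict(totals)

totals = cost_by_org(enrich(line_items, app_owner))
```

Keeping an explicit "unattributed" bucket is a design choice in this sketch: it shows how much spend still lacks an owner instead of silently dropping it.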
Libra
1. Visualize reserved and used instances across zones and instance types
2. Rebalance as necessary
3. Built-in retry logic
• Data: AWS Reservations API
• Tech: JavaScript
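As a rough illustration of what rebalancing could involve, the sketch below compares reserved and used instance counts per zone for a single instance type and proposes moves from zones with surplus to zones with shortage. The counts are hypothetical and the greedy matching is an assumption. Libra's actual logic is not described in the slides, and the sketch makes no calls to the AWS API.

```python
def rebalance(reserved, used):
    """Propose (from_zone, to_zone, count) moves for one instance type."""
    zones = sorted(set(reserved) | set(used))
    balance = {zone: reserved.get(zone, 0) - used.get(zone, 0) for zone in zones}
    surplus = [[zone, extra] for zone, extra in balance.items() if extra > 0]
    shortage = [[zone, -extra] for zone, extra in balance.items() if extra < 0]

    moves = []
    i = j = 0
    while i < len(surplus) and j < len(shortage):
        # Move as much as both sides allow, then advance whichever side is exhausted.
        count = min(surplus[i][1], shortage[j][1])
        moves.append((surplus[i][0], shortage[j][0], count))
        surplus[i][1] -= count
        shortage[j][1] -= count
        if surplus[i][1] == 0:
            i += 1
        if shortage[j][1] == 0:
            j += 1
    return moves

# Hypothetical counts for one instance type across three zones.
reserved = {"us-east-1c": 100, "us-east-1d": 40, "us-east-1e": 60}
used = {"us-east-1c": 70, "us-east-1d": 65, "us-east-1e": 60}
moves = rebalance(reserved, used)
```

Here the only proposed move takes 25 reservations from us-east-1c to us-east-1d, leaving 5 unused reservations in us-east-1c.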
Deep Dives
What story do you need to tell to make your efficiency goal a reality?
Sometimes it takes more than a few dashboards to know how and where to invest your cloud efficiency efforts.
• Tell a story (e.g., write a memo) showing the potential impact of your efficiency project to generate buy-in from your organization.
• Connect the components of complex architectures to show the bigger picture.
Darwin QL: a new UI engine for TVs is about to roll out. How will it impact our cloud efficiency?
[Chart: relative change in demand per service (demand = # requests × duration). Services shown: sequitur_service-prod, session_logs, evcache_yellow2, evcache_map_lt, evcache_ciners, evcache_chunk_vhs, evcache_ab, nccp-bladerunner, licenseaccounting-bladerunner, evcache_sub, evcache_pnp, playready, evcache_pbc_si, playback_history, api-prod, evcache_playlist, evcache_cineps]
• Data: Billing + Request Tracing
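The demand definition on this slide translates directly into a small calculation. In the sketch below the two service names come from the chart, but every number is hypothetical, and the baseline/candidate split (for example, old UI versus Darwin QL traffic) is an assumption about how such a comparison might be framed.

```python
def demand(requests, avg_duration_ms):
    """Demand as defined on the slide: number of requests times duration."""
    return requests * avg_duration_ms

def relative_change(baseline, candidate):
    """Relative change in demand per service, for services present in both runs."""
    return {
        service: (candidate[service] - baseline[service]) / baseline[service]
        for service in baseline
        if service in candidate and baseline[service] > 0
    }

# Hypothetical request counts and average durations (ms) per service.
baseline = {
    "api-prod": demand(1000, 20.0),
    "playback_history": demand(500, 10.0),
}
candidate = {
    "api-prod": demand(1100, 20.0),
    "playback_history": demand(500, 8.0),
}
changes = relative_change(baseline, candidate)
```

With these made-up inputs, demand on api-prod rises by 10% while demand on playback_history falls by 20%.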
[Diagram components: Zuul; µ-Service A, µ-Service Z; Keystone (Kafka); S3; ES; Kafka; Cass; EMR, BDAS, …; Mantis, Spark, …]
Data is a big piece of our cloud costs, but tracking and attributing that cost to a team is complex. How do you holistically optimize a distributed system?
Cradle to Grave (C2G): track the end-to-end cost of ingesting, storing and processing data at Netflix to identify efficiency opportunities.
• Each system (red boxes in the diagram) tracks its own resource apportioning at the topic / table / job level.
• We add some logic and tribal knowledge to link topics / tables / jobs from one system to the next.
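The two bullets above amount to a lineage graph with a cost on every node. A minimal sketch of that idea follows. Every topic, table, job and index name is hypothetical, as are the costs and links, and the real C2G linking logic is not described in the slides.

```python
# Hypothetical monthly cost reported by each system for its own entities.
cost = {
    ("keystone", "topic:example_events"): 100.0,
    ("s3", "table:example_events_raw"): 40.0,
    ("spark", "job:example_daily_agg"): 25.0,
    ("es", "index:example_events"): 60.0,
}

# Hypothetical lineage links, standing in for the "logic and tribal knowledge".
downstream = {
    ("keystone", "topic:example_events"): [
        ("s3", "table:example_events_raw"),
        ("es", "index:example_events"),
    ],
    ("s3", "table:example_events_raw"): [("spark", "job:example_daily_agg")],
}

def cradle_to_grave_cost(root, cost, downstream):
    """Sum the cost of a root entity and everything reachable downstream of it."""
    seen = set()
    stack = [root]
    total = 0.0
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # count shared downstream entities only once
        seen.add(node)
        total += cost.get(node, 0.0)
        stack.extend(downstream.get(node, []))
    return total, seen

total, lineage = cradle_to_grave_cost(("keystone", "topic:example_events"), cost, downstream)
```

The returned lineage set doubles as the list of systems, and therefore teams, involved along one data topic.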
Actionable Insights
What do you need to know, and when do you want it?
Strive to minimize the cognitive load for your target audience.
• Targeted messages: send alerts only when something actionable occurs (regressions, anomalous metrics), and provide quick links to supporting evidence.
• Insights digest: provide summaries that quickly get to the important message (typically per use case).
Efficiency Score Cards (email)
• 2 core efficiency metrics (system and business context).
• Monitor changes in magnitude and trend over weeks (non-operational).
• Link each card to a detailed dashboard.
• Data: System Monitoring (Atlas)
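A score card of this kind could be assembled from a short weekly series. The sketch below is an assumption about what "magnitude and trend" might mean in practice: week-over-week change plus a normalized least-squares slope. The metric name, values and dashboard URL are all hypothetical, and nothing here queries Atlas.

```python
def weekly_change(values):
    """Relative change of the latest week against the week before."""
    previous, latest = values[-2], values[-1]
    return (latest - previous) / previous

def weekly_trend(values):
    """Least-squares slope per week, expressed as a fraction of the mean value."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance / mean_y

def score_card(metric_name, values, dashboard_url):
    """Bundle the numbers an email card would show, with a link to the details."""
    return {
        "metric": metric_name,
        "latest": values[-1],
        "weekly_change": weekly_change(values),
        "trend_per_week": weekly_trend(values),
        "dashboard": dashboard_url,
    }

# Hypothetical four-week series and placeholder URL.
card = score_card(
    "cpu_utilization",
    [0.40, 0.42, 0.44, 0.46],
    "https://dashboards.example.com/cpu_utilization",
)
```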
EC2 Alerts (Picsou)
• Compute reservation shortages across all dimensions (accounts × zones × instance families)
• List in descending order of cost
• Attribute to top growing apps
• Also sent as a digest email linked back to Picsou
• Data: Billing + Tribal Knowledge + Metadata
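The first two bullets can be sketched as a shortage calculation followed by a sort. This is a simplified illustration, not Picsou's implementation: the account names, counts and hourly premiums are hypothetical, and instance counts are treated as already normalized within a family.

```python
def reservation_shortages(used, reserved, hourly_premium):
    """List shortages per (account, zone, family), most expensive first.

    hourly_premium is a hypothetical extra cost per unreserved instance-hour.
    """
    rows = []
    for key, used_count in used.items():
        shortage = used_count - reserved.get(key, 0)
        if shortage > 0:
            account, zone, family = key
            rows.append({
                "account": account,
                "zone": zone,
                "family": family,
                "shortage": shortage,
                "hourly_cost": shortage * hourly_premium[family],
            })
    return sorted(rows, key=lambda row: row["hourly_cost"], reverse=True)

# Hypothetical usage and reservations keyed by (account, zone, instance family).
used = {
    ("prod", "us-east-1c", "m4"): 120,
    ("prod", "us-east-1d", "r4"): 50,
    ("test", "us-west-2a", "m4"): 10,
}
reserved = {
    ("prod", "us-east-1c", "m4"): 100,
    ("prod", "us-east-1d", "r4"): 40,
    ("test", "us-west-2a", "m4"): 20,
}
hourly_premium = {"m4": 0.05, "r4": 0.25}

alerts = reservation_shortages(used, reserved, hourly_premium)
```

Sorting by cost rather than by instance count puts the smaller but pricier r4 shortage ahead of the larger m4 one.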
Automation
Let the machines take over, what could go wrong?
Safely automate repetitive or complex tasks.
• Start simple: rules engine, optimization, …
• Graduate your Actionable Insights
• Show your work
S3 Storage Class Optimization
• Very similar to the AWS S3 Analytics product.
• In fact, we use AWS S3 access analysis data, but make our own recommendations.
• Every recommendation can be explained from the very same dashboard.
• Data: AWS S3 Analytics + Tribal Knowledge
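A recommendation that can be explained might look like the sketch below: compare the monthly cost of each storage class and return the numbers behind the choice. The prices are placeholders rather than actual AWS pricing, and the model ignores minimum storage duration, minimum object size and transition fees, all of which a real recommendation would need to consider.

```python
from dataclasses import dataclass

@dataclass
class ClassPrice:
    storage_per_gb: float    # $ per GB-month stored
    retrieval_per_gb: float  # $ per GB retrieved

# Placeholder prices for illustration only, not actual AWS pricing.
PRICES = {
    "STANDARD": ClassPrice(storage_per_gb=0.02, retrieval_per_gb=0.0),
    "STANDARD_IA": ClassPrice(storage_per_gb=0.01, retrieval_per_gb=0.01),
}

def recommend(stored_gb, retrieved_gb_per_month, prices=PRICES):
    """Return the cheapest class and the per-class costs that justify it."""
    costs = {
        name: price.storage_per_gb * stored_gb
        + price.retrieval_per_gb * retrieved_gb_per_month
        for name, price in prices.items()
    }
    best = min(costs, key=costs.get)
    return best, costs

# Hypothetical buckets: one rarely read, one read heavily.
cold_choice, cold_costs = recommend(stored_gb=1000, retrieved_gb_per_month=100)
hot_choice, hot_costs = recommend(stored_gb=1000, retrieved_gb_per_month=2000)
```

Returning the cost breakdown alongside the choice is what lets a dashboard show its work for every recommendation.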
WHERE DO WE STOP?
4. The Future
RI management
Picsou (today)
- Explore cost and usage
- Notify of RI shortages
Picsou RI Recommendation (Q1’2018)
- Ingest output from shortage analysis (EC2 Alerts)
- Use Linear Programming to compute optimal RI modification/purchase
• Email recommendations to our finance partners for sanity checking and execution
• Collect feedback to improve optimization
• Define “recommendation score” for monitoring
• Once we gain enough confidence in the recommendations, execute them automatically
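The slides name Linear Programming for the RI recommendation but give no formulation. The sketch below is a deliberately simplified stand-in, not an LP: it handles a single instance pool and searches every possible reservation count by brute force. The usage profile and both hourly rates are hypothetical.

```python
def total_cost(reserved_count, hourly_usage, reserved_rate, on_demand_rate):
    """Cost of a usage profile: reservations are paid every hour, overflow is on demand."""
    reserved_cost = reserved_count * reserved_rate * len(hourly_usage)
    overflow_hours = sum(max(0, usage - reserved_count) for usage in hourly_usage)
    return reserved_cost + overflow_hours * on_demand_rate

def best_reservation_count(hourly_usage, reserved_rate, on_demand_rate):
    """Try every count from zero to peak usage and keep the cheapest."""
    candidates = range(max(hourly_usage) + 1)
    return min(
        candidates,
        key=lambda count: total_cost(count, hourly_usage, reserved_rate, on_demand_rate),
    )

# Hypothetical instance counts per hour and hourly rates.
hourly_usage = [10, 10, 20, 40]
reserved_rate = 0.6
on_demand_rate = 1.0

best = best_reservation_count(hourly_usage, reserved_rate, on_demand_rate)
best_cost = total_cost(best, hourly_usage, reserved_rate, on_demand_rate)
```

With these inputs, reserving for the baseline of 10 instances is cheapest, because an extra reservation only pays off when it is used in more than 60% of hours.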
Self-Service C2G
Give data producers, consumers and caretakers the ability to manage their own efficiency:
• Identify all involved parties along a data topic
• Apportion data-infrastructure cost to all relevant teams
• Quickly notice low-usage data topics
• Estimate data replication or large sink-to-user ratios
Long term: enable data-platform owners to use this tool or its underlying data to add some automation.
Device-Cloud Efficiency
Expose the impact of Device/UI features on efficiency:
• Provide the relative cost change for A/B test cells
• Attribute micro-service growth and cost to each Device/UI family
KEY TAKEAWAYS
1. Netflix culture, scale, architecture and priorities require efficiency to be championed by a central team, but enforced by all engineers.
2. This is achieved by implementing the successive layers of our efficiency hierarchy of needs:
2a. Transparency to get context,
2b. Deep Dives to tell compelling stories and assemble puzzles,
2c. Actionable Insights to reduce the cognitive load on your organization,
2d. Automation to scale the impact of efficiency efforts.
Thank you.
Netflix Talks @ re:Invent
Monday
10:45am ARC208: Walking the tightrope: Balancing Innovation, Reliability, Security, and Efficiency (Venetian)
12:15pm SID206: Best Practices for Managing Security on AWS (MGM)
Tuesday
10:45am ARC209: A Day in the Life of a Netflix Engineer (Venetian)
11:30am CMP204: How Netflix Tunes EC2 Instances for Performance (Venetian)
Wednesday
11:30am MCL317: Orchestrating ML Training for Netflix Recommendations (Venetian)
12:15pm NET303: A day in the life of a Cloud Network Engineer at Netflix (Venetian)
1:00pm ARC312: Why Regional Reservations are a Game Changer for Netflix (Venetian)
1:00pm SID304: SecOps 2021 Today: Using AWS Services to Deliver SecOps (MGM)
1:45pm DEV334: Performing Chaos at Netflix Scale (Venetian)
4:45pm SID316: Using Access Advisor to Strike the Balance Between Security and Usability (MGM)
Thursday
12:15pm CMP311: Auto Scaling Made Easy: How Target Tracking Scaling Policies Hit the Bullseye (Palazzo)
12:15pm DAT308: Codex: Conditional Modules Strike Back (Venetian)
12:55pm CMP309: How Netflix Encodes at Scale (Venetian)
5:00pm ABD401: How Netflix Monitors Applications Real Time with Kinesis (Aria)
Friday
8:30am ABD319: Tooling Up For Efficiency: DIY Solutions @ Netflix (Aria)
10:00am ABD401: Netflix Keystone SPaaS - Real-time Stream Processing as a Service (Aria)
Architecture
Mon 10:45am ARC208: Walking the tightrope: Balancing Innovation, Reliability, Security, and Efficiency (Venetian)
Tue 10:45am ARC209: A Day in the Life of a Netflix Engineer (Venetian)
Wed 1:00pm ARC312: Why Regional Reservations are a Game Changer for Netflix (Venetian)
Compute
Tue 11:30am CMP204: How Netflix Tunes EC2 Instances for Performance (Venetian)
Thu 12:15pm CMP311: Auto Scaling Made Easy: How Target Tracking Scaling Policies Hit the Bullseye (Palazzo)
Thu 12:55pm CMP309: How Netflix Encodes at Scale (Venetian)
Security, Compliance, and Identity
Mon 12:15pm SID206: Best Practices for Managing Security on AWS (MGM)
Wed 1:00pm SID304: SecOps 2021 Today: Using AWS Services to Deliver SecOps (MGM)
Wed 4:45pm SID316: Using Access Advisor to Strike the Balance Between Security and Usability (MGM)
Machine Learning
Wed 11:30am MCL317: Orchestrating ML Training for Netflix Recommendations (Venetian)
Networking
Wed 12:15pm NET303: A day in the life of a Cloud Network Engineer at Netflix (Venetian)
Developer Community
Wed 1:45pm DEV334: Performing Chaos at Netflix Scale (Venetian)
Databases
Thu 12:15pm DAT308: Codex: Conditional Modules Strike Back (Venetian)
Analytics & Big Data
Thu 5:00pm ABD401: How Netflix Monitors Applications Real Time with Kinesis (Aria)
Fri 8:30am ABD319: Tooling Up For Efficiency: DIY Solutions @ Netflix (Aria)
Fri 10:00am ABD401: Netflix Keystone SPaaS - Real-time Stream Processing as a Service (Aria)
THERE CAN ALWAYS
BE MORE SLIDES
[Chart: AWS billing file (DBR), number of lines per month in millions, APR-13 to APR-17; y-axis from 0 to 1,400]
Credits
All the tools featured in this presentation were designed and built by
members of the Cloud Capacity Planning teams over the past 2 years,
specifically, Torio Risianto, Rajan Mittal, and Qian Li.


Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tooling Up For Efficiency DIY Solutions @ Netflix Andrew Park Financial Planning & Analysis Sebastien de Larquier Science & Analytics A B D 3 1 9 D e c e m b e r 1 , 2 0 1 7
  • 2. Top three booth questions… 1. Can I get a free Netflix? 2. Why did show X leave the service? 3. What is Netflix doing here?
  • 3. Our Team - Cloud Capacity Analytics The Netflix Challenge The Efficiency Hierarchy of Needs The Future Contents Q&A
  • 4. CONTEXT AND CONCEPTS The Netflix Challenge
  • 5.
  • 7. eu-west-1 us-east-1 us-west-2 eu-west-1a eu-west-1b eu-west-1c us-east-1c us-east-1d us-east-1e us-west-2a us-west-2b us-west-2c m4 .large .xlarge .2xlarge .4xlarge .16xlarge m3 .medium .large .xlarge .2xlarge r4 .large .xlarge .2xlarge .4xlarge .8xlarge .16xlarge r3 .large .xlarge .2xlarge .4xlarge .8xlarge i2 .xlarge .2xlarge .4xlarge .8xlarge i3 .large .xlarge .2xlarge .4xlarge .8xlarge .16xlarge d2 .xlarge .2xlarge .4xlarge .8xlarge Three regions, nine availability zones 7Instancefamilies,35instancetypes 1500+ configs!
  • 9.
  • 10. CHARTER AND SUCCESS CRITERIA 2.Our Team Cloud Capacity Analytics
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Charter Support the data-related needs of the Cloud Capacity Planning function. Investigate trends, patterns and anomalies in core metrics. Suggest new data-driven approaches to existing workflows and goals.
  • 12. Success Criteria [quant] ✘super-linear growth ✔sub-linear growth Use a business-aware metric: $[cloud] / streams time
  • 13. Success Criteria [qual] Feedback from Engineering teams: • Regular use of our tools and insights • Raised awareness of their impact on efficiency • Pro-active engagement on efficiency projects
  • 14. FROM FUNDAMENTALS TO AUTOMATION 3.The Efficiency Hierarchy of Needs
  • 15. Automation Actionable Insights Deep Dives Transparency Intuitive and Interactive dashboards Exploratory analyses and case studies Targeted alerts, summary emails and personalized dashboards Optimization and Machine Learning HIERARCHY OF NEEDS THE EFFICIENCY
  • 16. HIERARCHY OF NEEDS THE EFFICIENCYTransparency What do you need to know before you can even ask about efficiency ?
  • 17. Transparency HIERARCHY OF NEEDS THE EFFICIENCY What do you need to know before you can even ask about efficiency ? Cost and Usage : AWS DBR or CUR file(s) System data : S3 Inventory, AWS CloudTrail, … Metadata : AWS Tags, Org structure, … Undocumented facts (a.k.a., tribal knowledge) “If you cannot measure it, you cannot improve it.” – Lord Kelvin
  • 18. HIERARCHY OF NEEDS THE EFFICIENCY What do you need to know before you can even ask about efficiency ? “That which is measured improves. That which is measured and reported improves exponentially.” – Karl Pearson (or Thomas Monson) 1. Tailor views to specific use cases 2. Add business context 3. When possible, co-locate with existing tools / workflows Transparency through dashboards, with a few important rules: Transparency
  • 19. HIERARCHY OF NEEDS THE EFFICIENCY Picsou [piksu] 1. Scrooge McDuck in french. 2. Netflix’s comprehensive cloud capacity analytics tool. Transparency • Data: Billing + Tribal Knowledge • Tech: Scala-app + Spark + React.js
  • 20. HIERARCHY OF NEEDS THE EFFICIENCY Cloud Cost Dashboard 1. Enrich cost and usage data with internal metadata (org, platforms…) 2. Add business context 3. Tailor views to users Transparency • Data: Billing+ Metadata • Tech: Spark + Tableau
  • 22. Transparency: Libra 1. Visualize reserved and used instances across zones and instance types 2. Rebalance as necessary 3. Built-in retry logic • Data: AWS Reservations API • Tech: JavaScript
  • 23. Deep Dives: What story do you need to tell to make your efficiency goal a reality?
  • 24. Deep Dives: What story do you need to tell to make your efficiency goal a reality? Sometimes it takes more than a few dashboards to know how and where to invest your cloud efficiency efforts. Tell a story (e.g., write a memo) showing the potential impact of your efficiency project to generate buy-in from your organization. Connect the components of complex architectures to show the bigger picture.
  • 25. Deep Dives: Darwin QL: a new UI engine for TVs is about to roll out. How will it impact our cloud efficiency? [Chart: relative change in demand (demand = # requests × duration), by service: sequitur_service-prod, session_logs, evcache_yellow2, evcache_map_lt, evcache_ciners, evcache_chunk_vhs, evcache_ab, nccp-bladerunner, licenseaccounting-bladerunner, evcache_sub, evcache_pnp, playready, evcache_pbc_si, playback_history, api-prod, evcache_playlist, evcache_cineps] • Data: Billing + Request Tracing
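The demand metric on this slide (number of requests times duration) and its relative change can be computed per service as sketched below. The request counts and durations are hypothetical, and using an average duration per request is an assumption about how request-tracing data would be aggregated.

```python
def demand(requests, avg_duration_ms):
    """Demand as defined on the slide: number of requests x duration."""
    return requests * avg_duration_ms

def relative_change(before, after):
    """Fractional change from 'before' to 'after' (0.5 means +50%)."""
    return (after - before) / before

# Hypothetical aggregates per service: (requests, average duration in ms).
baseline = {"api-prod": (1000, 20.0), "playback_history": (500, 10.0)}
new_ui = {"api-prod": (1200, 25.0), "playback_history": (500, 8.0)}

changes = {
    service: relative_change(demand(*baseline[service]), demand(*new_ui[service]))
    for service in baseline
}
```

Multiplying each relative change by the service's current cost would give a rough estimate of the cost impact, under the assumption that cost scales with demand.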
  • 26. Deep Dives: Data is a big piece of our cloud costs, but tracking and attributing that cost to a team is complex. How do you holistically optimize a distributed system? Cradle to Grave (C2G): track the end-to-end cost of ingesting, storing and processing data at Netflix to identify efficiency opportunities. [Diagram: Zuul; µ-Service A, µ-Service Z; Keystone (Kafka); S3; ES; Cass; Kafka; EMR, BDAS, …; Mantis, Spark, …]
  • 27. Deep Dives: • Each system (red boxes in the diagram) tracks its own resource apportioning at the topic / table / job level. • We add some logic and tribal knowledge to link topics / tables / jobs from one system to the next. [Diagram: Zuul; µ-Service A, µ-Service Z; Keystone (Kafka); S3; ES; Cass; Kafka; EMR, BDAS, …; Mantis, Spark, …]
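The linking described on this slide can be pictured as a graph: each system reports cost per topic, table or job, and a set of links connects an entity in one system to its downstream entities in the next. A minimal sketch follows; all entity names, costs and links are hypothetical, and the traversal is one possible way to roll up end-to-end cost rather than the C2G implementation.

```python
# Hypothetical per-system cost at the topic / table / job level.
cost = {
    ("keystone", "topic:playback_events"): 40.0,
    ("s3", "table:playback_events"): 25.0,
    ("es", "index:playback_events"): 15.0,
    ("spark", "job:daily_playback_agg"): 10.0,
}

# Hypothetical links from one system to the next (the "tribal knowledge").
downstream = {
    ("keystone", "topic:playback_events"): [
        ("s3", "table:playback_events"),
        ("es", "index:playback_events"),
    ],
    ("s3", "table:playback_events"): [("spark", "job:daily_playback_agg")],
}

def end_to_end_cost(node, seen=None):
    """Cost of a node plus everything reachable downstream, counted once."""
    seen = set() if seen is None else seen
    if node in seen:
        return 0.0
    seen.add(node)
    return cost.get(node, 0.0) + sum(
        end_to_end_cost(child, seen) for child in downstream.get(node, [])
    )

total = end_to_end_cost(("keystone", "topic:playback_events"))
```

The `seen` set guards against double counting when two paths reach the same table or job.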
  • 28. Actionable Insights: What do you need to know, and when do you want it?
  • 29. Actionable Insights: What do you need to know, and when do you want it? Strive to minimize the cognitive load for your target audience. Targeted messages: send alerts only when something actionable occurs (regressions, anomalous metrics), and provide quick links to supporting evidence. Insights Digest: provide summaries that quickly get to the important message (typically per use case).
  • 30. Actionable Insights: Efficiency Score Cards (email) • 2 core efficiency metrics (system and business context). • Monitor changes in magnitude and trend over weeks (non-operational). • Link each card to a detailed dashboard. • Data: System Monitoring (Atlas)
  • 31. Actionable Insights: Efficiency Score Cards (email) • 2 core efficiency metrics (system and business context). • Link each card to a detailed dashboard. • Data: System Monitoring (Atlas)
  • 32. Actionable Insights: EC2 Alerts (Picsou) • Compute reservation shortages across all dimensions (accounts × zones × instance families) • List in descending order of cost • Attribute to top growing apps • Also sent as a digest email linked back to Picsou • Data: Billing + Tribal Knowledge + Metadata
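The first two bullets of the EC2 Alerts slide can be sketched as a comparison of used against reserved capacity per (account, zone, instance family), sorted by the cost of the gap. The counts, account names and the per-family hourly premium are hypothetical, and the sketch ignores instance-size normalization within a family.

```python
# Hypothetical inputs keyed by (account, zone, instance family).
reserved = {
    ("prod", "us-east-1c", "m4"): 100,
    ("prod", "us-east-1d", "r4"): 50,
    ("test", "us-west-2a", "i3"): 20,
}
used = {
    ("prod", "us-east-1c", "m4"): 130,
    ("prod", "us-east-1d", "r4"): 40,
    ("test", "us-west-2a", "i3"): 30,
}
# Hypothetical extra $/instance-hour paid when usage is not covered.
hourly_premium = {"m4": 0.04, "r4": 0.10, "i3": 0.15}

def shortages(reserved, used, premium):
    """Rows of (key, missing instances, hourly cost), most costly first."""
    rows = []
    for key, in_use in used.items():
        gap = in_use - reserved.get(key, 0)
        if gap > 0:
            rows.append((key, gap, gap * premium[key[2]]))
    return sorted(rows, key=lambda row: row[2], reverse=True)

alerts = shortages(reserved, used, hourly_premium)
```

Only positive gaps produce a row, which matches the idea of alerting solely when something actionable occurs.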
  • 33. Automation: Let the machines take over, what could go wrong?
  • 34. Automation: Let the machines take over, what could go wrong? Safely automate repetitive or complex tasks. Start simple: rules engine, optimization, … Graduate your Actionable Insights. Show your work.
  • 35. Automation: S3 Storage Class Optimization • Very similar to the AWS S3 Analytics product. • In fact, we use AWS S3 access analysis data, but make our own recommendations.
  • 36. Automation: S3 Storage Class Optimization • Every recommendation can be explained from the very same dashboard. • Data: AWS S3 Analytics + Tribal Knowledge
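A storage class recommendation that "shows its work" could be as simple as a rule comparing the estimated monthly cost of each class for a bucket's observed size and retrieval volume, and returning the comparison alongside the choice. This is a sketch under stated assumptions, not the recommendation logic used at Netflix; the prices are hypothetical placeholders, not actual AWS pricing.

```python
# Hypothetical prices for illustration only; not actual AWS pricing.
PRICE = {
    "STANDARD": {"storage_gb_month": 0.023, "retrieval_gb": 0.0},
    "STANDARD_IA": {"storage_gb_month": 0.0125, "retrieval_gb": 0.01},
}

def monthly_cost(storage_class, stored_gb, retrieved_gb):
    """Estimated monthly cost: storage plus retrieval charges."""
    price = PRICE[storage_class]
    return (stored_gb * price["storage_gb_month"]
            + retrieved_gb * price["retrieval_gb"])

def recommend(stored_gb, retrieved_gb):
    """Return the cheapest class and a human-readable explanation."""
    costs = {c: monthly_cost(c, stored_gb, retrieved_gb) for c in PRICE}
    best = min(costs, key=costs.get)
    explanation = ", ".join(
        f"{c}: ${v:.2f}/month" for c, v in sorted(costs.items())
    )
    return best, explanation

cold_choice, cold_why = recommend(stored_gb=1000, retrieved_gb=50)
hot_choice, hot_why = recommend(stored_gb=1000, retrieved_gb=2000)
```

Returning the explanation with every recommendation is what lets a reviewer verify it before any action is automated.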
  • 37. WHERE DO WE STOP ? 4.The Future
  • 38. RI management. Picsou (today): - Explore cost and usage - Notify of RI shortages. Picsou RI Recommendation (Q1’2018): - Ingest output from shortage analysis (EC2 Alerts) - Use Linear Programming to compute optimal RI modification/purchase • Email recommendations to our finance partners for sanity checking and execution • Collect feedback to improve optimization • Define “recommendation score” for monitoring • Once we gain enough confidence in the recommendations, automatically execute them
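The slide does not give the Linear Programming formulation, so the following is only one plausible shape for it, stated as an assumption. Let j index capacity pools (for example zone and instance type), d_j be the forecast usage, r_j the existing reservations, x_j the additional reservations to purchase, y_j the usage left on demand, and c^R_j and c^O_j the reserved and on-demand cost per instance over the planning horizon. All symbols are hypothetical.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{j} \left( c^{R}_{j}\, x_{j} + c^{O}_{j}\, y_{j} \right) \\
\text{s.t.}\quad & r_{j} + x_{j} + y_{j} \ge d_{j} \quad \forall j, \\
& x_{j} \ge 0,\; y_{j} \ge 0 \quad \forall j .
\end{aligned}
```

A practical version would add constraints this sketch leaves out, such as budget limits, the rules governing which reservations can be modified into which, and integrality of instance counts.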
  • 39. Self-Service C2G Give data producers, consumers and caretakers the ability to manage their own efficiency: • Identify all involved parties along a data topic • Apportion data-infrastructure cost to all relevant teams • Quickly notice low-usage data topics • Estimate data-replication or large sinks-to-users ratios Long term: enable data-platform owners to use this tool or underlying data to add some automation.
  • 40. Device-Cloud Efficiency Expose the impact of Device/UI features on efficiency : • Provide the relative cost change for AB test cells • Attribute micro-services growth and cost to each Device/UI family
  • 41. KEY TAKEAWAYS 1. Netflix culture, scale, architecture and priorities require efficiency to be championed by a central team, but enforced by all engineers. 2. This is achieved by implementing the successive layers of our efficiency hierarchy of needs: 2a. Transparency to get context, 2b. Deep Dives to tell compelling stories and assemble puzzles, 2c. Actionable Insights to reduce the cognitive load on your organization, 2d. Automation to scale the impact of efficiency efforts.
  • 43. Netflix Talks @ re:Invent Monday 10:45am ARC208:Walking the tightrope: Balancing Innovation, Reliability, Security, and Efficiency (Venetian) 12:15pm SID206: Best Practices for Managing Security on AWS (MGM) Tuesday 10:45am ARC209: A Day in the Life of a Netflix Engineer (Venetian) 11:30am CMP204: How Netflix Tunes EC2 Instances for Performance (Venetian) Wednesday 11:30am MCL317: Orchestrating ML Training for Netflix Recommendations (Venetian) 12:15pm NET303: A day in the life of a Cloud Network Engineer at Netflix (Venetian) 1:00pm ARC312: Why Regional Reservations are a Game Changer for Netflix (Venetian) 1:00pm SID304: SecOps 2021 Today: Using AWS Services to Deliver SecOps (MGM) 1:45pm DEV334: Performing Chaos at Netflix Scale (Venetian) 4:45pm SID316: Using Access Advisor to Strike the Balance Between Security and Usability (MGM) Thursday 12:15pm CMP311: Auto Scaling Made Easy: How Target Tracking Scaling Policies Hit the Bullseye (Palazzo) 12:15pm DAT308: Codex: Conditional Modules Strike Back (Venetian) 12:55pm CMP309: How Netflix Encodes at Scale (Venetian) 5:00pm ABD401: How Netflix Monitors Applications Real Time with Kinesis (Aria) Friday 8:30am ABD319: Tooling Up For Efficiency: DIY Solutions @ Netflix (Aria) 10:00am ABD401: Netflix Keystone SPaaS - Real-time Stream Processing as a Service (Aria)
  • 44. Architecture Mon 10:45am ARC208:Walking the tightrope: Balancing Innovation, Reliability, Security, and Efficiency (Venetian) Tue 10:45am ARC209: A Day in the Life of a Netflix Engineer (Venetian) Wed 1:00pm ARC312: Why Regional Reservations are a Game Changer for Netflix (Venetian) Compute Tue 11:30am CMP204: How Netflix Tunes EC2 Instances for Performance (Venetian) Thu 12:15pm CMP311: Auto Scaling Made Easy: How Target Tracking Scaling Policies Hit the Bullseye (Palazzo) Thu 12:55pm CMP309: How Netflix Encodes at Scale (Venetian) Security, Compliance, and Identity Mon 12:15pm SID206: Best Practices for Managing Security on AWS (MGM) Wed 1:00pm SID304: SecOps 2021 Today: Using AWS Services to Deliver SecOps (MGM) Wed 4:45pm SID316: Using Access Advisor to Strike the Balance Between Security and Usability (MGM) Machine Learning Wed 11:30am MCL317: Orchestrating ML Training for Netflix Recommendations (Venetian) Networking Wed 12:15pm NET303: A day in the life of a Cloud Network Engineer at Netflix (Venetian) Developer Community Wed 1:45pm DEV334: Performing Chaos at Netflix Scale (Venetian) Databases Thu 12:15pm DAT308: Codex: Conditional Modules Strike Back (Venetian) Analytics & Big Data Thu 5:00pm ABD401: How Netflix Monitors Applications Real Time with Kinesis (Aria) Fri 8:30am ABD319: Tooling Up For Efficiency: DIY Solutions @ Netflix (Aria) Fri 10:00am ABD401: Netflix Keystone SPaaS - Real-time Stream Processing as a Service (Aria) Netflix Talks @ re:Invent
  • 46. [Chart: AWS billing file (DBR), # lines / month in millions, APR-13 to APR-17; y-axis from 0 to 1,400]
  • 47. Credits All the tools featured in this presentation were designed and built by members of the Cloud Capacity Planning teams over the past 2 years, specifically, Torio Risianto, Rajan Mittal, and Qian Li.