SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Netflix Development Patterns for
Rapid Iteration, Scale, Performance, & Availability
Neil Hunt, Netflix
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Are You Designing Systems That Are:
•
•
•
•

Web-scale
Global
Highly-available
Consumer-facing

• Cloud Native
Cloud Native
•
•
•
•
•

Service oriented architecture
Redundancy
Statelessness
NoSQL
Eventual consistency
Assumptions
Everything is Broken

Hardware will fail

Scale

Slowly Changing
Large Scale

Rapid Change
Large Scale

Telcos Web-Scale
Enterprise IT Startups
Slowly Changing
Small Scale

Rapid Change
Small Scale

Everything works

Software will fail
Speed
Netflix Cloud Goals:
Availability, Scale, Performance
Performance
• Reduce session start by 1s
Save 1 human lifetime per day!
Win more moments of truth
• Suggest choices 1% better
500k hours/day additional value delivered
Scale
•
•
•
•
•

50% y/y traffic growth
50 Countries, 3 continents
Tens of thousands of instances at peak
4 AWS regions, 12 datacenters
~$.001 per start
Availability
• Aspire to 4 x nines (99.99% of starts successful)
• Per Quarter:
– Downtime: < 3 mins (peak time)
– Successful starts: 9.999B
– Failures: 1M
 frustration, calls, lost business
Availabilities Compound
N Service
Dependencies
2

…

N dependencies

99.99%

.99

1000

99.99%

.999

100

99.99%

.9998

10

99.99N%

Availability

.9
Availabilities Compound
To achieve 99.99% availability
with 1000 components
requires:
or
99.9999% availability
for each dependency

Isolation for
independence

Component failure leads
to system failure

Component failure leads
to degradation rather than
system failure
Availability, Scale, Performance
Are Not Enough!
Rapid Iteration – Rate of Change
• Running tests
• Rolling out tests
– Engineering the winning test experience for scale

• Adding features
• Scaling up
• Removing features, simplifying, minimizing
Testing
• Up to 1,000 changes per day!
Rate of Change
• Change leads to bugs
–
–
–
–

New features
New configurations
New types of inputs
Scaling up

• Availability is in tension with rate of change
Availability / Rate of Change Tradeoff
Availability

99.999%

99.99%
Frontier of
availability/change
99.9%

99%
1

10

100

Rate of Change

1000
Availability / Rate of Change Tradeoff
Availability

99.999%

99.99%
Frontier of
availability/change
99.9%

99%
1

10

100

Rate of Change

1000
Shifting the Curve…
Availability

99.999%

99.99%

99.9%

99%
1

10

100

Rate of Change

1000
Shifting the Curve
• Must break the chained dependencies
that compound in cascading system failure
• Subsystem isolation:
– Failure in one component
should never result in cascading system failure
Isolating Subsystems
Redundant systems with timeout & failover
• Failure of instance
• Failure of network
• Latency monkey to
test

Dependent
System
Timeout

Dependence
Isolating Subsystems
Redundant systems with timeout & failover
• Failure of instance
• Failure of network

Higher Tier
System
Longer
timeout
Dependent
System
Short
timeout

• Latency monkey to
test
Dependence
Isolating Subsystems
Timeout with fallback default response
• Network failure
• Software bug

{ status=mem,
plan=4,
device=true }

Dependent
System
Timeout &
Default response

Dependence
Isolating Subsystems
Canary Push
• Network failure
• Software bug

Dependent
System
Timeout

Canary
instance
new code

Dependence
Isolating Subsystems
Red/Black deployment
• Software bugs

Dependent
System
Fail back to
old code

Bad code
pushed

Dependence
V2.3

Dependence
V2.2
Isolating Subsystems
Standby Blue system
• Independent
implementation
• Simplified logic

Dependent
System
Fail to static
version

Static reference
implementation
Dependence
V2.3
Isolating Subsystems

Load
Balancer

Zone isolation
• Infrastructure failure
(e.g. power outage)

Zone A

Zone B

Dependent
System

Dependent
System

• Chaos Gorilla
Dependence

Dependence
Isolating Subsystems
Region isolation
DNS

• Infrastructure
software bugs
(e.g. load
balancer fail)
• Chaos Kong

Region E

Region W

Load
Balancer

Load
Balancer

Zone A

Zone B

Zone A

Zone B

Dependen
t System

Dependen
t System

Dependen
t System

Dependen
t System

Dependence

Dependence

Dependence

Dependence
Isolating Subsystems
Dependency Mode

Isolating Technique

Instance Failure
Network failure

Redundant systems with failover and timeout
Timeout with default response

Network failure
Software bug

Canary push
Red-black deployment
Blue systems

Infrastructure failure

Zone isolation

Cross-zone software bugs

Region isolation
Trying Harder Won’t Cut It
• Trying harder gets a linear return on an exponential
problem
• Need to be great at execution
AND
Have the right architecture
• What architectural features are you using to ensure
availability, scale, performance, & rapid rate of change?
Please give us your feedback on this
presentation

DMG206
As a thank you, we will select prize
winners daily for completed surveys!

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1
 
Infrastructure as Code Maturity Model v1
Infrastructure as Code Maturity Model v1Infrastructure as Code Maturity Model v1
Infrastructure as Code Maturity Model v1
 
Kubernetes Networking
Kubernetes NetworkingKubernetes Networking
Kubernetes Networking
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
Istio Service Mesh
Istio Service MeshIstio Service Mesh
Istio Service Mesh
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red Hat
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container Platform
 
CI/CD on AWS
CI/CD on AWSCI/CD on AWS
CI/CD on AWS
 
Containers and Docker
Containers and DockerContainers and Docker
Containers and Docker
 
Introduction to Red Hat OpenShift 4
Introduction to Red Hat OpenShift 4Introduction to Red Hat OpenShift 4
Introduction to Red Hat OpenShift 4
 
Building DevOps in the Enterprise: Balancing Centralized and Decentralized Teams
Building DevOps in the Enterprise: Balancing Centralized and Decentralized TeamsBuilding DevOps in the Enterprise: Balancing Centralized and Decentralized Teams
Building DevOps in the Enterprise: Balancing Centralized and Decentralized Teams
 
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
 
Introduction to Chaos Engineering
Introduction to Chaos EngineeringIntroduction to Chaos Engineering
Introduction to Chaos Engineering
 
Rancher MasterClass - Avoiding-configuration-drift.pptx
Rancher  MasterClass - Avoiding-configuration-drift.pptxRancher  MasterClass - Avoiding-configuration-drift.pptx
Rancher MasterClass - Avoiding-configuration-drift.pptx
 
Netflix Massively Scalable, Highly Available, Immutable Infrastructure
Netflix Massively Scalable, Highly Available, Immutable InfrastructureNetflix Massively Scalable, Highly Available, Immutable Infrastructure
Netflix Massively Scalable, Highly Available, Immutable Infrastructure
 
Microservices Docker Kubernetes Istio Kanban DevOps SRE
Microservices Docker Kubernetes Istio Kanban DevOps SREMicroservices Docker Kubernetes Istio Kanban DevOps SRE
Microservices Docker Kubernetes Istio Kanban DevOps SRE
 
From Monolithic to Microservices (AWS & Digital Goodie)
From Monolithic to Microservices (AWS & Digital Goodie)From Monolithic to Microservices (AWS & Digital Goodie)
From Monolithic to Microservices (AWS & Digital Goodie)
 
Introduction to Istio on Kubernetes
Introduction to Istio on KubernetesIntroduction to Istio on Kubernetes
Introduction to Istio on Kubernetes
 
Cloud Native Landscape (CNCF and OCI)
Cloud Native Landscape (CNCF and OCI)Cloud Native Landscape (CNCF and OCI)
Cloud Native Landscape (CNCF and OCI)
 
Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!Microservices, Kubernetes and Istio - A Great Fit!
Microservices, Kubernetes and Istio - A Great Fit!
 

Andere mochten auch

Andere mochten auch (20)

Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Migrating My.T-Mobile.com to AWS (ENT214) | AWS re:Invent 2013
Migrating My.T-Mobile.com to AWS (ENT214) | AWS re:Invent 2013Migrating My.T-Mobile.com to AWS (ENT214) | AWS re:Invent 2013
Migrating My.T-Mobile.com to AWS (ENT214) | AWS re:Invent 2013
 
Java Microservices with Netflix OSS & Spring
Java Microservices with Netflix OSS & Spring Java Microservices with Netflix OSS & Spring
Java Microservices with Netflix OSS & Spring
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
 
Expanding you business through M&A - Hosting Industry
Expanding you business through M&A - Hosting IndustryExpanding you business through M&A - Hosting Industry
Expanding you business through M&A - Hosting Industry
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Hulu.com
Hulu.comHulu.com
Hulu.com
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Arc305 how netflix leverages multiple regions to increase availability an i...
Arc305 how netflix leverages multiple regions to increase availability   an i...Arc305 how netflix leverages multiple regions to increase availability   an i...
Arc305 how netflix leverages multiple regions to increase availability an i...
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Svc 202-netflix-open-source
Svc 202-netflix-open-sourceSvc 202-netflix-open-source
Svc 202-netflix-open-source
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Recsys 2014 Keynote: The Value of Better Recommendations - For Businesses, Co...
Recsys 2014 Keynote: The Value of Better Recommendations - For Businesses, Co...Recsys 2014 Keynote: The Value of Better Recommendations - For Businesses, Co...
Recsys 2014 Keynote: The Value of Better Recommendations - For Businesses, Co...
 
Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)Systems Monitoring with Prometheus (Devops Ireland April 2015)
Systems Monitoring with Prometheus (Devops Ireland April 2015)
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
A quick comparison between Netflix, Hulu, iTunes and whatnot
A quick comparison between Netflix, Hulu, iTunes and whatnotA quick comparison between Netflix, Hulu, iTunes and whatnot
A quick comparison between Netflix, Hulu, iTunes and whatnot
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 

Ähnlich wie Netflix Development Patterns for Scale, Performance & Availability (DMG206) | AWS re:Invent 2013

Tiger oracle
Tiger oracleTiger oracle
Tiger oracle
d0nn9n
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Continuent
 

Ähnlich wie Netflix Development Patterns for Scale, Performance & Availability (DMG206) | AWS re:Invent 2013 (20)

More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
 
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at Netflix
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
 
Service Stampede: Surviving a Thousand Services
Service Stampede: Surviving a Thousand ServicesService Stampede: Surviving a Thousand Services
Service Stampede: Surviving a Thousand Services
 
Mini-Training: Netflix Simian Army
Mini-Training: Netflix Simian ArmyMini-Training: Netflix Simian Army
Mini-Training: Netflix Simian Army
 
8 cloud design patterns you ought to know - Update Conference 2018
8 cloud design patterns you ought to know - Update Conference 20188 cloud design patterns you ought to know - Update Conference 2018
8 cloud design patterns you ought to know - Update Conference 2018
 
High Availability in the Cloud - Architectural Best Practices
High Availability in the Cloud - Architectural Best PracticesHigh Availability in the Cloud - Architectural Best Practices
High Availability in the Cloud - Architectural Best Practices
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
Tiger oracle
Tiger oracleTiger oracle
Tiger oracle
 
(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud
 
Engineering Netflix Global Operations in the Cloud
Engineering Netflix Global Operations in the CloudEngineering Netflix Global Operations in the Cloud
Engineering Netflix Global Operations in the Cloud
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
 
So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier
 
Site reliability in the Serverless age - Serverless Boston 2019
Site reliability in the Serverless age  - Serverless Boston 2019Site reliability in the Serverless age  - Serverless Boston 2019
Site reliability in the Serverless age - Serverless Boston 2019
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
 
Cloud Design Patterns - Hong Kong Codeaholics
Cloud Design Patterns - Hong Kong CodeaholicsCloud Design Patterns - Hong Kong Codeaholics
Cloud Design Patterns - Hong Kong Codeaholics
 
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
 
Why Distributed Databases?
Why Distributed Databases?Why Distributed Databases?
Why Distributed Databases?
 

Mehr von Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Netflix Development Patterns for Scale, Performance & Availability (DMG206) | AWS re:Invent 2013

  • 1. Netflix Development Patterns for Rapid Iteration, Scale, Performance, & Availability Neil Hunt, Netflix November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Are You Designing Systems That Are: • • • • Web-scale Global Highly-available Consumer-facing • Cloud Native
  • 3. Cloud Native • • • • • Service oriented architecture Redundancy Statelessness NoSQL Eventual consistency
  • 4. Assumptions Everything is Broken Hardware will fail Scale Slowly Changing Large Scale Rapid Change Large Scale Telcos Web-Scale Enterprise IT Startups Slowly Changing Small Scale Rapid Change Small Scale Everything works Software will fail Speed
  • 6. Performance • Reduce session start by 1s Save 1 human lifetime per day! Win more moments of truth • Suggest choices 1% better 500k hours/day additional value delivered
  • 7. Scale • • • • • 50% y/y traffic growth 50 Countries, 3 continents Tens of thousands of instances at peak 4 AWS regions, 12 datacenters ~$.001 per start
  • 8. Availability • Aspire to 4 x nines (99.99% of starts successful) • Per Quarter: – Downtime: < 3 mins (peak time) – Successful starts: 9.999B – Failures: 1M  frustration, calls, lost business
  • 9. Availabilities Compound N Service Dependencies 2 … N dependencies 99.99% .99 1000 99.99% .999 100 99.99% .9998 10 99.99N% Availability .9
  • 10. Availabilities Compound To achieve 99.99% availability with 1000 components requires: or 99.9999% availability for each dependency Isolation for independence Component failure leads to system failure Component failure leads to degradation rather than system failure
  • 12. Rapid Iteration – Rate of Change • Running tests • Rolling out tests – Engineering the winning test experience for scale • Adding features • Scaling up • Removing features, simplifying, minimizing
  • 13. Testing • Up to 1,000 changes per day!
  • 14. Rate of Change • Change leads to bugs – – – – New features New configurations New types of inputs Scaling up • Availability is in tension with rate of change
  • 15. Availability / Rate of Change Tradeoff Availability 99.999% 99.99% Frontier of availability/change 99.9% 99% 1 10 100 Rate of Change 1000
  • 16. Availability / Rate of Change Tradeoff Availability 99.999% 99.99% Frontier of availability/change 99.9% 99% 1 10 100 Rate of Change 1000
  • 18. Shifting the Curve • Must break the chained dependencies that compound in cascading system failure • Subsystem isolation: – Failure in one component should never result in cascading system failure
  • 19. Isolating Subsystems Redundant systems with timeout & failover • Failure of instance • Failure of network • Latency monkey to test Dependent System Timeout Dependence
  • 20. Isolating Subsystems Redundant systems with timeout & failover • Failure of instance • Failure of network Higher Tier System Longer timeout Dependent System Short timeout • Latency monkey to test Dependence
  • 21. Isolating Subsystems Timeout with fallback default response • Network failure • Software bug { status=mem, plan=4, device=true } Dependent System Timeout & Default response Dependence
  • 22. Isolating Subsystems Canary Push • Network failure • Software bug Dependent System Timeout Canary instance new code Dependence
  • 23. Isolating Subsystems Red/Black deployment • Software bugs Dependent System Fail back to old code Bad code pushed Dependence V2.3 Dependence V2.2
  • 24. Isolating Subsystems Standby Blue system • Independent implementation • Simplified logic Dependent System Fail to static version Static reference implementation Dependence V2.3
  • 25. Isolating Subsystems Load Balancer Zone isolation • Infrastructure failure (e.g. power outage) Zone A Zone B Dependent System Dependent System • Chaos Gorilla Dependence Dependence
  • 26. Isolating Subsystems Region isolation DNS • Infrastructure software bugs (e.g. load balancer fail) • Chaos Kong Region E Region W Load Balancer Load Balancer Zone A Zone B Zone A Zone B Dependen t System Dependen t System Dependen t System Dependen t System Dependence Dependence Dependence Dependence
  • 27. Isolating Subsystems Dependency Mode Isolating Technique Instance Failure Network failure Redundant systems with failover and timeout Timeout with default response Network failure Software bug Canary push Red-black deployment Blue systems Infrastructure failure Zone isolation Cross-zone software bugs Region isolation
  • 28. Trying Harder Won’t Cut It • Trying harder gets a linear return on an exponential problem • Need to be great at execution AND Have the right architecture • What architectural features are you using to ensure availability, scale, performance, & rapid rate of change?
  • 29. Please give us your feedback on this presentation DMG206 As a thank you, we will select prize winners daily for completed surveys!