Dystopia as a Service
April 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
Dystopia
Cloud Native
NetflixOSS – Cloud Native On-Ramp
Opportunities
Dystopia - Abstract
We have spent years striving to build perfect apps running on
perfect kernels on perfect CPUs connected by perfect
networks, but this utopia hasn't really arrived. Instead we live
in a dystopian world of buggy apps changing several times a
day running on JVMs running on an old version of Linux
running on Xen running on something I can't see, that only
exists for a few hours, connected by a network of unknown
topology and operated by many layers of automation.
I will discuss the new challenges and demands of living in this
dystopian world of cloud-based services. I will also give an
overview of the Netflix open source cloud platform (see
netflix.github.com) that we use to create our own island of
utopian agility and availability regardless of what is going on
underneath.
We are Engineers
We solve hard problems
We build amazing and complex things
We fix things when they break
We strive for perfection
Perfect code
Perfect hardware
Perfectly operated
But perfection takes too long…
So we compromise
Time to market vs. Quality
Utopia remains out of reach
Where time to market wins big
Web services
Agile infrastructure - cloud
Continuous deployment
How Soon?
Code features in days instead of months
Hardware in minutes instead of weeks
Incident response in seconds instead of hours
Tipping the Balance
Utopia Dystopia
A new engineering challenge
Construct a highly agile and highly available service from ephemeral and often broken components
Cloud Native
How does Netflix work?
Netflix Member Web Site Home Page
Personalization Driven – What goes on to make this?
How Netflix Streaming Works
Customer Device (PC, PS3, TV…)
Web Site or Discovery API
User Data
Personalization
Streaming API
DRM
QoS Logging
OpenConnect CDN Boxes
CDN Management and Steering
Content Encoding
Consumer Electronics
AWS Cloud Services
CDN Edge Locations
Content Delivery Service
Open Source Hardware Design + FreeBSD, bird, nginx
November 2012 Traffic
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Start Here
memcached
Cassandra
Web service
S3 bucket
Three Personalization movie group choosers (for US, Canada and Latam)
Each icon is three to a few hundred instances across three AWS zones
Cloud Native Architecture
Distributed Quorum NoSQL Datastores
Autoscaled Micro Services
Autoscaled Micro Services
Clients Things
JVM JVM
JVM JVM
Cassandra Cassandra Cassandra
Memcached
JVM
Zone A Zone B Zone C
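The quorum arithmetic behind this layout can be sketched in a few lines. This is a minimal illustration, not taken from the slides: it assumes a replication factor of three with one Cassandra replica in each zone, and the function names `quorum` and `survives_zone_loss` are hypothetical.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def survives_zone_loss(replicas_per_zone, lost_zone):
    """True if quorum reads and writes still succeed after losing one zone."""
    total = sum(replicas_per_zone.values())
    remaining = total - replicas_per_zone[lost_zone]
    return remaining >= quorum(total)


# Assumed layout for illustration: one replica per zone.
layout = {"Zone A": 1, "Zone B": 1, "Zone C": 1}

for zone in layout:
    print(zone, "lost, quorum still possible:", survives_zone_loss(layout, zone))
```

Under these assumptions any single zone can disappear and two of three replicas remain, which is still a majority. Packing two replicas into one zone would break that property whenever that zone fails.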
New Anti-Fragile Patterns
Micro-services
Chaos engines
Highly available systems composed from ephemeral components
Stateless Micro-Service Architecture
Linux Base AMI (CentOS or Ubuntu)
Optional Apache frontend, memcached, non-Java apps
Monitoring
Log rotation to S3
AppDynamics machineagent
Epic/Atlas
Java (JDK 6 or 7)
AppDynamics appagent monitoring
GC and thread dump logging
Tomcat
Application war file, base servlet, platform, client interface jars, Astyanax
Healthcheck, status servlets, JMX interface, Servo autoscale
Cassandra Instance Architecture
Linux Base AMI (CentOS or Ubuntu)
Tomcat and Priam on JDK
Healthcheck, Status
Monitoring
AppDynamics machineagent
Epic/Atlas
Java (JDK 7)
AppDynamics appagent monitoring
GC and thread dump logging
Cassandra Server
Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
Cloud Native
Master copies of data are cloud resident
Everything is dynamically provisioned
All services are ephemeral
Dynamic Scalability
Asgard
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
Push
Autoscale Up
Autoscale Down
Managing Multi-Region Availability
Cassandra Replicas Zone A
Cassandra Replicas Zone B
Cassandra Replicas Zone C
Regional Load Balancers
Cassandra Replicas Zone A
Cassandra Replicas Zone B
Cassandra Replicas Zone C
Regional Load Balancers
UltraDNS
DynECT DNS
AWS Route53
Denominator – A portable way to manage multiple DNS providers from Java
A Cloud Native Open Source Platform
Inspiration
Antifragile API Patterns
Functional Reactive with Circuit Breakers and Bulkheads
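The circuit breaker idea named on this slide can be sketched generically. This is a minimal illustration and not the Hystrix API: the class `CircuitBreaker`, its parameters `max_failures` and `reset_after`, and the injected `clock` are all hypothetical names chosen for the example. Bulkheads are not shown.

```python
import time


class CircuitBreaker:
    """Fail fast to a fallback after repeated failures of a dependency."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # failures before the circuit opens
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()         # open: skip the dependency entirely
            # Otherwise half-open: let one trial call through below.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        self.failures = 0                 # success closes the circuit
        self.opened_at = None
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
print(breaker.call(lambda: "fresh recommendations", lambda: "cached defaults"))
```

The design choice worth noting is that the caller always gets an answer: either the real result or the fallback, so a broken dependency degrades the page rather than taking it down.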
Goals
Establish our solutions as Best Practices / Standards
Hire, Retain and Engage Top Engineers
Build up Netflix Technology Brand
Benefit from a shared ecosystem
NetflixOSS Continuous Build and Deployment
Github NetflixOSS Source
AWS Base AMI
Maven Central
Cloudbees Jenkins
Aminator Bakery
Dynaslave AWS Build Slaves
Asgard (+ Frigga) Console
AWS Baked AMIs
Odin Orchestration API
AWS Account
NetflixOSS Services Scope
AWS Account
Asgard Console
Archaius Config Service
Cross region Priam C*
Explorers Dashboards
Atlas Monitoring
Genie Hadoop Services
Multiple AWS Regions
Eureka Registry
Exhibitor ZK
Edda History
Simian Army
3 AWS Zones
Application Clusters, Autoscale Groups, Instances
Priam Cassandra Persistent Storage
Evcache Memcached Ephemeral Storage
NetflixOSS Instance Libraries
Initialization
• Baked AMI – Tomcat, Apache, your code
• Governator – Guice based dependency injection
• Archaius – dynamic configuration properties client
• Eureka – service registration client
Service Requests
• Karyon – Base Server for inbound requests
• RxJava – Reactive pattern
• Hystrix/Turbine – dependencies and real-time status
• Ribbon – REST Client for outbound calls
Data Access
• Astyanax – Cassandra client and pattern library
• Evcache – Zone aware Memcached client
• Curator – Zookeeper patterns
• Denominator – DNS routing abstraction
Logging
• Blitz4j – non-blocking logging
• Servo – metrics export for autoscaling
• Atlas – high volume instrumentation
NetflixOSS Testing and Automation
Test Tools
• CassJmeter – Load testing for Cassandra
• Circus Monkey – Test account reservation rebalancing
Maintenance
• Janitor Monkey – Cleans up unused resources
• Efficiency Monkey
• Doctor Monkey
• Howler Monkey – Complains about expiring certs
Availability
• Chaos Monkey – Kills Instances
• Chaos Gorilla – Kills Availability Zones
• Chaos Kong – Kills Regions
• Latency Monkey – Latency and error injection
Security
• Security Monkey
• Conformity Monkey
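The "kills instances" idea can be sketched as a selection step. This is a hypothetical illustration, not the Simian Army code: the function `pick_victims`, the group names and instance IDs are invented, and the rule of sparing a group's last instance is an assumption made for the sketch. Actually terminating anything is left out.

```python
import random


def pick_victims(groups, rng=random):
    """Choose at most one instance per autoscale group to terminate."""
    victims = {}
    for name, instances in groups.items():
        # Assumed safety rule for this sketch: spare a group's last instance.
        if len(instances) > 1:
            victims[name] = rng.choice(instances)
    return victims


# Invented example data: group name -> running instance IDs.
groups = {
    "api": ["i-001", "i-002", "i-003"],
    "batch": ["i-101"],
}
print(pick_victims(groups))
```

A service that keeps working while this runs regularly has demonstrated that it tolerates the loss of any single instance, which is the property the availability monkeys are meant to enforce.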
Example Application – RSS Reader
More Use Cases
More Features
Better portability
Higher availability
Easier to deploy
Contributions from end users
Contributions from vendors
What’s Coming Next?
Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
“It’s done when it runs Asgard”
Functionally complete
Demonstrated March
Release 3.3 in 2Q13
Some vendor interest
Needs AWS compatible Autoscaler
Some vendor interest
Many missing features
Bait and switch AWS API strategy
Netflix Cloud Prize
Boosting the @NetflixOSS Ecosystem
Entrants
Netflix Engineering
Judges
Winners
Nominations
Conforms to Rules
Working Code
Community Traction
Categories
Registration Opened March 13 (Github)
Apache Licensed Contributions (Github)
Close Entries September 15 (Github)
Award Ceremony Dinner, November, AWS Re:Invent
Ten Prize Categories
$10K cash
$5K AWS
AWS Re:Invent Tickets
Trophy
Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering an ecosystem
Rapid Evolution - Low MTBIAMSH
(Mean Time Between Idea And Making Stuff Happen)
Opportunities
Monoculture
Replicate “the best” as patterns
Reduce interaction complexity
But… epidemic single point of failure
Pattern Failures
Infrastructure Pattern Failures
Software Stack Pattern Failures
Application Pattern Failures
Infrastructure Pattern Failures
• Device failures – bad batch of disks, PSUs, etc.
• CPU failures – cache corruption, math errors
• Datacenter failures – power, network, disaster
• Routing failures – DNS, Internet/ISP path
Software Stack Pattern Failures
• Time bombs – Counter wrap, memory leak
• Date bombs – Leap year, leap second, epoch
• Expiration – Certs timing out
• Trust revocation – Certificate Authority fails
• Security exploit – everything compromised
• Language bugs – compilers and runtime
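The "counter wrap" time bomb can be made concrete with a little arithmetic. This is an illustrative sketch under stated assumptions: a 32-bit unsigned tick counter incremented once per millisecond, with hypothetical helper names `wrap32`, `days_until_wrap` and `elapsed32`.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit unsigned counter can hold


def wrap32(value):
    """Truncate a value the way a 32-bit unsigned counter would."""
    return value & MASK32


def days_until_wrap(ticks_per_second):
    """Uptime, in days, before a 32-bit tick counter wraps to zero."""
    return 2**32 / ticks_per_second / 86400


def elapsed32(start, now):
    """Elapsed ticks that stay correct across a single wrap."""
    return (now - start) & MASK32


start = 0xFFFFFFF0            # counter value shortly before the wrap
now = wrap32(start + 0x20)    # 32 ticks later the counter has wrapped
print("naive elapsed:", now - start)            # negative, the latent bug
print("modular elapsed:", elapsed32(start, now))
print("days until wrap at 1 kHz:", days_until_wrap(1000))
```

Because every instance built from the same image carries the same counter, a fleet started together would hit the wrap together, which is what turns a small bug into an epidemic failure.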
Application Pattern Failures
• Content bombs – Data dependent failure
• Configuration – wrong/bad syntax
• Versioning – incompatible mixes
• Cascading failures – error handling bugs etc.
• Cascading overload – excessive logging etc.
• Network bugs – routers, firewalls, protocols
What to do?
Automated diversity management
Diversify the automation as well
Efficient vs. Antifragile trade-off
Linux Foundation
• Strengths
– Ubiquitous support, open source is the default
• Weaknesses
– Networking vs. BSD, observability
• Opportunities
– Optimize for ephemeral dynamic use cases
• Threats
– Epidemic failure modes – e.g. “leap second”
Takeaway
Netflix is making it easy for everyone to adopt Cloud Native patterns.
Optimize for dystopia and diversity.
http://netflix.github.com
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud @NetflixOSS

 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Dystopia as a Service

  • 1. Dystopia as a Service April 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft
  • 2. Dystopia Cloud Native NetflixOSS – Cloud Native On-Ramp Opportunities
  • 3. Dystopia - Abstract We have spent years striving to build perfect apps running on perfect kernels on perfect CPUs connected by perfect networks, but this utopia hasn't really arrived. Instead we live in a dystopian world of buggy apps changing several times a day running on JVMs running on an old version of Linux running on Xen running on something I can't see, that only exists for a few hours, connected by a network of unknown topology and operated by many layers of automation. I will discuss the new challenges and demands of living in this dystopian world of cloud based services. I will also give an overview of the Netflix open source cloud platform (see netflix.github.com) that we use to create our own island of utopian agility and availability regardless of what is going on underneath.
  • 4. We are Engineers We solve hard problems We build amazing and complex things We fix things when they break
  • 5. We strive for perfection Perfect code Perfect hardware Perfectly operated
  • 6. But perfection takes too long… So we compromise Time to market vs. Quality Utopia remains out of reach
  • 7. Where time to market wins big Web services Agile infrastructure - cloud Continuous deployment
  • 8. How Soon? Code features in days instead of months Hardware in minutes instead of weeks Incident response in seconds instead of hours
  • 10. A new engineering challenge Construct a highly agile and highly available service from ephemeral and often broken components
  • 11. Cloud Native How does Netflix work?
  • 12. Netflix Member Web Site Home Page Personalization Driven – What goes on to make this?
  • 13. How Netflix Streaming Works Customer Device (PC, PS3, TV…) Web Site or Discovery API User Data Personalization Streaming API DRM QoS Logging OpenConnect CDN Boxes CDN Management and Steering Content Encoding Consumer Electronics AWS Cloud Services CDN Edge Locations
  • 14. Content Delivery Service Open Source Hardware Design + FreeBSD, bird, nginx
  • 16. Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics) Start Here memcached Cassandra Web service S3 bucket Three Personalization movie group choosers (for US, Canada and Latam) Each icon is three to a few hundred instances across three AWS zones
  • 17. Cloud Native Architecture Distributed Quorum NoSQL Datastores Autoscaled Micro Services Autoscaled Micro Services Clients Things JVM JVM JVM JVM Cassandra Cassandra Cassandra Memcached JVM Zone A Zone B Zone C
  • 18. New Anti-Fragile Patterns Micro-services Chaos engines Highly available systems composed from ephemeral components
  • 19. Stateless Micro-Service Architecture Linux Base AMI (CentOS or Ubuntu) Optional Apache frontend, memcached, non-java apps Monitoring Log rotation to S3 AppDynamics machineagent Epic/Atlas Java (JDK 6 or 7) AppDynamics appagent monitoring GC and thread dump logging Tomcat Application war file, base servlet, platform, client interface jars, Astyanax Healthcheck, status servlets, JMX interface, Servo autoscale
  • 20. Cassandra Instance Architecture Linux Base AMI (CentOS or Ubuntu) Tomcat and Priam on JDK Healthcheck, Status Monitoring AppDynamics machineagent Epic/Atlas Java (JDK 7) AppDynamics appagent monitoring GC and thread dump logging Cassandra Server Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
  • 21. Cloud Native Master copies of data are cloud resident Everything is dynamically provisioned All services are ephemeral
  • 24. Ephemeral Instances • Largest services are autoscaled • Average lifetime of an instance is 36 hours Push Autoscale Up Autoscale Down
  • 25. Managing Multi-Region Availability Cassandra Replicas Zone A Cassandra Replicas Zone B Cassandra Replicas Zone C Regional Load Balancers Cassandra Replicas Zone A Cassandra Replicas Zone B Cassandra Replicas Zone C Regional Load Balancers UltraDNS DynECT DNS AWS Route53 A portable way to manage multiple DNS providers from Java Denominator
  • 26. A Cloud Native Open Source Platform
  • 28. Antifragile API Patterns Functional Reactive with Circuit Breakers and Bulkheads
  • 29. Goals: Establish our solutions as Best Practices / Standards; Hire, Retain and Engage Top Engineers; Build up Netflix Technology Brand; Benefit from a shared ecosystem
  • 30. Github NetflixOSS Source AWS Base AMI Maven Central Cloudbees Jenkins Aminator Bakery Dynaslave AWS Build Slaves Asgard (+ Frigga) Console AWS Baked AMIs Odin Orchestration API AWS Account NetflixOSS Continuous Build and Deployment
  • 31. AWS Account Asgard Console Archaius Config Service Cross region Priam C* Explorers Dashboards Atlas Monitoring Genie Hadoop Services Multiple AWS Regions Eureka Registry Exhibitor ZK Edda History Simian Army 3 AWS Zones Application Clusters Autoscale Groups Instances Priam Cassandra Persistent Storage Evcache Memcached Ephemeral Storage NetflixOSS Services Scope
  • 32. NetflixOSS Instance Libraries. Initialization: Baked AMI – Tomcat, Apache, your code; Governator – Guice based dependency injection; Archaius – dynamic configuration properties client; Eureka – service registration client. Service Requests: Karyon – Base Server for inbound requests; RxJava – Reactive pattern; Hystrix/Turbine – dependencies and real-time status; Ribbon – REST Client for outbound calls. Data Access: Astyanax – Cassandra client and pattern library; Evcache – Zone aware Memcached client; Curator – Zookeeper patterns; Denominator – DNS routing abstraction. Logging: Blitz4j – non-blocking logging; Servo – metrics export for autoscaling; Atlas – high volume instrumentation
  • 33. NetflixOSS Testing and Automation. Test Tools: CassJmeter – Load testing for Cassandra; Circus Monkey – Test account reservation rebalancing. Maintenance: Janitor Monkey – Cleans up unused resources; Efficiency Monkey; Doctor Monkey; Howler Monkey – Complains about expiring certs. Availability: Chaos Monkey – Kills Instances; Chaos Gorilla – Kills Availability Zones; Chaos Kong – Kills Regions; Latency Monkey – Latency and error injection. Security: Security Monkey; Conformity Monkey
  • 35. What’s Coming Next? More Use Cases; More Features; Better portability; Higher availability; Easier to deploy; Contributions from end users; Contributions from vendors
  • 36. Vendor Driven Portability Interest in using NetflixOSS for Enterprise Private Clouds “It’s done when it runs Asgard” Functionally complete Demonstrated March Release 3.3 in 2Q13 Some vendor interest Needs AWS compatible Autoscaler Some vendor interest Many missing features Bait and switch AWS API strategy
  • 37. Netflix Cloud Prize Boosting the @NetflixOSS Ecosystem
  • 39. Entrants Netflix Engineering Judges Winners Nominations Conforms to Rules Working Code Community Traction Categories Registration Opened March 13 Github Apache Licensed Contributions Github Close Entries September 15 Github Award Ceremony Dinner November AWS Re:Invent Ten Prize Categories $10K cash $5K AWS AWS Re:Invent Tickets Trophy
  • 40. Functionality and scale now, portability coming Moving from parts to a platform in 2013 Netflix is fostering an ecosystem Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
  • 42. Monoculture Replicate “the best” as patterns Reduce interaction complexity But… epidemic single point of failure
  • 43. Pattern Failures Infrastructure Pattern Failures Software Stack Pattern Failures Application Pattern Failures
  • 44. Infrastructure Pattern Failures • Device failures – bad batch of disks, PSUs, etc. • CPU failures – cache corruption, math errors • Datacenter failures – power, network, disaster • Routing failures – DNS, Internet/ISP path
  • 45. Software Stack Pattern Failures • Time bombs – Counter wrap, memory leak • Date bombs - Leap year, leap second, epoch • Expiration – Certs timing out • Trust revocation – Certificate Authority fails • Security exploit – everything compromised • Language bugs – compilers and runtime
  • 46. Application Pattern Failures • Content bombs – Data dependent failure • Configuration – wrong/bad syntax • Versioning – incompatible mixes • Cascading failures – error handling bugs etc. • Cascading overload – excessive logging etc. • Network bugs – routers, firewalls, protocols
  • 47. What to do? Automated diversity management Diversify the automation as well Efficient vs. Antifragile trade-off
  • 48. Linux Foundation • Strengths – Ubiquitous support, open source is the default • Weaknesses – Networking vs. BSD, observability • Opportunities – Optimize for ephemeral dynamic use cases • Threats – Epidemic failure modes – e.g. “leap second”
  • 49. Takeaway Netflix is making it easy for everyone to adopt Cloud Native patterns. Optimize for dystopia and diversity. http://netflix.github.com http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud @NetflixOSS
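
Slide 28 names circuit breakers and bulkheads as antifragile API patterns, and slide 32 lists Hystrix as the NetflixOSS library for managing dependencies. The sketch below is not Hystrix and does not reproduce its API. It is a minimal Python illustration of the two ideas, and the class names, thresholds, and fallback convention are all assumptions made for this example. A circuit breaker stops calling a dependency after repeated failures and returns a fallback until a timeout passes. A bulkhead caps how many callers may use a dependency at once, so one slow service cannot absorb every thread.

```python
import threading
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed
        self._lock = threading.Lock()

    def call(self, func, fallback):
        with self._lock:
            still_open = (
                self.opened_at is not None
                and self.clock() - self.opened_at < self.reset_timeout
            )
        if still_open:
            # Fail fast: do not touch the broken dependency at all.
            return fallback()
        # Closed, or the timeout has elapsed (a simplified "half-open" trial call).
        try:
            result = func()
        except Exception:
            with self._lock:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = self.clock()
            return fallback()
        with self._lock:
            # Success closes the circuit and clears the failure count.
            self.failures = 0
            self.opened_at = None
        return result


class Bulkhead:
    """Hypothetical bulkhead: at most max_concurrent callers run func at once."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, fallback):
        # Reject immediately instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return func()
        finally:
            self._slots.release()


if __name__ == "__main__":
    def broken_dependency():
        raise ConnectionError("dependency is down")

    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
    for _ in range(4):
        # After two failures the breaker opens and later calls skip the dependency.
        print(breaker.call(broken_dependency, lambda: "cached default"))
```

In both classes the caller always gets an answer, either the real result or the fallback, which matches the slide's theme of building an available service from components that are expected to fail. A production breaker would also limit the half-open state to a single trial call and export its status for monitoring. This sketch omits both.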