Cloud Capacity Planning
South Bay SRE meetup - August 9th, 2016
Presenting...
● Cloud Capacity Planning..an Oxymoron?
● Santa Cloud: How Netflix Does Holiday Capacity Planning
● The Data Behind the Planning
Cloud Capacity Planning..an Oxymoron?
South Bay SRE Meetup: August 9th, 2016
● > 83M households
● 190 Countries
● 35% of US Internet traffic at peak
● Entirely on Cloud*, across three regions
● Evacuate a region monthly... for 24 hours
● Capacity planning: ~5 people! (in the room :-)
* Content served from homegrown OpenConnect CDN
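A rough sketch of what a monthly region evacuation implies for headroom. It assumes traffic is spread evenly across the regions and that an evacuated region's load is split evenly among the survivors; neither assumption comes from the slides, and the function names and instance counts are hypothetical.

```python
def evacuation_load_factor(regions: int, evacuated: int = 1) -> float:
    """Growth in each surviving region's load when `evacuated` of `regions`
    equally loaded regions are drained (assumes an even redistribution)."""
    if not 0 <= evacuated < regions:
        raise ValueError("at least one region must survive")
    return regions / (regions - evacuated)


def instances_needed_per_region(steady_state: int, regions: int,
                                evacuated: int = 1) -> int:
    """Instances one region needs to absorb an evacuation, rounded up.

    `steady_state` is the (hypothetical) per-region instance count
    under normal traffic.
    """
    if not 0 <= evacuated < regions:
        raise ValueError("at least one region must survive")
    survivors = regions - evacuated
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-steady_state * regions // survivors)


if __name__ == "__main__":
    print(evacuation_load_factor(3))            # three regions, one drained
    print(instances_needed_per_region(100, 3))  # per-region target
```

With three regions, each survivor has to be able to carry half again its normal load, which is the kind of over-provisioning the later slides trade off against efficiency.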
Capacity Planning Concerns
● Facility considerations (Space, Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
(Cloud) Capacity Planning Concerns
● Facility considerations (Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
Cloud-specific CP Factors
● Capacity bounds... unknown (-)
● Vendor Decisions (-/+)
○ Hardware/Offering Evolution Timeline
○ Resource Demand (CPU/Mem/Disk/Net) Matrix
● On-Demand Capability (+)
Netflix Model
● Depend on the AWS on-demand pool for elasticity
● Monitor insufficient capacity exceptions (ICEs) for boundaries
● Invest heavily in 3-year reservations
● Maintain relatively few, large reserved pools
● Cloud Capacity Analytics team develops tools for insight
● Leverage cross-account resource borrowing
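One way the ICE-monitoring bullet above could be approximated, as a minimal sketch only. It assumes launch failures have already been collected into plain records; the record fields (`error_code`, `instance_type`, `zone`), the sample data, and the threshold are hypothetical, and this is not a description of Netflix's tooling. `InsufficientInstanceCapacity` is the EC2 error code returned when an ICE occurs.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

ICE_CODE = "InsufficientInstanceCapacity"  # EC2 error code for an ICE

Pool = Tuple[str, str]  # (instance type, availability zone)


def tally_ices(launch_failures: Iterable[Mapping[str, str]]) -> Counter:
    """Count ICEs per (instance_type, zone) pool, ignoring other errors."""
    counts: Counter = Counter()
    for failure in launch_failures:
        if failure.get("error_code") == ICE_CODE:
            counts[(failure["instance_type"], failure["zone"])] += 1
    return counts


def pools_at_boundary(counts: Counter, threshold: int = 2) -> List[Pool]:
    """Pools whose ICE count suggests the on-demand pool is near its limit."""
    return sorted(pool for pool, n in counts.items() if n >= threshold)


# Hypothetical sample records, for illustration only.
sample_failures = [
    {"error_code": ICE_CODE, "instance_type": "m4.4xlarge", "zone": "us-east-1a"},
    {"error_code": ICE_CODE, "instance_type": "m4.4xlarge", "zone": "us-east-1a"},
    {"error_code": "InstanceLimitExceeded", "instance_type": "m4.4xlarge", "zone": "us-east-1a"},
    {"error_code": ICE_CODE, "instance_type": "r3.8xlarge", "zone": "us-west-2b"},
]

if __name__ == "__main__":
    print(pools_at_boundary(tally_ices(sample_failures)))
```

Because the provider does not publish capacity bounds, repeated ICEs in a pool are one of the few observable signals of where those bounds lie.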
The Triad Cloud Impact
● Innovation
● Reliability
● Efficiency
[Figure: triad diagram with the labels "Default" and "Preferred"]
Considerations of Scale
● Capacity required for critical footprint might require “guarantees”
● API-based observability has limits
● All resources have capacity limits/throttles
● Default resource limits are set for the lowest common denominator
● Get creative with unused but paid-for capacity
● Billing file size!
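Since the bullets above note that API-based observability has limits and that all resources are throttled, tooling that polls a provider API at scale usually needs to back off when throttled. Below is a minimal sketch of exponential backoff with full jitter; `ThrottledError` is a hypothetical stand-in for whatever rate-limit error a real client raises, and the delay values are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `fn`, retrying on ThrottledError with capped exponential backoff.

    Each wait is a random fraction (full jitter) of
    min(max_delay, base_delay * 2**attempt). `sleep` and `rng` are
    injectable so the behaviour can be tested without real waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the throttle to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
    raise AssertionError("unreachable")
```

The jitter spreads retries from many concurrent callers so they do not hit the throttled API again in lockstep.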
Summary
Capacity Planning
Coburn Watson
● Director of Performance and Reliability at Netflix
○ Site Reliability Engineering, Performance and OS Engineering, Traffic Management, Chaos Engineering, Capacity Planning, Cloud Network Engineering
● @coburnw, cwatson@netflix.com
● Looking for some great capacity planning-minded folks
● Performance and Reliability YouTube Channel
