This deck discusses optimizing costs on AWS. It describes Netflix's methodology, which includes monitoring usage at scale across applications, services, and teams. Key aspects are dynamically adjusting capacity to match workloads, making full use of reserved capacity instead of leaving reservations unused, and balancing online transaction processing (OLTP) and batch demands through performance testing and tuning of AWS resources and Auto Scaling groups (ASGs). The deck shares examples of monitoring and optimization results Netflix achieved with its Asgard framework, along with plans to open source its usage tool.
4. Rationale
• Applications operate at massive scale
• Service-oriented architecture has many moving parts (teams)
• Netflix development model includes unconstrained deployment capabilities
• “Freedom and Responsibility” culture
• Improve availability; avoid saturation of key resources
• Dynamically adjust capacity to meet workload demands
• Plan for increased workload…we’re growing
• Maximize efficiency
• Balance OLTP and batch demands
• “That which is measured improves”
5. • Asgard framework enables turnkey deployment (Netflix open-sourced)
• All engineers have full access
• Real-time reservation capacity
• Unconstrained ASG size limits
7. • Bird's-eye view of usage
• Near real-time data
• Open sourcing plans for tool
• Decomposes by application
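A minimal sketch of what "decomposes by application" could look like in practice: summing instance hours per application from raw usage records. The record layout, application names, and numbers below are hypothetical and are not taken from Netflix's tool.

```python
from collections import defaultdict

# Hypothetical usage records: (application, instance_type, instance_hours).
# Names and values are illustrative only, not data from the deck.
records = [
    ("api", "m2.4xlarge", 120.0),
    ("api", "m1.xlarge", 30.0),
    ("encoding", "m2.4xlarge", 200.0),
    ("recs", "m1.xlarge", 50.0),
]

def usage_by_application(rows):
    """Sum instance hours per application, largest consumer first."""
    totals = defaultdict(float)
    for app, _instance_type, hours in rows:
        totals[app] += hours
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

breakdown = usage_by_application(records)
for app, hours in breakdown:
    print(f"{app}: {hours:.0f} instance hours")
```

Sorting by total puts the heaviest consumers at the top, which is the natural starting point for a bird's-eye view.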
13. Unused Reservation Instance Hours *
[Chart: unused reservation instance hours per day, Mon through Sat; y-axis 0 to 2,000; callout: "Need to use this capacity"]
* - fictitious volumes
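The metric on this slide can be approximated with simple arithmetic: for each hour, any reserved instances beyond what is actually running count as unused reservation hours. The sketch below assumes a constant reservation count and a single instance type and zone, so every running instance is eligible for the reservation. The function name and sample numbers are hypothetical.

```python
def unused_reservation_hours(reserved_instances, hourly_running):
    """Return reserved instance hours left unused over the sampled hours.

    reserved_instances: number of reservations held (assumed constant).
    hourly_running: running instance count for each hour.
    """
    # Running above the reservation count leaves nothing unused, so floor at 0.
    return sum(max(0, reserved_instances - running) for running in hourly_running)

# Fictitious day: 100 reservations, 12 hours at 60 running, 12 hours at 110 running.
day = [60] * 12 + [110] * 12
unused = unused_reservation_hours(100, day)
print(unused)  # 480
```

Hours where usage exceeds the reservation do not offset idle hours, which is why off-peak troughs are the capacity that batch work can absorb.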
16. [Charts: three panels labeled Healthy, Thrashing, and Double-Jump; y-axis = number of instances in ASG]
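One way to flag a thrashing ASG from its instance-count history is to count how often the group reverses scaling direction. The slides do not define the patterns, so this is only a sketch under the assumption that thrashing means frequent up/down reversals. The threshold and sample series are hypothetical, and the Double-Jump pattern is not covered.

```python
def count_reversals(instance_counts):
    """Count how often the ASG size changes direction (up then down, or down then up)."""
    reversals = 0
    last_direction = 0  # +1 growing, -1 shrinking, 0 not yet known
    for prev, curr in zip(instance_counts, instance_counts[1:]):
        if curr == prev:
            continue  # flat samples do not change direction
        direction = 1 if curr > prev else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals

def looks_like_thrashing(instance_counts, max_reversals=2):
    """Assumed rule: more than max_reversals direction changes means thrashing."""
    return count_reversals(instance_counts) > max_reversals

# Illustrative series: one daily ramp up and down versus constant oscillation.
healthy = [10, 10, 12, 14, 16, 16, 14, 12, 10, 10]
thrashing = [10, 14, 10, 14, 10, 14, 10, 14, 10, 14]
print(looks_like_thrashing(healthy), looks_like_thrashing(thrashing))
```

A sensible threshold depends on the sampling interval and the length of the window, so it would need tuning against real scaling history.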