6. Context
● Operational intelligence, Edge Engineering
● Near real time detection of anomalies from
customers' streaming experience
● Stream processing platform to run ad hoc
queries on live event streams
7. Context
● Operational intelligence, Edge Engineering
● Near real time detection of anomalies from
customers' streaming experience
● Stream processing platform to run ad hoc
queries on live event streams
● Iterative detection of coarse grain and fine
grain signals
30. Our motivations for new framework
● Cloud native
○ Large variation in peak to trough usage
○ Servers are more ephemeral
31. Our motivations for new framework
● Cloud native
○ Large variation in peak to trough usage
○ Servers are more ephemeral
● Resource usage optimizations
53. Rules based cluster autoscaling
● Set up rules per host attribute value
○ E.g., one autoscale rule per ASG/cluster, one cluster for network-
intensive jobs, another for CPU/memory-intensive jobs
● Sample:
○ ClusterName, MinIdle, MaxIdle, CoolDownSecs
○ NwkClstr,10,20,360; CmputeClstr,5,20,300
#Idle
hosts
Trigger down scale
Trigger up scale
min
max
54. Shortfall based cluster autoscaling
● Rule-based scale up has a cool down period
○ What if there’s a surge of incoming requests?
● Pending requests trigger shortfall analysis
○ Scale up happens regardless of cool down period
○ Remembers which tasks have already been covered
55. Combining predictive autoscaling
● EC2 Auto Scaling groups have 3 numbers to control
● Fenzo sets “desirable” value
● Set max value to limit scale up
● Scryer can set min value
1
http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html