Spotinst provides advanced autoscaling and deployment tools for Kubernetes and ECS. It has over 1,000 customers and has raised $17 million. Spotinst applies heterogeneous autoscaling across different instance types, scaling both infrastructure and services to ensure SLAs while achieving 60-80% cost savings. The Spotinst autoscaler scales according to cluster needs: it detects insufficient resources and satisfies pod requirements without waiting for thresholds, scales down underutilized instances, and maintains a buffer of spare capacity (headroom).
3. Spotinst powers workloads with cloud excess capacity
Managed VMs by Spotinst, per month: 1,500 VMs (Jan 2016) → 100,000 VMs (Dec 2017)
Ensures SLA
60-80% Cost Savings
The Fastest Growing Company in Deloitte Fast 50 of 2017
4. Agenda
Anatomy of k8s
“Heterogeneous” or “Tetris” scaling
Scaling k8s - the old school way
The problems of old school scaling
k8s autoscaler concepts and implementation
5. Anatomy of k8s
[Diagram: the Kubernetes Scheduler and an Ingress Controller in front of several Kubernetes Nodes; each node runs Pods, and each Pod holds one or more containers]
6. 2-Layer Heterogeneous Autoscaling
Service auto scaling - the distributed cluster of containers
Infrastructure auto scaling - mixed instance types (c3.large, c3.2xlarge, m3.medium)
7. How do you scale (Infrastructure Layer) today?
When total Memory / CPU reservation & utilization meet a certain threshold
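The threshold-based policy above can be sketched as a simple check; the cutoffs and return convention are illustrative assumptions, not any vendor's actual defaults:

```python
# Illustrative sketch of "old school" threshold-based infrastructure
# scaling: compare aggregate reservation percentages to fixed cutoffs.
# The 80/30 thresholds are assumptions for the example.

def threshold_scale_decision(cpu_reservation_pct, mem_reservation_pct,
                             scale_up_at=80, scale_down_at=30):
    """Return +1 (add a node), -1 (remove a node), or 0 (do nothing)."""
    if cpu_reservation_pct > scale_up_at or mem_reservation_pct > scale_up_at:
        return +1
    if cpu_reservation_pct < scale_down_at and mem_reservation_pct < scale_down_at:
        return -1
    return 0
```

Note the weakness the deck goes on to address: the decision only fires after utilization crosses the threshold, so pending pods wait for the policy to react.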
11. k8s auto-scaler
No scaling policies required
Scale according to cluster needs (events)
Fast scaling - don't wait for thresholds, satisfy your tasks' needs
Scale down when instances are fragmented (with caveats!)
Keep headroom for incoming pods
Headroom is not a reservation, but rather units of work
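The bullets above can be summarized as one reconciliation tick. A minimal sketch, assuming simplified inputs; the function name and data shapes are hypothetical:

```python
# Sketch of an event-driven autoscaler tick: scale up on scheduling
# failures, defragment movable idle nodes, and top up headroom.
# Inputs are simplified stand-ins for real cluster state.

def autoscaler_actions(pending_pods, idle_nodes, movable,
                       spare_units, headroom_units):
    """Return the ordered list of actions for one reconciliation tick."""
    actions = []
    # 1. Scale up on scheduling failures, not utilization thresholds.
    if pending_pods:
        actions.append(("scale_up", list(pending_pods)))
    # 2. Scale down fragmented capacity, only if pods can move elsewhere.
    for node in idle_nodes:
        if movable.get(node, False):
            actions.append(("drain_and_terminate", node))
    # 3. Keep headroom so new pods start without waiting for nodes.
    if spare_units < headroom_units:
        actions.append(("add_headroom", headroom_units - spare_units))
    return actions
```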
12. Catching the right events
Insufficient Memory
Insufficient CPU
“No nodes are available that match all of the following predicates:: Insufficient memory, PodToleratesNodeTaints.”
“No nodes are available that match all of the following predicates:: Insufficient cpu, PodToleratesNodeTaints.”
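A minimal sketch of turning FailedScheduling messages like the ones above into a scale-up signal; the string matching is an assumption, since the exact event wording varies across Kubernetes versions:

```python
# Sketch: classify a scheduler event message into the resources the
# autoscaler must add. Real event text differs between k8s versions.

def classify_scheduling_failure(message):
    """Return the set of resources ("cpu", "memory") the pending pod lacks."""
    needed = set()
    msg = message.lower()
    if "insufficient memory" in msg:
        needed.add("memory")
    if "insufficient cpu" in msg:
        needed.add("cpu")
    return needed
```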
13. Scale up - Logic
According to the pending Pods, sum the required amount of resources - CPU and memory (e.g. 10 cores, 28 GB RAM)
Determine the instance type that can handle the most resource-consuming task
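The scale-up logic above can be sketched as follows; the instance catalog and the tie-breaking rules (CPU first, then RAM) are illustrative assumptions:

```python
# Sketch of scale-up sizing: sum pending pods' requests and pick the
# smallest instance type that still fits the most demanding pod.
# Catalog values below are illustrative.

INSTANCE_TYPES = {          # type: (vCPU cores, RAM in GB)
    "m3.medium":  (1, 3.75),
    "c3.large":   (2, 3.75),
    "c3.2xlarge": (8, 15.0),
}

def plan_scale_up(pending_pods):
    """pending_pods: list of (cpu_cores, ram_gb) requests.
    Returns (total_cpu, total_ram, chosen_instance_type)."""
    total_cpu = sum(p[0] for p in pending_pods)
    total_ram = sum(p[1] for p in pending_pods)
    biggest = max(pending_pods)  # most resource-consuming pod (CPU first)
    # smallest instance type that can still host the biggest pod
    fitting = [(cpu, ram, t) for t, (cpu, ram) in INSTANCE_TYPES.items()
               if cpu >= biggest[0] and ram >= biggest[1]]
    _, _, chosen = min(fitting)
    return total_cpu, total_ram, chosen
```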
14. Scale down - Defragmentation
Look for idle instances - CPU & memory below 40% for 3 consecutive periods of 1 min
Make sure the pods running on this instance can run on other instances:
  CPU
  RAM
  All pods' ports are available on other instances
Drain the instance
Wait for the pods to be rescheduled on other instances
Terminate the instance
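A minimal sketch of the two scale-down checks, assuming simplified data shapes; note that the rescheduling check here is naive (it ignores that several evicted pods may target the same spare node):

```python
# Sketch of the scale-down checks: (1) idle detection over consecutive
# samples, (2) verify each pod fits elsewhere on CPU, RAM, and ports.

def is_idle(cpu_samples, mem_samples, threshold=40.0, periods=3):
    """True if the last `periods` samples are all below `threshold`%."""
    recent = list(zip(cpu_samples, mem_samples))[-periods:]
    return len(recent) == periods and all(
        c < threshold and m < threshold for c, m in recent)

def can_reschedule(pods, other_nodes):
    """pods: [(cpu, ram, ports)]; other_nodes: [(free_cpu, free_ram, used_ports)].
    Naive: checks each pod independently against spare capacity."""
    for cpu, ram, ports in pods:
        if not any(cpu <= fc and ram <= fr and not (ports & used)
                   for fc, fr, used in other_nodes):
            return False
    return True
```

Only when both checks pass does the flow proceed to drain, wait for rescheduling, and terminate.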
16. Headroom
A buffer of spare capacity that makes sure that when we want to schedule more tasks, we don't have to wait for new instances.
Headroom is defined as follows:
Unit: CPU and RAM
Number of units
17. Headroom - Example
For example:
Unit: 1024 CPU units & 512 MB RAM
5 units
[Diagram: Infrastructure layer of m3.medium instances and a c3.2xlarge running the distributed cluster of containers, with capacity for 5 headroom units kept free]
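The example works out to the following spare capacity (assuming ECS-style CPU units, where 1024 units = 1 vCPU):

```python
# Worked headroom computation for the example above:
# 5 units of 1024 CPU units & 512 MB RAM each.

UNIT_CPU = 1024      # CPU units per headroom unit (1024 = 1 vCPU)
UNIT_RAM_MB = 512    # MB of RAM per headroom unit
UNITS = 5

spare_cpu = UNITS * UNIT_CPU        # 5120 CPU units = 5 vCPUs
spare_ram_mb = UNITS * UNIT_RAM_MB  # 2560 MB kept free across the cluster
```

Because headroom is counted in units of work rather than reserved on a specific node, the buffer can be spread across instances of different types.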