Resource Scheduling using Apache Mesos in Cloud Native Environments

Heterogeneous Resource Scheduling Using
Apache Mesos for Cloud Native Frameworks
Sharma Podila
Senior Software Engineer
Netflix
June 2015

Scale to Internet traffic
… embrace failure

Context
● Operational intelligence, Edge Engineering

Context
● Near real time detection of anomalies from
customers' streaming experience

Context
● Stream processing platform to run ad hoc
queries on live event streams

Context
● Stream processing platform to run ad hoc
queries on live event streams
● Iterative detection of coarse grain and fine
grain signals

Efficient platform
● do it cheap

Efficient platform
● do it cheap
● do it quick

Efficient platform
● do it cheap
● do it quick
● scale with Netflix traffic

Mantis: reactive stream processing
● Cloud native

● Cloud native
● Lightweight jobs

● Cloud native
● Dynamic jobs

● Cloud native
● Dynamic jobs
● Custom SLAs

Bi-directional data sourcingSourceappinstances
0
1
2
999
0
1
2
3
4
5
Source connector
job
Subscribe with query
Data pushed with
subscription ID
User
Job 1
User
Job 2
User
Job 3
Subscription
Discovery

Mantis system overview
MantisMantis
Apache Mesos
Apache Mesos
Mantis
Apache Mesos
ASG
ASG
ASG
Fenzo
Mesos
Framework
ZooKeeper
Instance
Apache Mesos
Slave Cluster

Mantis scheduling model
JobJobJob
Source
Stage1
Stage2
Sink
Jobs may be perpetual or
on-demand/interactive
Slave instances

Apache Mesos Architecture
Mesos slave
FrmWrk2 executor
TaskTask
Mesos slave
FrmWrk2 executor
FrmWrk1 executor
TaskTask
Mesos master Standby master Standby master
Mesos slave
FrmWrk1 executor
TaskTask
FrmWrk1 FrmWrk2
ZooKeeper
quorum

Resource granularity
Instance 1
Instance 1
Task A
Instance 2
Task B
Instance 1
Task A
Instance 1
Task A
Task B
Task C Task D
vs.

Task start latency
vs.
Instance startup in mins. Mesos task startup in <1sec

Do we need yet another Mesos
framework?

A choice of Mesos frameworks
exists in the community

Likely easy to write a new framework

What about scale, performance, fault tolerance, and
availability?

What about scale, performance, fault tolerance, and
availability?
Scheduling is a hard problem to solve

Long term justification needed to
create a new Mesos framework

Our motivations for new framework
● Cloud native
○ Large variation in peak to trough usage
○ Servers are more ephemeral

Our motivations for new framework
● Cloud native
○ Large variation in peak to trough usage
○ Servers are more ephemeral
● Resource usage optimizations

Cluster autoscaling challenge
Host 1 Host 2 Host 3 Host 4

Cluster autoscaling challenge
vs.

Stream locality challenge
Data
stream
Host A
Task1
Host B
Task2
Host C
Task3

Stream locality challenge
Data
stream
Host A
Task1
Host B
Task2
Host C
Task3
Data
stream
Host X
Task1
Task2
Task3
vs.
Reduces network bandwidth usage

Backpressure challenge
On back pressure, scale
horizontally
Processor task
Source
Stage1
Stage2
Stage3
Sink
Hot source Vs. cold source

Fenzo task scheduler
Generic task scheduler
Heterogeneous
Autoscale
Visibility
Plugins for
Constraints, Fitness
High speed

Fenzo usage in frameworks
Mesos master
Mesos framework
Tasks
requests
Available
resource
offers
Fenzo task
scheduler
Task assignment result
• Host1
• Task1
• Task2
• Host2
• Task3
• Task4
Persistence

Scheduling problem
Fitness
Pending
Assigned
UrgencyN tasks to assign from M possible slaves

Scheduling optimizations
Speed Accuracy
First fit assignment Optimal assignment
Real world trade-offs
~ O (1) ~ O (N * M)1
1
Assuming tasks are not reassigned

Scheduling strategy
For each task
Validate constraints
Eval fitness on each host
Until fitness good enough, and
A minimum #hosts evaluated

Task constraints
Soft
Hard
Some built-in
Host attribute based
Co-tasks based
Extensible

Fitness evaluation
Degree of fitness
Composable

Bin packing fitness calculator
Fitness for
Host1 Host2 Host3 Host4 Host5
fitness = usedCPUs / totalCPUs

Fitness for 0.25 0.5 0.75 1.0 0.0

Fitness for 0.25 0.5 0.75 1.0 0.0
✔

Stream locality fitness
Stream locality fitness
= percentage of tasks connecting to the
same stream

Composable fitness calculators
Fitness
= ( BinPackFitness * BinPackWeight +
RuntimePackFitness * RuntimeWeight +
StreamLocalityFitness * StreamWeight
) / 3.0

Cluster autoscaling in Fenzo
ASG/Cluster:
mantisagent
MinIdle: 8
MaxIdle: 20
CooldownSecs:
360
ASG/Cluster:
mantisagent
MinIdle: 8
MaxIdle: 20
CooldownSecs:
360
ASG/cluster:
computeCluster
MinIdle: 8
MaxIdle: 20
CooldownSecs: 360
Fenzo
ScaleUp
action:
Cluster, N
ScaleDown
action:
Cluster,
HostList

Rules based cluster autoscaling
● Set up rules per host attribute value
○ E.g., one autoscale rule per ASG/cluster, one cluster for network-
intensive jobs, another for CPU/memory-intensive jobs
● Sample:
○ ClusterName, MinIdle, MaxIdle, CoolDownSecs
○ NwkClstr,10,20,360; CmputeClstr,5,20,300
#Idle
hosts
Trigger down scale
Trigger up scale
min
max

Shortfall based cluster autoscaling
● Rule-based scale up has a cool down period
○ What if there’s a surge of incoming requests?
● Pending requests trigger shortfall analysis
○ Scale up happens regardless of cool down period
○ Remembers which tasks have already been covered

Combining predictive autoscaling
● EC2 Auto Scaling groups have 3 numbers to control
● Fenzo sets “desirable” value
● Set max value to limit scale up
● Scryer can set min value
1
http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html

Job autoscaling
Autoscale
resources of a job’
s stage based on
backpressure or
its usage of CPU,
memory, network

Job autoscaling
job’s
network
bandwidth
usage
number of
job
processes
Example from a
job that reads
from a queue to
process data
across multiple
processes

Why Apache Mesos in a Cloud?
Resource granularity
Task start latency

Heterogeneous Resource Scheduling Using
Apache Mesos for Cloud Native Frameworks
Sharma Podila
@podila

Resource Scheduling using Apache Mesos in Cloud Native Environments

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Resource Scheduling using Apache Mesos in Cloud Native Environments

Similar to Resource Scheduling using Apache Mesos in Cloud Native Environments (20)

Recently uploaded

Recently uploaded (20)

Resource Scheduling using Apache Mesos in Cloud Native Environments