GoodFit: Multi-Resource Packing
of Tasks with Dependencies
Cluster Scheduling for Jobs
Jobs
Machines, file-system, network
Cluster Scheduler
matches tasks to resources
Goals
• High cluster utilization
• Fast job completion time
• Predictable perf./ fairness
E.g., BigData (Hive, SCOPE, Spark)
E.g., CloudBuild
Tasks
Dependencies
• Need not keep resource “buffers”
• More dynamic than VM placement (tasks last seconds)
• Aggregate properties are important (e.g., all tasks in a job should finish)
Need careful multi-resource planning
Problem
Fragmentation: Current Schedulers 2 tasks/T → Packer Scheduler 3 tasks/T (+50%)
Over-allocation of net/disk: Current Schedulers 2 tasks/2T → Packer Scheduler 2 tasks/T (+100%)
Problem 2: … worse with dependencies
[Figure: an adversarial DAG; each task is labeled {duration, resource demand}. A chain of n long tasks, (Tt, (1/n)·r), ((T-2)t, (1/n)·r), ((T-4)t, (1/n)·r), …, (~Tt, (1/n)·r), is interleaved with short tasks of duration t and demand r or 1-r. Running the chain serially, as critical path scheduling does, needs ~nT·t; the best schedule overlaps the long tasks and needs ~T·t.]
Critical path scheduling is n times off since it ignores resource demands
Packers can be d times off since they ignore future work [d resources]
Typical job scheduler infrastructure
+ packing
+ bounded unfairness
+ merge schedules
+ overbook
[Figure: per-job DAG Application Masters (AMs), each with a Schedule Constructor, talk to the Resource Manager (RM); Node Managers (NMs) send node heartbeats and receive task assignments.]
Main ideas in multi-resource packing
Task packing ~ Multi-dimensional bin packing, but
* Very hard problem (“APX-hard”)
* Available heuristics do not directly apply [task demands change with placement]
A packing heuristic
 Task's resource demand vector: D
 Machine's available resource vector: R
Fit: assign a task to a machine only if D ≤ R (component-wise, so nothing is over-allocated)
Alignment score: A = D · R
A job completion time heuristic: shortest remaining work first.
P = (remaining # tasks) × (tasks' avg. duration) × (tasks' avg. resource demand)
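To make the two scores concrete, here is a minimal Python sketch; the dict-shaped demands/availability and the job fields are illustrative assumptions, not the paper's actual code.

def fits(demand, available):
    """Fit check: admit a task only if no resource would be over-allocated."""
    return all(demand[r] <= available[r] for r in demand)

def alignment_score(demand, available):
    """Packing score A = D . R (dot product of demand and free resources)."""
    return sum(demand[r] * available[r] for r in demand)

def remaining_work(job):
    """P's ingredients: remaining #tasks x avg. duration x avg. resource demand."""
    tasks = job["remaining_tasks"]
    if not tasks:
        return 0.0
    avg_duration = sum(t["duration"] for t in tasks) / len(tasks)
    avg_demand = sum(sum(t["demand"].values()) for t in tasks) / len(tasks)
    return len(tasks) * avg_duration * avg_demand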
Trade-offs: Packing Efficiency vs. Job Completion Time vs. Fairness
• Chasing packing efficiency alone delays job completion; chasing completion time alone loses packing efficiency; strict fairness loses both.
We show that: {best "perf" | bounded unfairness} ~ best "perf"
Main ideas in packing dependent tasks
1. Identify troublesome tasks (the "meat") and place them first
2. Systematically place other tasks without deadlocks
3. At runtime, use a precedence order from the computed schedule, plus heuristics to (a) overbook and (b) apply the packing/SRTF ideas from the previous slide
4. Better lower bounds for DAG completion time
[Figure: virtual resource-time space. Troublesome tasks (M, the "meat") are placed first, between meat-begin and meat-end; parents (P) go before, children (C) after, and other tasks (O) fill the remaining holes.]
Results - 1
[Figure: Packing and Packing + Deps. vs. a lower bound; 20K DAGs from Cosmos]
Results - 2
[Figure: Tez + Packing and Tez + Packing + Deps.; 200 jobs from TPC-DS on a 200-server cluster]
Bundling
Temporal relaxation of fairness
[Figure: two identical jobs, each a disk-bound Map followed by a network-bound Reduce. Instantaneous fairness gives each job 50% of each resource, and both jobs finish at 4T; letting each job use 100% of its bottleneck resource in turn finishes them at 2T and 3T.]
1) Temporal relaxation of fairness: a job will finish within (1 + f)× the time it takes under its strict fair share
2) Optimal trade-off with performance: (1 + f)× fairness costs (2 + 2f - 2√(f + f²))× on makespan
3) A simple (offline) algorithm achieves the above trade-off
Problem: instantaneous fairness can be up to d× worse on makespan (d resources)
Fairness slack f | Perf. loss vs. best
0 (perfectly fair) | 2×
1 (<2× longer) | 1.1×
2 (<3× longer) | 1.07×
Bare metal | VM Allocation (e.g., HDInsight, AzureBatch) | Data-parallel Jobs (e.g., BigData: Yarn, Cosmos, Spark; CloudBuild)
Job: Tasks + Dependencies
Scale: CloudBuild: 3,500 servers, 3,500 users, >20M targets/day. Yarn: ~100K servers (40K at Yahoo). Cosmos: >50K servers, >2EB stored, >6K devs.
• Tasks are short-lived (10s of seconds)
• Have peculiar shaped demands
• Composites are important (job needs all tasks to finish)
• OK to kill and restart tasks
• Locality
1) Job scheduling has its own specific aspects
2) Resource-aware scheduling will speed up the average job (and reduce resource cost)
3) An area of both research and practice
Resource aware scheduling improves SLOs and Return/$
Cluster Scheduling for Jobs
Jobs
Machines, file-system, network
Cluster Scheduler
matches tasks to resources
Goals
• High cluster utilization
• Fast job completion time
• Predictable perf./ fairness
• Efficient (milliseconds…)
E.g., HDInsight, AzureBatch
E.g., BigData (Hive, SCOPE, Spark)
E.g., CloudBuild
Tasks
Dependencies
Need careful multi-resource planning
Problem
Fragmentation: Current Schedulers 2 tasks/T → Packer Scheduler 3 tasks/T (+50%)
Over-allocation of net/disk: Current Schedulers 2 tasks/2T → Packer Scheduler 2 tasks/T (+100%)
Problem 2: … worse with dependencies
[Figure: an adversarial DAG; each task is labeled {duration, resource demand}. A chain of n long tasks, (Tt, (1/n)·r), ((T-2)t, (1/n)·r), ((T-4)t, (1/n)·r), …, (~Tt, (1/n)·r), is interleaved with short tasks of duration t and demand r or 1-r. Running the chain serially, as critical path scheduling does, needs ~nT·t; the best schedule overlaps the long tasks and needs ~T·t.]
Critical path scheduling is n times off since it ignores resource demands
Packers can be d times off since they ignore future work [d resources]
Typical job scheduler infrastructure
+ packing
+ bounded unfairness
+ merge schedules
+ overbook
[Figure: per-job DAG Application Masters (AMs), each with a Schedule Constructor, talk to the Resource Manager (RM); Node Managers (NMs) send node heartbeats and receive task assignments.]
Main ideas in packing dependent tasks
1. Identify troublesome tasks (T) and place them first
2. Systematically place other tasks without dead-ends
3. At runtime, enforce the computed schedule, plus heuristics to (a) overbook and (b) apply the packing/SRTF ideas from the previous slide
4. Better lower bounds for DAG completion time
[Figure: virtual resource-time space. Troublesome tasks (T) are placed first, between trouble-begin and trouble-end; parents (P) go before, children (C) after, and other tasks (O) fill the remaining holes.]
Results - 1
[Figure: Packing and Packing + Deps. vs. a lower bound; 20K DAGs from Cosmos. Annotated gains: 1.5X and 2X.]
Results - 2
[Figure: Tez + Packing and Tez + Packing + Deps.; 200 jobs from TPC-DS on a 200-server cluster]
Multi-Resource Packing for Cluster Schedulers
Performance of cluster schedulers
We observe that:
 Resources are fragmented, i.e., machines are running below capacity
 Even at 100% usage, goodput is much smaller due to over-allocation
 Even Pareto-efficient multi-resource fair schemes result in much lower performance
Tetris: up to 40% improvement in makespan¹ and job completion time, with near-perfect fairness
¹Time to finish a set of jobs
Findings from Bing and Facebook traces analysis
Diversity in multi-resource requirements:
 Tasks need varying amounts of each resource
 Demands for resources are weakly correlated
This matters because there is no single bottleneck resource:
 Multiple resources become tight
 Enough cross-rack network bandwidth to use all CPU cores
Upper bounding potential gains:
 reduce makespan¹ by up to 49%
 reduce avg. job compl. time by up to 46%
Why so bad #1
Production schedulers neither pack tasks nor consider all their relevant resource demands
#1 Resource Fragmentation
#2 Over-allocation
Resource Fragmentation (RF): Current Schedulers vs. "Packer" Scheduler
Machines A and B each have 4 GB memory; tasks T1: 2 GB, T2: 2 GB, T3: 4 GB.
Current schedulers allocate resources in terms of slots: T1 and T2 land on different machines, leaving 2 GB free on each, so T3 (4 GB) cannot start anywhere until a task finishes. Free resources are unable to be assigned to tasks; avg. task compl. time = 1.33t.
A "packer" scheduler places T1 and T2 together on machine A, so T3 runs immediately on machine B; avg. task compl. time = 1t.
RF increases with the number of resources being allocated!
Over-Allocation: Current Schedulers vs. "Packer" Scheduler
Machine A has 4 GB memory and 20 MB/s network; T1 and T2 each need 2 GB memory + 20 MB/s network, T3 needs 2 GB memory only.
Not all of a task's resource demands are explicitly allocated: disk and network are over-allocated. Current schedulers admit T1 and T2 together (memory fits), the two tasks split the 20 MB/s network and each runs twice as long, and T3 must wait for memory; avg. task compl. time = 2.33t.
A packer runs T1 alongside T3 (their combined demands fit), then T2 with the full network; avg. task compl. time = 1.33t.
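A back-of-envelope check of the completion times quoted above, with the start/finish times read off the schedules just described (each task is 1t of work; this is not a simulator):

# Current scheduler: T1, T2 share the network (2t each); T3 waits for memory.
current = [2, 2, 3]
# Packer: T1 and T3 finish at t; T2 runs t..2t with the full network.
packer = [1, 2, 1]
print(sum(current) / 3)  # 2.33... t
print(sum(packer) / 3)   # 1.33... t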
Why so bad #2
Work conserving != no fragmentation, no over-allocation
Multi-resource fairness schemes do not help either:
 They treat the cluster as one big bag of resources, which hides the impact of resource fragmentation
 They assume a job has a fixed resource profile, but different tasks in the same job have different demands
 The schedule itself impacts jobs' current resource profiles; one can schedule to create complementary profiles
Pareto¹-efficient != performant. Packer scheduler vs. DRF:  avg. job compl. time improves by 50%;  makespan improves by 33%.
¹No job can increase its share without decreasing the share of another
Competing objectives: job completion time vs. fairness vs. cluster efficiency
Current Schedulers: 1. resource fragmentation; 2. over-allocation; 3. fair allocations sacrifice performance
# 1
Pack tasks along multiple resources to improve cluster efficiency and reduce makespan
Theory vs. Practice
Multi-Resource Packing of Tasks is similar to Multi-Dimensional Bin Packing: balls could be tasks, and a bin could be a machine over time. The problem is APX-hard¹.
Existing heuristics do not directly apply here:
 They assume balls of a fixed size, but task demands vary with time and machine placement, and are elastic
 They assume balls are known a priori, but the scheduler must cope with the online arrival of jobs, dependencies, and cluster activity
Avoiding fragmentation looks like tight bin packing: reducing the number of bins used reduces makespan.
¹APX-hard is a strict subset of NP-hard
# 1
A packing heuristic
1. Check for fit, to ensure no over-allocation: the task's resource demand vector must not exceed the machine's available resource vector
2. Alignment score (A) = demand vector · available-resource vector
"A" works because:
 Bigger balls get bigger scores
 Abundant resources are used first, reducing resource fragmentation
 It can spread load across machines
# 2
Faster average job completion time
CHALLENGE # 2: Job Completion Time Heuristic
Shortest Remaining Time First¹ (SRTF) schedules jobs in ascending order of their remaining time.
¹SRTF – M. Harchol-Balter et al., Connection Scheduling in Web Servers [USITS'99]
Q: What is the shortest "remaining time"?
"remaining work" = remaining # tasks & tasks' durations & tasks' resource demands
A job completion time heuristic:
 Gives a score P to every job
 Extends SRTF to incorporate multiple resources
CHALLENGE
# 2
Job Completion
Time Heuristic
Combine A and P scores !
Packing
Efficiency
Completion
Time
?
1: among J runnable jobs
2: score (j) = A(t, R)+ P(j)
3: max task t in j, demand(t) ≤ R (resources free)
4: pick j*, t* = argmax score(j)
A: delays job completion time
P: loss in packing efficiency
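A sketch of the combined rule, reusing the fits/alignment_score/remaining_work helpers sketched earlier. Mapping P to "less remaining work scores higher" via a reciprocal, and the epsilon weight, are assumptions; the deck only says the two scores are combined.

def pick_next(jobs, available, epsilon=1.0):
    best = None                                        # (score, job, task)
    for job in jobs:                                   # 1: among J runnable jobs
        runnable = [t for t in job["remaining_tasks"]
                    if fits(t["demand"], available)]   # 3: demand(t) <= R
        if not runnable:
            continue
        task = max(runnable,
                   key=lambda t: alignment_score(t["demand"], available))
        score = (alignment_score(task["demand"], available)
                 + epsilon / (remaining_work(job) + 1e-9))  # 2: A + P
        if best is None or score > best[0]:
            best = (score, job, task)
    return best                                        # 4: argmax -> (j*, t*)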
# 3
Achieve performance and fairness
# 3 Fairness Heuristic
 A says: "task i should go here to improve packing efficiency"
 P says: "schedule job j next to improve job completion time"
 Fairness says: "this set of jobs should be scheduled next"
There is typically a feasible solution that satisfies all of them. Performance and fairness do not mix well in general, but we can get "perfect fairness" and much better performance.
# 3 Fairness Heuristic
Fairness Knob, F ∈ [0, 1):
 F = 0: most efficient scheduling
 F → 1: close to perfect fairness
Pick the best-for-performance task from among the (1 - F) fraction of jobs furthest from their fair share.
Fairness is not a tight constraint:
 Long-term fairness, not short-term fairness
 Lose a bit of fairness for a lot of gain in performance
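A sketch of the knob, assuming a fair_deficit(job) helper that returns how far the job is below its fair share (that helper is hypothetical):

import math

def eligible_jobs(jobs, F, fair_deficit):
    """Keep the (1 - F) fraction of jobs furthest from their fair share."""
    k = max(1, math.ceil((1 - F) * len(jobs)))
    return sorted(jobs, key=fair_deficit, reverse=True)[:k]

# F = 0  -> every job stays eligible: pure packing, most efficient.
# F -> 1 -> only the most-starved job is eligible: near-perfect fairness.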
Putting it all together
We saw:  packing efficiency;  prefer small remaining work;  fairness knob
Other things in the paper:  estimate task demands;  deal with inaccuracies, barriers;  ingestion / evacuation
Yarn architecture, with the changes to add Tetris shown in orange:
[Figure: Job Managers send multi-resource asks and barrier hints; Node Managers track resource usage, enforce allocations, and send resource availability reports; the cluster-wide Resource Manager carries the new logic to match tasks to machines (+packing, +SRTF, +fairness) and returns allocations and offers.]
Evaluation
 Pluggable scheduler in Yarn 2.4
 250-machine cluster deployment
 Replay Bing and Facebook traces
Efficiency
| Tetris vs. | Makespan | Avg. Job Compl. Time |
| Capacity Scheduler | 29% | 30% |
| DRF | 28% | 35% |
Gains come from avoiding fragmentation and avoiding over-allocation.
[Figure: cluster utilization (%) of CPU, memory, network-in, and storage over time, under Tetris and under the Capacity Scheduler; periods of over-allocation are marked, and lower utilization values indicate higher resource fragmentation.]
Fairness
The Fairness Knob quantifies the extent to which Tetris adheres to fair allocation.
| Gains | No fairness (F = 0) | F = 0.25 | Full fairness (F → 1) |
| Makespan | 50% | 25% | 10% |
| Job compl. time | 40% | 35% | 23% |
| Avg. slowdown [over impacted jobs] | 25% | 5% | 2% |
Pack efficiently along multiple resources. Prefer jobs with less "remaining work". Incorporate fairness.
 Combine heuristics that improve packing efficiency with those that lower average job completion time
 Achieving desired amounts of fairness can coexist with improving cluster performance
 Implemented inside YARN; trace-driven simulations and deployment show encouraging initial results
We are working towards a Yarn check-in.
http://research.microsoft.com/en-us/UM/redmond/projects/tetris/
Backup slides
Estimating Resource Demands
Peak-usage demand estimates come from:
 finished tasks in the same phase
 statistics collected from recurring jobs
 input size/location of tasks
Resource Tracker:
 reports unused resources (tasks often under-utilize their allocations)
 is aware of other cluster activities, such as ingestion and evacuation
[Figure: in-network bandwidth on Machine1 over time (MBytes/sec, 0–1024), split into used and free.]
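A sketch of the first estimator, assuming the scheduler keeps per-resource peak-usage samples from finished tasks of the same phase; the 95th-percentile choice is an assumption:

def estimate_phase_demand(finished_peaks, pct=0.95):
    """finished_peaks: list of {resource: peak usage} dicts, one per task."""
    if not finished_peaks:
        return None   # fall back to recurring-job statistics / input sizes
    est = {}
    for r in finished_peaks[0]:
        samples = sorted(peaks[r] for peaks in finished_peaks)
        est[r] = samples[min(len(samples) - 1, int(pct * len(samples)))]
    return est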
Placement
Impacts network/disk requirements
Packer Scheduler vs. DRF
Cluster: 18 cores, 36 GB memory. Jobs (task profile, # tasks): A [1 core, 2 GB] × 18; B [3 cores, 1 GB] × 6; C [3 cores, 1 GB] × 6; each task runs for t.
DRF schedule: every time step runs 6 tasks of A, 2 of B, and 2 of C (18 cores, 16 GB used), so all three jobs finish at 3t. Durations: A: 3t, B: 3t, C: 3t.
Packer schedule: run all 18 tasks of A first (18 cores, 36 GB), then 6 tasks of B, then 6 tasks of C (18 cores, 6 GB per step). Durations: A: t, B: 2t, C: 3t.
Avg. job completion time drops from 3t to 2t: a 33% improvement.
Dominant Resource Fairness (DRF) computes the dominant share (DS) of every user and seeks to maximize the minimum DS across all users.
Cluster: [18 cores, 36 GB memory]. Jobs (task profile, # tasks): A [1 core, 2 GB] × 18; B [3 cores, 1 GB] × 6; C [3 cores, 1 GB] × 6.
Maximize allocations (qA, qB, qC) subject to:
  qA + 3·qB + 3·qC ≤ 18 (CPU constraint)
  2·qA + 1·qB + 1·qC ≤ 36 (memory constraint)
  qA/18 = qB/6 = qC/6 (equalize DS)
Solution: DS = 1/3, i.e., qA = 6, qB = 2, qC = 2 tasks per time step.
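The allocation can be checked mechanically. Substituting the equal-dominant-share condition qA = 18s, qB = 6s, qC = 6s into the binding CPU constraint gives 18s + 18s + 18s = 54s = 18:

s = 18 / 54                                      # dominant share = 1/3
qA, qB, qC = 18 * s, 6 * s, 6 * s                # 6, 2, 2 tasks per step
assert qA * 1 + qB * 3 + qC * 3 <= 18 + 1e-9     # CPU constraint (tight)
assert qA * 2 + qB * 1 + qC * 1 <= 36            # memory (not binding)
print(s, qA, qB, qC)                             # 0.333... 6.0 2.0 2.0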
Pack vs. No Pack
Machines 1, 2: [2 cores, 4 GB] each. Jobs (task profile, # tasks): A [2 cores, 3 GB] × 6; B [1 core, 2 GB] × 2; each task runs for t.
Pack: run two A tasks per step for three steps (4 cores, 6 GB used), then B's two tasks (2 cores, 4 GB). Durations: A: 3t, B: 4t; avg. job compl. time = 3.5t.
No pack: run B's two tasks first (2 cores, 4 GB), then two A tasks per step for three steps. Durations: A: 4t, B: t; avg. job compl. time = 2.5t, a 29% improvement.
Packing efficiency does not achieve everything: achieving packing efficiency does not necessarily improve job completion time.
Ingestion / evacuation
Ingestion = storing incoming data for later analytics; e.g., some clusters report volumes of up to 10 TB per hour.
Evacuation = data evacuated and re-replicated before maintenance operations; e.g., rack decommission for machine re-imaging.
These and other cluster activities produce background traffic. Resource Tracker reports let Tetris avoid contention between its tasks and these activities.
Workload analysis
Alternative Packing Heuristics
Fairness vs. Efficiency
Virtual Machine Packing != Tetris
VM packing consolidates VMs, with multi-dimensional resource requirements, onto the fewest number of servers. But it focuses on different challenges, not task packing:
 balance load across servers
 ensure VM availability in spite of failures
 allow for quick software and hardware updates
 there is NO corresponding entity to a job, so job completion time is inexpressible
 explicit resource requirements (e.g., "small VM") make VM packing simpler
Barrier knob, b ∈ [0, 1)
Tetris gives preference to the last tasks in a stage: resources are offered to tasks in a stage preceding a barrier once a fraction b of that stage's tasks have finished.
 b = 1: no tasks are preferentially treated
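A sketch of the knob, assuming each stage tracks its finished fraction and whether it precedes a barrier (the field names are illustrative):

def preferred_tasks(stages, b):
    """Prefer tasks of stages preceding a barrier once a fraction >= b of the
    stage has finished (finish stragglers to release the barrier sooner)."""
    pref = []
    for stage in stages:
        if not stage["precedes_barrier"]:
            continue
        if stage["finished"] / stage["total"] >= b:
            pref.extend(stage["runnable_tasks"])
    return pref   # as b -> 1, no stage qualifies: no preferential treatment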
Starvation Prevention
Could it take a long time to accommodate large tasks? No, because:
1. most tasks have demands within one order of magnitude of one another
2. machines report resource availability to the scheduler periodically, so the scheduler learns about all the resources freed by tasks finishing in the preceding period together => it can make reservations for large tasks
Cluster load vs. Tetris performance
Packing and Dependency-aware Scheduling for Data-Parallel Clusters
Performance of cluster schedulers
We observe that:
 Cluster schedulers typically do dependency-aware scheduling OR multi-resource packing, not both
 None of the existing solutions are close to optimal for more than 50% of the production jobs
Graphene: >30% improvements in makespan¹ and job completion time for more than 50% of the jobs
¹Time to finish a set of jobs
Findings from Bing traces analysis
Job structures have evolved into complex DAGs of tasks: the median job's DAG has depth 7 and 10³ tasks.
A good cluster scheduler should be aware of dependencies.
Findings from Bing traces analysis
Applications have (very) diverse resource needs across CPU, memory, network, and disk:
 High coefficient of variation (~1) for many resources
 Demands for resources are weakly correlated
This matters because there is no single bottleneck resource:
 Multiple resources become tight
 Enough cross-rack network bandwidth to use all CPU cores
A good cluster scheduler should pack resources.
Why so bad
Production schedulers either don't pack tasks or don't consider dependencies.
Dependency-aware schedulers, such as Critical Path Scheduling (CPSched) and Breadth First Search (BFS), consider the DAG structure during scheduling but do not account for tasks' resource demands (or assume tasks have homogeneous demands). Any scheduler that is not packing is up to n × OPTIMAL (n = number of tasks).
Packers, such as Tetris, handle tasks with multiple resource requirements but ignore dependencies and take local greedy choices. Any scheduler that ignores dependencies is up to d × OPTIMAL (d = number of resource dimensions).
Where does the "work" lie in a DAG?
"Work" = the stages in a DAG where the most resources × time is spent.
For large DAGs that are neither a bunch of unrelated stages nor a chain of stages:
 >40% of the DAGs have most of the "work" on the critical path, where CPSched performs well
 >30% of the DAGs have most of the "work" spread such that packers perform well
 For ~50% of the DAGs, neither packers nor criticality-based schedulers may perform well
Pack tasks along multiple resources while considering task dependencies
 State-of-the-art techniques are suboptimal
 Key ideas in Graphene
 Conclusion
State-of-the-art scheduling techniques are suboptimal: CPSched / Tetris can be 3 × Optimal.
Example: six tasks, each labeled task: duration {rsrc. 1, rsrc. 2}, with total capacity 1 in every dimension:
t0: 1 {.7, .31}; t1: .01 {.95, .01}; t2: .01 {.1, .7}; t3: .96 {.2, .68}; t4: .98 {.1, .01}; t5: .01 {.01, .01}
[Figure: CPSched's schedule and Tetris's schedule each take ~3T; the optimal schedule takes ~T.]
Key insights:
 t0, t2, t5 are troublesome tasks; schedule them as soon as possible
# 1: Schedule construction — identify troublesome tasks and place them accordingly on a virtual resource-time space.
[Figure: troublesome tasks (T) are placed on the virtual resource-time space first; parents (P), children (C), and other tasks (O) are overlaid to fill the holes.]
Schedule Construction
 Identify tasks that can lead to a poor schedule (troublesome tasks), T: those more likely to be on the critical path, and those more difficult to pack
 Break the other tasks into P (parents), C (children), and O (other) sets based on their relationship with the tasks in T
 Place the tasks of T on a virtual time space; overlay the others to fill any resultant holes in this space
This is nearly optimal for over three quarters of our analyzed production DAGs.
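A simplified sketch of the T/P/C/O partition and placement order; dag.tasks, dag.ancestors, dag.descendants, and the troublesome predicate are illustrative, and the real construction places tasks into a virtual resource-time space rather than a flat list:

def partition(dag, troublesome):
    T = {t for t in dag.tasks if troublesome(t)}             # place these first
    P = {p for t in T for p in dag.ancestors(t)} - T         # parents of T
    C = {c for t in T for c in dag.descendants(t)} - T - P   # children of T
    O = set(dag.tasks) - T - P - C                           # unrelated tasks
    return T, P, C, O

def placement_order(dag, troublesome):
    T, P, C, O = partition(dag, troublesome)
    # T is laid out on the virtual time space first; P, C, O are then
    # overlaid to fill holes (P before T starts, C after T ends).
    return list(T) + list(P) + list(C) + list(O)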
# 2: Online component — enforces the desired schedules of the various DAGs.
[Figure: per-DAG Schedule Constructors produce preference orders; the runtime component in the Resource Manager merges the schedules and assigns tasks on node heartbeats.]
 Job completion time: prefer jobs with less remaining work
 Online scheduling: enforces the priority ordering, local placement, multi-resource packing
 Makespan: packing + judicious overbooking of malleable resources
 Being fair: deficit counters bound unfairness and enable implementation of different fairness schemes
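One plausible reading of "deficit counters to bound unfairness" (this mechanism detail is an assumption, not spelled out in the deck): each job accrues credit at its fair rate and spends it when scheduled, and a job whose deficit exceeds the bound must be scheduled next, capping drift from the fair allocation.

class DeficitCounters:
    def __init__(self, jobs, bound):
        self.deficit = {j: 0.0 for j in jobs}
        self.bound = bound

    def tick(self, fair_share, allocated):
        """fair_share/allocated: per-job resource amounts for this round."""
        for j in self.deficit:
            self.deficit[j] += fair_share.get(j, 0.0)   # credit accrues
            self.deficit[j] -= allocated.get(j, 0.0)    # spent when scheduled

    def must_schedule(self):
        return [j for j, d in self.deficit.items() if d > self.bound]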
Evaluation
 Implemented in Yarn and Tez
 250-machine cluster deployment
 Replay Bing traces and TPC-DS / TPC-H workloads
| Graphene vs. | Makespan | Avg. Job Compl. Time |
| Tetris | 29% | 27% |
| Critical Path | 31% | 33% |
| BFS | 23% | 24% |
Gains come from a view of the entire DAG and from placing the troublesome tasks first. Efficiency: a more compact schedule, better packing, and overbooking.
 Combine various mechanisms to improve packing efficiency and consider task dependencies
 Construct a good schedule by placing tasks on a virtual resource-time space
 Online heuristics softly enforce the desired schedules
 Implemented inside YARN and Tez; trace-driven simulations and deployment show encouraging initial results
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling
Multi-Resource Packing Optimizes Cluster Scheduling

Weitere ähnliche Inhalte

Was ist angesagt?

Data profiling in Apache Calcite
Data profiling in Apache CalciteData profiling in Apache Calcite
Data profiling in Apache CalciteDataWorks Summit
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...DataWorks Summit
 
Databricks clusters in autopilot mode
Databricks clusters in autopilot modeDatabricks clusters in autopilot mode
Databricks clusters in autopilot modePrakash Chockalingam
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
Wayfair Use Case: The four R's of Metrics Delivery
Wayfair Use Case: The four R's of Metrics DeliveryWayfair Use Case: The four R's of Metrics Delivery
Wayfair Use Case: The four R's of Metrics DeliveryInfluxData
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter StormUwe Printz
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleDataWorks Summit/Hadoop Summit
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataDataWorks Summit
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...Spark Summit
 

Was ist angesagt? (20)

Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
Data profiling in Apache Calcite
Data profiling in Apache CalciteData profiling in Apache Calcite
Data profiling in Apache Calcite
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Databricks clusters in autopilot mode
Databricks clusters in autopilot modeDatabricks clusters in autopilot mode
Databricks clusters in autopilot mode
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Spark Streaming into context
Spark Streaming into contextSpark Streaming into context
Spark Streaming into context
 
Wayfair Use Case: The four R's of Metrics Delivery
Wayfair Use Case: The four R's of Metrics DeliveryWayfair Use Case: The four R's of Metrics Delivery
Wayfair Use Case: The four R's of Metrics Delivery
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter Storm
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as example
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 

Ähnlich wie Multi-Resource Packing Optimizes Cluster Scheduling

Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종NAVER D2
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.pptArumugam90
 
Earliest Due Date Algorithm for Task scheduling for cloud computing
Earliest Due Date  Algorithm for Task scheduling for cloud computingEarliest Due Date  Algorithm for Task scheduling for cloud computing
Earliest Due Date Algorithm for Task scheduling for cloud computingPrakash Poudel
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Optimizing Performance - Clojure Remote - Nikola Peric
Optimizing Performance - Clojure Remote - Nikola PericOptimizing Performance - Clojure Remote - Nikola Peric
Optimizing Performance - Clojure Remote - Nikola PericNik Peric
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Task allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsTask allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsDon William
 
Processing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingProcessing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingCollin Bennett
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptxShimoFcis
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Inc.
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduceNewvewm
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveJoydeep Sen Sarma
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aSchubert Zhang
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsSrinath Perera
 

Ähnlich wie Multi-Resource Packing Optimizes Cluster Scheduling (20)

Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
 
Distributed systems scheduling
Distributed systems schedulingDistributed systems scheduling
Distributed systems scheduling
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.ppt
 
Earliest Due Date Algorithm for Task scheduling for cloud computing
Earliest Due Date  Algorithm for Task scheduling for cloud computingEarliest Due Date  Algorithm for Task scheduling for cloud computing
Earliest Due Date Algorithm for Task scheduling for cloud computing
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
 
BIG DATA Session 7 8
BIG DATA Session 7 8BIG DATA Session 7 8
BIG DATA Session 7 8
 
Optimizing Performance - Clojure Remote - Nikola Peric
Optimizing Performance - Clojure Remote - Nikola PericOptimizing Performance - Clojure Remote - Nikola Peric
Optimizing Performance - Clojure Remote - Nikola Peric
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Task allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsTask allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessors
 
Processing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive ComputingProcessing Big Data: An Introduction to Data Intensive Computing
Processing Big Data: An Introduction to Data Intensive Computing
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
mapreduce.pptx
mapreduce.pptxmapreduce.pptx
mapreduce.pptx
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduce
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspective
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221a
 
In-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common PatternsIn-Memory Computing: How, Why? and common Patterns
In-Memory Computing: How, Why? and common Patterns
 

Mehr von DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Mehr von DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Kürzlich hochgeladen

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Kürzlich hochgeladen (20)

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Multi-Resource Packing Optimizes Cluster Scheduling

  • 1. GoodFit: Multi-Resource Packing of Tasks with Dependencies
  • 2. Cluster Scheduling for Jobs Jobs Machines, file-system, network Cluster Scheduler matches tasks to resources Goals • High cluster utilization • Fast job completion time • Predictable perf./ fairness E.g., BigData (Hive, SCOPE, Spark) E.g., CloudBuild Tasks Dependencies • Need not keep resource “buffers” • More dynamic than VM placement (tasks last seconds) • Aggregate properties are important (eg, all tasks in a job should finish)
  • 3. Need careful multi-resource planning Problem Fragmentation Current Schedulers Packer Scheduler Over-allocation of net/disk Current Schedulers Packer Scheduler 2 tasks/T  3 tasks/T (+50%) 2 tasks/ 2T  2 tasks/T (+100%)
  • 4. … worse with dependencies Problem 2 Tt, 𝟏 𝒏 r t, 1- r t, r t, 1- r t, 1- r (T- 2)t, 𝟏 𝒏 r (T- 4)t, 𝟏 𝒏 r ~Tt, 𝟏 𝒏 r … … DAG label= {duration, resource demand} resource time ~nT t … resource time ~T t … … Crit. Path Best Critical path scheduling is n times off since it ignores resource demands Packers can be d times off since they ignore future work [d resources]
  • 5. Typical job scheduler infrastructure + packing + bounded unfairness + merge schedules + overbook DAG AM DAG AM … Node heartbeat Task assignment Schedule Constructor Schedule Constructor RM NM NM NM NM
  • 6. Main ideas in multi-resource packing Task packing ~ Multi-dimensional bin packing, but * Very hard problem (“APX-hard”) * Available heuristics do not directly apply [task demands change with placement] Alignment score (A) = D  R A packing heuristic  Task’s resources demand vector: D  Machine resource vector: R< Fit A job completion time heuristic shortest remaining work, P tasks avg. duration tasks avg. resource demand * * = remaining # tasks Packing Efficiency ? delays job completion loses packing efficiencyJob Completion Time Fairness Trade-offs: We show that: {best “perf” |bounded unfairness} ~ best “perf” loses both
  • 7. Main ideas in packing dependent tasks 1. Identify troublesome tasks (meat) and place them first 2. Systematically place other tasks without deadlocks 3. At runtime, use a precedence order from the computed schedule + heuristics to (a) overbook, (b) previous slide. 4. Better lower bounds for DAG completion time M P C O time resource meat begin meat end parents meat children
  • 8. Results - 1 Packing Packing + Deps. Lower bound [20K DAGs from Cosmos]
  • 9. Results - 2 Tez + Packing Tez + Pack +Deps [200 jobs from TPC-DS, 200 server cluster]
  • 12. Map (disk) Reduce (netw.) Fair share among two identical jobs 50% 50% 50% 50% 2T 4T Instantaneous fairness 100 % 100 % 100 % 100 % 2T 3TT 1) Temporal relaxation of fairness a job will finish within (1 + 𝑓)x the time it takes given strict share 2) Optimal trade-off with performance (1 + 𝑓)x fairness costs (2 + 2𝑓 − 2 𝑓 + 𝑓2)x on make-span 3) A simple (offline) algorithm that achieves the above trade-off Problem: Instantaneous fairness can be up to dx worse on makespan (d resources) Best Fairness slack 𝒇 Perf loss 0 (perfectly fair) 2x 1 (<2x longer) 1.1x 2 (<3x longer) 1.07x
  • 13. Bare metal VM Allocation Data-parallel Jobs Job: Tasks Dependencies E.g., HDInsight, AzureBatch E.g., BigData (Yarn, Cosmos, Spark) E.g., CloudBuild 3500 servers 3500 users >20M targets/day ~100K servers (40K at Yahoo) >50K servers >2EB stored >6K devs
  • 14. • Tasks are short-lived (10s of seconds) • Have peculiar shaped demands • Composites are important (job needs all tasks to finish) • OK to kill and restart tasks • Locality 1) Job scheduling has specific aspects 2) will speed-up the average job (and reduce resource cost) 3) research + practice
  • 15. Resource aware scheduling improves SLOs and Return/$
  • 16. Cluster Scheduling for Jobs Jobs Machines, file-system, network Cluster Scheduler matches tasks to resources Goals • High cluster utilization • Fast job completion time • Predictable perf./ fairness • Efficient (milliseconds…) E.g., HDInsight, AzureBatch E.g., BigData (Hive, SCOPE, Spark) E.g., CloudBuild Tasks Dependencies
  • 17. Need careful multi-resource planning Problem Fragmentation Current Schedulers Packer Scheduler Over-allocation of net/disk Current Schedulers Packer Scheduler 2 tasks/T  3 tasks/T (+50%) 2 tasks/ 2T  2 tasks/T (+100%)
  • 18. … worse with dependencies Problem 2 Tt, 𝟏 𝒏 r t, 1- r t, r t, 1- r t, 1- r (T- 2)t, 𝟏 𝒏 r (T- 4)t, 𝟏 𝒏 r ~Tt, 𝟏 𝒏 r … … DAG label= {duration, resource demand} resource time ~nT t … resource time ~T t … … Crit. Path Best Critical path scheduling is n times off since it ignores resource demands Packers can be d times off since they ignore future work [d resources]
  • 19. Typical job scheduler infrastructure + packing + bounded unfairness + merge schedules + overbook DAG AM DAG AM … Node heartbeat Task assignment Schedule Constructor Schedule Constructor RM NM NM NM NM
• 20. Main ideas in packing dependent tasks: 1. Identify troublesome tasks (T) and place them first. 2. Systematically place the other tasks without dead-ends. 3. At runtime, enforce the computed schedule plus heuristics to (a) overbook and (b) apply the scores from the previous slide. 4. Better lower bounds for DAG completion time. [Figure: a virtual resource-time space; trouble (T) sits between "trouble begin" and "trouble end", parents (P) go before it, children (C) after it, and others (O) fill the holes.]
• 21. Results - 1. [CDF figure over 20K DAGs from Cosmos: Packing vs. Packing + Deps. vs. the lower bound; gaps of roughly 1.5× to 2×.]
• 22. Results - 2. [Figure: Tez + Packing vs. Tez + Pack + Deps.; 200 jobs from TPC-DS on a 200-server cluster.]
  • 23. Multi-Resource Packing for Cluster Schedulers
• 24. Performance of cluster schedulers. We observe that: resources are fragmented, i.e., machines run below capacity; even at 100% usage, goodput is much smaller due to over-allocation; and even Pareto-efficient multi-resource fair schemes deliver much lower performance. Tetris: up to 40% improvement in makespan¹ and job completion time, with near-perfect fairness. ¹Time to finish a set of jobs.
• 25. Findings from Bing and Facebook trace analysis. Diversity in multi-resource requirements: tasks need varying amounts of each resource, and demands for resources are weakly correlated. This matters because multiple resources become tight; there is no single bottleneck resource (e.g., there is enough cross-rack network bandwidth to use all CPU cores). Upper-bounding the potential gains: reduce makespan¹ by up to 49%; reduce avg. job completion time by up to 46%.
• 26. Why so bad, #1: production schedulers neither pack tasks nor consider all their relevant resource demands. Consequences: #1 resource fragmentation; #2 over-allocation.
• 27. Resource fragmentation (RF). [Figure: machines A and B with 4 GB memory each; tasks T1: 2 GB, T2: 2 GB, T3: 4 GB. Current schedulers allocate in terms of slots, leaving free resources unable to be assigned to tasks: T3 must wait, and avg. task completion time = 1.33t. A "packer" scheduler co-locates T1 and T2 on one machine so T3 starts immediately: avg. task completion time = 1t.] RF increases with the number of resources being allocated!
• 28. Over-allocation. [Figure: machine A with 4 GB memory and 20 MB/s network; tasks T1 and T2 each need 2 GB memory + 20 MB/s network, T3 needs 2 GB memory only.] Not all task resource demands are explicitly allocated: disk and network are over-allocated. Current schedulers start T1 and T2 together, over-subscribing the network: avg. task completion time = 2.33t. A "packer" scheduler runs T1 with T3, then T2: avg. task completion time = 1.33t.
• 29. Why so bad, #2: work-conserving != no fragmentation or over-allocation; treating the cluster as one big bag of resources hides the impact of resource fragmentation, and assuming a job has a fixed resource profile ignores that different tasks in the same job have different demands. Multi-resource fairness schemes do not help either: the schedule itself shapes jobs' current resource profiles, so one can schedule to create complementary profiles. Pareto¹-efficient != performant: a packer scheduler vs. DRF improves avg. job completion time by 50% and makespan by 33%. ¹No job can increase its share without decreasing the share of another.
• 30. Competing objectives: job completion time vs. fairness vs. cluster efficiency. Current schedulers suffer from: 1. resource fragmentation; 2. over-allocation; 3. fair allocations that sacrifice performance.
• 31. #1: Pack tasks along multiple resources to improve cluster efficiency and reduce makespan.
• 32. Theory: multi-resource packing of tasks is similar to multi-dimensional bin packing (balls = tasks; bins = machine × time), which is APX-hard¹. Practice: existing heuristics do not directly apply, since they assume balls of a fixed size that are known a priori; here task demands vary with time / machine placed, are elastic, and the scheduler must cope with online arrival of jobs, dependencies, and cluster activity. Avoiding fragmentation looks like tight bin packing: reducing the # of bins used reduces makespan. ¹APX-hard is a strict subset of NP-hard.
• 33. #1 Packing heuristic. 1. Check for fit (task's resource demand vector ≤ machine's free resource vector) to ensure no over-allocation. 2. Alignment score A = task demand vector · machine resource vector. "A" works because: bigger balls get bigger scores; abundant resources get used first, countering resource fragmentation; and it can spread load across machines. (A minimal sketch follows.)
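A minimal sketch of this fit-then-align rule; the vectors, machine names, and the helper names `fits` / `alignment` are illustrative, not Tetris's actual API:

```python
import numpy as np

def fits(demand, free):
    """Fit check: a task may only be placed where no resource is over-allocated."""
    return bool(np.all(demand <= free))

def alignment(demand, free):
    """Alignment score A = demand . free; machines whose abundant resources
    match the task's large demands score higher."""
    return float(np.dot(demand, free))

# Toy example: [cpu_cores, mem_gb, net_mbps] demand and free capacities.
task = np.array([2.0, 4.0, 20.0])
machines = {"m1": np.array([4.0, 8.0, 25.0]),
            "m2": np.array([4.0, 8.0, 40.0])}

scores = {name: alignment(task, free)
          for name, free in machines.items() if fits(task, free)}
print(max(scores, key=scores.get))  # -> "m2", where network is abundant
```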
• 34. #2: Faster average job completion time.
• 35. Challenge #2: a job completion time heuristic. Shortest Remaining Time First¹ (SRTF) schedules jobs in ascending order of their remaining time. Q: what is the shortest "remaining time"? Here, "remaining work": remaining # tasks & task durations & task resource demands. The heuristic gives a score P to every job, extending SRTF to incorporate multiple resources (sketch below). ¹SRTF: M. Harchol-Balter et al., Connection Scheduling in Web Servers [USITS'99].
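A hedged sketch of such a remaining-work score, assuming, as the slide suggests, that it multiplies the remaining task count by average duration and average demand (the `Job` fields are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Job:
    remaining_tasks: int
    avg_duration: float   # seconds per remaining task
    avg_demand: float     # avg fraction of a machine across resources

def remaining_work(job: Job) -> float:
    """P score: resource-time still needed; smaller means schedule sooner."""
    return job.remaining_tasks * job.avg_duration * job.avg_demand

jobs = [Job(100, 30.0, 0.2), Job(10, 60.0, 0.5)]
print(min(jobs, key=remaining_work))  # the 10-task job has less remaining work
```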
• 36. Challenge #2: combine the A and P scores! Packing efficiency vs. completion time: A alone delays job completion time; P alone loses packing efficiency. 1: among J runnable jobs; 2: score(j) = A(t, R) + P(j); 3: over the max task t in j with demand(t) ≤ R (resources free); 4: pick j*, t* = argmax score(j). (A runnable version of this loop follows.)
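A runnable rendering of that pseudocode, reusing the `fits`, `alignment`, and `remaining_work` sketches above. How P is weighed against A is an assumption here (an inverse-remaining-work bonus, so less remaining work scores higher); the real scheduler normalizes the two scores more carefully:

```python
import numpy as np

def pick(jobs, free, weight=1.0):
    """Score each runnable job's feasible tasks by alignment plus a bonus
    for little remaining work; return the argmax (job, task) pair."""
    best, best_score = None, -np.inf
    for job in jobs:
        for task in job.pending_tasks:
            if not fits(task.demand, free):            # line 3: fit only
                continue
            score = (alignment(task.demand, free)      # line 2: A(t, R) ...
                     + weight / remaining_work(job))   # ... + P(j) as a bonus
            if score > best_score:
                best, best_score = (job, task), score
    return best                                        # line 4: argmax
```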
• 37. #3: Achieve performance and fairness.
• 38. #3 Fairness heuristic. A says: "task i should go here" (to improve packing efficiency). P says: "schedule job j next" (to improve job completion time). Fairness says: "this set of jobs should be scheduled next." Performance and fairness do not mix well in general, but a feasible solution typically satisfies all three: we can get "perfect fairness" and much better performance.
• 39. #3 Fairness heuristic. Fairness is not a tight constraint: aim for long-term rather than short-term fairness, and lose a bit of fairness for a lot of gains in performance. Heuristic: a fairness knob F ∈ [0, 1); pick the best-for-performance task from among the 1−F fraction of jobs furthest from their fair share (see the sketch below). F = 0: most efficient scheduling; F → 1: close to perfect fairness.
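A sketch of that knob, assuming "furthest from fair share" means the largest gap between a job's fair share and its current allocation (the `allocated` / `fair_share` fields are hypothetical):

```python
def eligible_jobs(jobs, F):
    """Fairness knob F in [0, 1): keep only the 1-F fraction of jobs
    furthest below their fair share; the packer's A+P argmax then runs
    over this restricted set. F=0 keeps all jobs (most efficient);
    F close to 1 keeps only the most deprived jobs (nearly fair)."""
    ranked = sorted(jobs, key=lambda j: j.allocated - j.fair_share)
    keep = max(1, round((1.0 - F) * len(ranked)))
    return ranked[:keep]
```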
• 40. Putting it all together. We saw: packing efficiency; preferring small remaining work; the fairness knob. Other things in the paper: estimating task demands; dealing with inaccuracies and barriers; ingestion / evacuation. [Yarn architecture figure; Tetris's changes in orange: Job Managers send multi-resource asks with barrier hints; Node Managers track resource usage, enforce allocations, and report resource availability; the cluster-wide Resource Manager carries new logic to match tasks to machines (+packing, +SRTF, +fairness) and answers asks with allocations and offers.]
• 41. Evaluation: a pluggable scheduler in Yarn 2.4; a 250-machine cluster deployment; replays of Bing and Facebook traces.
• 42. Efficiency. Tetris vs. the Capacity Scheduler: makespan 29% better, avg. job completion time 30% better; vs. DRF: makespan 28%, avg. job completion time 35%. Gains come from avoiding fragmentation and avoiding over-allocation. [Figures: CPU / memory / network-in / storage utilization (%) over time for Tetris and the Capacity Scheduler; readings above 100% indicate over-allocation, and lower values indicate higher resource fragmentation.]
• 43. Fairness. The fairness knob quantifies the extent to which Tetris adheres to fair allocation. No fairness (F = 0): makespan 50%, job compl. time 40%, avg. slowdown [over impacted jobs] 25%. F = 0.25: makespan 25%, job compl. time 35%, avg. slowdown 5%. Full fairness (F → 1): makespan 10%, job compl. time 23%, avg. slowdown 2%.
• 44. Pack efficiently along multiple resources; prefer jobs with less "remaining work"; incorporate fairness. We combine heuristics that improve packing efficiency with heuristics that lower average job completion time; achieving desired amounts of fairness can coexist with improving cluster performance. Implemented inside YARN; trace-driven simulations and deployment show encouraging initial results. We are working towards a Yarn check-in. http://research.microsoft.com/en-us/UM/redmond/projects/tetris/
• 46. Estimating resource demands. Peak-usage demand estimates come from: finished tasks in the same phase (see the sketch below); statistics collected from recurring jobs; and the input size/location of tasks, since placement impacts network/disk requirements. A Resource Tracker reports unused resources and is aware of other cluster activities such as ingestion and evacuation. [Figure: one machine's inbound network over time, 0-1024 MBytes/sec, split into used vs. free bandwidth; under-utilization is visible.]
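For the first source, a minimal peak-over-finished-peers estimator; a sketch only, since the real estimator also folds in recurring-job statistics and input sizes:

```python
from collections import defaultdict

finished = defaultdict(list)   # (job, phase) -> peak demand vectors

def record(job, phase, peak):
    """Called when a task finishes, with its measured peak demand vector."""
    finished[(job, phase)].append(peak)

def estimate(job, phase, default):
    """Estimate a new task's demand as the per-resource peak among
    finished tasks of the same phase, else a conservative default."""
    peers = finished[(job, phase)]
    return [max(dim) for dim in zip(*peers)] if peers else default

record("j1", "map", [2.0, 0.5])            # [mem_gb, net_gbps]
record("j1", "map", [2.5, 0.4])
print(estimate("j1", "map", [4.0, 1.0]))   # -> [2.5, 0.5]
```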
• 47. Packer scheduler vs. DRF. Dominant Resource Fairness (DRF) computes the dominant share (DS) of every user and seeks to maximize the minimum DS across all users. Cluster: [18 cores, 36 GB memory]. Jobs, [task profile] and # tasks: A [1 core, 2 GB], 18; B [3 cores, 1 GB], 6; C [3 cores, 1 GB], 6. DRF maximizes allocations max(qA, qB, qC) subject to qA + 3qB + 3qC ≤ 18 (CPU constraint), 2qA + qB + qC ≤ 36 (memory constraint), and qA/18 = qB/6 = qC/6 (equalize DS, here DS = 1/3): each round runs 6 A, 2 B, and 2 C tasks (18 cores, 16 GB), so durations are A: 3t, B: 3t, C: 3t. A packer instead runs all 18 A tasks (18 cores, 36 GB), then 6 B tasks (18 cores, 6 GB), then 6 C tasks: durations A: t, B: 2t, C: 3t, a 33% improvement. (A sanity check follows.)
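A small sanity check of the example; a sketch that plugs in the equal-share point rather than implementing DRF's real progressive-filling algorithm:

```python
CAP = (18, 36)                                     # cluster: cores, GB
DEMAND = {"A": (1, 2), "B": (3, 1), "C": (3, 1)}   # per-task profiles

def dominant_share(job, q):
    """Running q tasks of `job`: share of its most-demanded resource."""
    c, m = DEMAND[job]
    return q * max(c / CAP[0], m / CAP[1])

qA, qB, qC = 6, 2, 2                               # DRF's per-round allocation
assert qA * 1 + qB * 3 + qC * 3 <= CAP[0]          # CPU binds exactly at 18
assert qA * 2 + qB * 1 + qC * 1 <= CAP[1]          # memory is slack (16 of 36)
print([round(dominant_share(j, q), 2)
       for j, q in zip("ABC", (qA, qB, qC))])      # -> [0.33, 0.33, 0.33]
# Every job thus needs 3 rounds (3t each), while the packer's
# A-then-B-then-C order finishes them at t, 2t, and 3t.
```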
• 48. Packing efficiency does not achieve everything. Machines 1, 2: [2 cores, 4 GB] each. Jobs, [task profile] and # tasks: A [2 cores, 3 GB], 6; B [1 core, 2 GB], 2. Pack: run A's tasks two at a time for three rounds (4 cores, 6 GB), then B's two tasks (2 cores, 4 GB); durations A: 3t, B: 4t. No-pack: run B's two tasks first (2 cores, 4 GB), then A's two at a time (4 cores, 6 GB per round); durations A: 4t, B: t, a 29% improvement in avg. completion time. Achieving packing efficiency does not necessarily improve job completion time.
• 49. Ingestion / evacuation. Ingestion = storing incoming data for later analytics; some clusters report volumes of up to 10 TB per hour. Evacuation = data evacuated and re-replicated before maintenance operations, e.g., rack decommission for machine re-imaging. Both are cluster activities that produce background traffic. The Resource Tracker reports them, and Tetris uses the reports to avoid contention between its tasks and these activities.
• 54. Virtual machine packing != Tetris. VM packing consolidates VMs with multi-dimensional resource requirements onto the fewest servers, but it focuses on different challenges, not task packing: balancing load across servers; ensuring VM availability in spite of failures; allowing quick software and hardware updates. There is no entity corresponding to a job, so job completion time is inexpressible; and explicit resource requirements (e.g., a "small" VM) make VM packing simpler.
• 55. Barrier knob, b ∈ [0, 1). Tetris gives preference to the last tasks in a stage: it offers resources to tasks in a stage preceding a barrier once a b fraction of that stage's tasks have finished (sketch below). b → 1: no tasks are preferentially treated.
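A tiny sketch of the knob's gating rule; the `Stage` fields are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    finished: int   # tasks already completed in this stage
    total: int      # total tasks in this stage
    barrier: bool   # does a barrier follow this stage?

def preferred(stage: Stage, b: float) -> bool:
    """Give preference to a pre-barrier stage's last tasks once a b
    fraction of the stage has finished; b near 1 disables the preference."""
    return stage.barrier and stage.finished >= b * stage.total

print(preferred(Stage(finished=8, total=10, barrier=True), b=0.7))  # True
```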
• 56. Starvation prevention. Could it take a long time to accommodate large tasks? In practice, no: 1. most tasks have demands within one order of magnitude of one another; 2. machines report resource availability to the scheduler periodically, so the scheduler learns about all the resources freed by tasks finishing in the preceding period at once and can make reservations for large tasks.
• 57. Cluster load vs. Tetris performance. [Figure.]
  • 58. Packing and Dependency-aware Scheduling for Data-Parallel Clusters
• 59. Performance of cluster schedulers. We observe that: cluster schedulers typically do dependency-aware scheduling OR multi-resource packing, and none of the existing solutions are close to optimal for more than 50% of production jobs. Graphene: >30% improvements in makespan¹ and job completion time for more than 50% of the jobs. ¹Time to finish a set of jobs.
• 60. Findings from Bing trace analysis. Job structures have evolved into complex DAGs of tasks: the median job's DAG has depth 7 and 10³ tasks. A good cluster scheduler should be aware of dependencies.
• 61. Findings from Bing trace analysis. Applications have (very) diverse resource needs across CPU, memory, network, and disk: a high coefficient of variation (~1) for many resources, and demands for resources are weakly correlated. This matters because multiple resources become tight; there is no single bottleneck resource (e.g., there is enough cross-rack network bandwidth to use all CPU cores). A good cluster scheduler should pack resources.
• 62. Why so bad: production schedulers DON'T pack tasks AND/OR DON'T consider dependencies.
• 63. Dependency-aware OR packing, not both. Schedulers that consider the DAG structure, such as Critical Path Scheduling (CPSched) and Breadth-First Search (BFS), do not account for tasks' resource demands, or assume tasks have homogeneous demands; any scheduler that is not packing can be up to n× OPTIMAL (n = number of tasks). Packers such as Tetris handle tasks with multiple resource requirements but ignore dependencies and take local greedy choices; any scheduler that ignores dependencies can be d× OPTIMAL (d = number of resource dimensions).
• 64. Where does the "work" lie in a DAG? "Work" = the stages of a DAG where the most resources × time is spent. Among large DAGs that are neither a bunch of unrelated stages nor a chain of stages: >40% have most of the "work" on the critical path, where CPSched performs well; >30% have most of the "work" laid out such that packers perform well. For ~50% of the DAGs, neither packers nor criticality-based schedulers may perform well.
• 65. Pack tasks along multiple resources while considering task dependencies: state-of-the-art techniques are suboptimal; key ideas in Graphene; conclusion.
• 66. State-of-the-art scheduling techniques are suboptimal: CPSched / Tetris can be 3× optimal. [Example DAG; task: duration {rsrc.1, rsrc.2}, with total capacity in any dimension = 1: t0: 1 {.7, .31}; t1: .01 {.95, .01}; t2: .01 {.1, .7}; t3: .96 {.2, .68}; t4: .98 {.1, .01}; t5: .01 {.01, .01}. Gantt charts: CPSched and Tetris both finish in ~3T; the optimal schedule finishes in ~T.] Key insight: t0, t2, t5 are troublesome tasks; schedule them as soon as possible.
• 67. #1: Schedule construction: identify troublesome tasks and place them first on a virtual resource-time space.
• 68. Schedule construction. Identify tasks that can lead to a poor schedule (troublesome tasks, T): those more likely to be on the critical path and more difficult to pack. Break the other tasks into P (parents), C (children), and O (other) sets based on their relationship to the tasks in T. Place the tasks in T on a virtual resource-time space, then overlay the others to fill any resultant holes in this space (see the sketch below). Nearly optimal for over three quarters of our analyzed production DAGs.
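A heavily simplified sketch of the T/P/C/O split; the troublesomeness rule used here (top fraction by duration × demand) is an assumption standing in for Graphene's actual criteria, and the placement step is only summarized in the trailing comment:

```python
def split_tpco(tasks, children, frac=0.3):
    """tasks: {name: (duration, demand)}; children: {name: [successors]}.
    T: the `frac` of tasks with the largest duration*demand (a stand-in
    for 'likely critical and hard to pack'); P/C: ancestors/descendants
    of T; O: everything else."""
    def reachable(sources, edges):
        seen, stack = set(), list(sources)
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    parents = {}
    for u, vs in children.items():
        for v in vs:
            parents.setdefault(v, []).append(u)

    ranked = sorted(tasks, key=lambda t: tasks[t][0] * tasks[t][1], reverse=True)
    T = set(ranked[:max(1, int(frac * len(tasks)))])
    P = reachable(T, parents) - T           # must run before the trouble
    C = reachable(T, children) - T - P      # must run after the trouble
    O = set(tasks) - T - P - C              # free to fill holes anywhere
    return T, P, C, O

# Placement then fills a virtual resource-time space in the order:
# T first, P before T, C after T, and O into any remaining holes.
```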
• 69. #2: Online component: enforces the desired schedules of the various DAGs.
• 70. Runtime component. Each DAG's Schedule Constructor emits a preference order; the Resource Manager merges these orders and assigns tasks on node heartbeats. For job completion time: prefer jobs with less remaining work. For makespan: enforce the priority ordering, local placement, multi-resource packing, and judicious overbooking of malleable resources. For being fair: deficit counters bound unfairness and enable implementing different fairness schemes (sketch below).
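For the last piece, a minimal deficit-counter sketch; the update rule here is an assumption, since the slide names deficit counters without specifying one:

```python
def assign_next(jobs, deficit, fair_share, bound=10.0):
    """Serve jobs in the merged preference order, but track how far each
    job lags its fair share; a job whose deficit exceeds the bound is
    served next regardless of preference, bounding unfairness."""
    order = sorted(jobs, key=lambda j: j.preference_rank)
    starved = [j for j in order if deficit[j.name] > bound]
    pick = starved[0] if starved else order[0]
    for j in jobs:
        deficit[j.name] += fair_share[j.name]  # everyone accrues its share...
    deficit[pick.name] -= 1.0                  # ...the served job pays one unit
    return pick
```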
• 71. Evaluation: implemented in Yarn and Tez; a 250-machine cluster deployment; replays of Bing traces and TPC-DS / TPC-H workloads.
• 72. Graphene vs. Tetris: makespan 29% better, avg. job completion time 27%; vs. critical path: 31%, 33%; vs. BFS: 23%, 24%. Gains come from a view of the entire DAG and from placing the troublesome tasks first. Efficiency: a more compact schedule, better packing, overbooking.
• 73. Graphene combines mechanisms that improve packing efficiency with consideration of task dependencies: it constructs a good schedule by placing tasks on a virtual resource-time space; online heuristics softly enforce the desired schedules; implemented inside YARN and Tez, where trace-driven simulations and deployment show encouraging initial results.
• 74. [Repeat of the results summary above, with an added timeline figure: running tasks over time for Graphene vs. BFS, illustrating the more compact schedule behind the gains.]