Resource-Aware Scheduling for Hadoop

National University of Singapore
School of Computing
Department of Information Systems

Lu Wei 
Project No: H064420 
Supervisor: Professor Tan Kian‐Lee 

RESOURCE‐AWARE SCHEDULING 
FOR HADOOP  


MapReduce & Hadoop 


MapReduce 
• Distributed data processing framework by Google
• Job
– Map function
– Reduce function
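The map/reduce split of a job can be sketched in a few lines. This is a minimal single-process illustration using word count (the I/O workload used later in the evaluation), not Hadoop's API; the function names `map_words`, `reduce_counts` and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map function: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce function: add up all values that share the same key."""
    return word, sum(counts)


def run_job(lines):
    """Run the map phase, group values by key, then run the reduce phase."""
    # Grouping by key is what the framework's shuffle step does in a real cluster.
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            groups[word].append(one)
    # Reducers see the keys in sorted order.
    return dict(reduce_counts(word, groups[word]) for word in sorted(groups))
```

Calling `run_job(["a b a", "b c"])` counts each word across both lines. In a real cluster the map calls run in parallel on different nodes, which is where scheduling decisions come in.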

Hadoop Architecture 


Existing Schedulers

Early Schedulers 
• FIFO: MapReduce default, by Google
– Priority level & submission time
– Data locality
– Problem: starvation of other jobs in the presence of a long-running job
• Hadoop On Demand (HOD): by Yahoo!
– Fairness: static node allocation using Torque Resource Manager
– Problem: poor data locality & underutilization
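The FIFO ordering rule (priority level first, then submission time) can be sketched as a sort key. This is an illustration only; the `QueuedJob` record and the convention that a smaller number means a higher priority are assumptions, not Hadoop's own types.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    name: str
    priority: int       # assumption: smaller number = higher priority
    submit_time: float  # seconds since an arbitrary epoch


def fifo_order(jobs):
    """Order jobs by priority level, breaking ties by submission time."""
    return sorted(jobs, key=lambda job: (job.priority, job.submit_time))
```

Because the head of this order keeps receiving free slots until it has no pending tasks, one long-running job at the front delays everything queued behind it, which is the starvation problem noted above.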

Mainstream Schedulers 
• Fair Scheduler: by Facebook
– Fairness: dynamic resource redistribution
– Challenges:
• Data locality: solved with delay scheduling
• Reduce/map dependence: solved with copy-compute splitting
• Capacity Scheduler: by Yahoo!
– Similar to Fair Scheduler
– Special support for memory-intensive jobs
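The slide names the data locality fix without describing it. The sketch below follows the commonly described form of the idea: a job that cannot run a node-local task on the requesting node is skipped a bounded number of times before it is allowed to run non-locally. The `Job` record, the `max_skips` parameter and the return shape are hypothetical, and the Fair Scheduler's real implementation has more detail than this.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    local_nodes: set     # nodes that hold input data for this job's pending tasks
    skip_count: int = 0  # how many times the job has been passed over so far


def assign(jobs, node, max_skips):
    """Pick a job for a free slot on `node`.

    `jobs` is assumed to be sorted in fairness order already.
    Returns (job name, launched_locally) or None if every job was skipped.
    """
    for job in jobs:
        if node in job.local_nodes:
            job.skip_count = 0          # a local launch resets the wait
            return job.name, True
        if job.skip_count >= max_skips:
            return job.name, False      # waited long enough: run non-locally
        job.skip_count += 1             # skip this job for now
    return None
```

The trade-off is a small, bounded wait for the skipped job in exchange for a much higher share of node-local tasks.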

Alternative Schedulers
• Adaptive Scheduler (2010-2011)
– Goal/deadline oriented
– Adaptively establishes predictions by job matching
– Problem: strong assumptions & questionable performance
• Machine Learning Approach (2010)
– Naïve Bayes & Perceptron with the aid of user hints
– Better performance than FIFO
– Underutilization during learning phase & overhead

Existing Schedulers

| Scheduler | Pro | Con | Resource-Awareness |
| FIFO | High throughput | Starvation of short jobs | Data locality |
| HOD | Sharing of cluster | Poor data locality & underutilization | - |
| Fair Scheduler | Fairness & dynamic resource re-allocation | Complicated configuration | Data locality; copy-compute splitting |
| Capacity Scheduler | Similar to FS | Similar to FS | Special support for memory-intensive jobs |
| Adaptive Scheduler | Adaptive approach | Strong assumptions & questionable performance | Resource utilization control using job matching |
| Machine Learning | Reported better performance than FIFO | Underutilization during learning phase & overhead | Resource utilization control using pattern classification |

Motivations
• Heterogeneity by Configuration
– Hardware capacity differences among the nodes of a cluster
• Heterogeneity by Usage
– All task slots are treated equally, without considering the resource status of the current node or the resource demand of queuing jobs
– A CPU-busy node may be assigned a CPU-intensive job, and an I/O-busy node an I/O-intensive job

Resource‐Aware Scheduler 


Design Overview 
1. Capture
– the job's resource demand characteristics
– the TaskTracker's static capability & runtime usage status
2. Combine and transform into quantified measurements
3. Predict how fast a given TaskTracker is expected to finish a given task
4. Apply scheduling policy of choice

Design Details 
• TaskTracker Profiling
– Resource scores: represent availability
– Sampled every second (at every heartbeat) for each TaskTracker
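One way to picture the profiling step is a table that keeps the most recent resource scores per TaskTracker and is refreshed on every heartbeat. The sketch below is an assumption about the shape of that data; the names `ResourceScores` and `ClusterProfile` and the choice of three scores (CPU, disk, network) are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceScores:
    cpu: float      # higher score = more CPU currently available (assumption)
    disk: float     # higher score = more disk I/O currently available
    network: float  # higher score = more network bandwidth currently available


class ClusterProfile:
    """Keeps only the latest scores reported by each TaskTracker."""

    def __init__(self):
        self._latest = {}

    def on_heartbeat(self, tracker, scores):
        # Each heartbeat overwrites the previous sample for that tracker.
        self._latest[tracker] = scores

    def scores_of(self, tracker):
        return self._latest[tracker]

    def average_disk(self):
        # Cluster-wide average of the disk score across reporting trackers.
        return sum(s.disk for s in self._latest.values()) / len(self._latest)
```

Keeping only the latest sample keeps the JobTracker-side state small; a smoothed average over several heartbeats would be an alternative design if single samples prove too noisy.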

Design Details 
• Task-Based Job Sampling
– Assumption:
$t_{sample} = t_{s\text{-}cpu} + t_{s\text{-}disk} + t_{s\text{-}network}$
– Target measurements:
• Task resource demand
• TaskTracker resource statuses
– Technique:
• Periodic re-sampling: avoid over-reliance on one job sample
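The sampling assumption and the re-sampling technique can be sketched together. The slide does not say how the re-sampling period is measured; counting launched tasks, as done here, is an assumption, and the names `MapSample` and `JobSampler` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MapSample:
    t_cpu: float      # t_{s-cpu}
    t_disk: float     # t_{s-disk}
    t_network: float  # t_{s-network}

    @property
    def t_sample(self):
        # The slide's assumption: the three components add up to the task time.
        return self.t_cpu + self.t_disk + self.t_network


class JobSampler:
    """Marks every `resample_every`-th launched task of a job as a sample task."""

    def __init__(self, resample_every):
        self.resample_every = resample_every
        self.launched = 0
        self.latest = None  # most recent MapSample, replaced on each re-sample

    def next_task_is_sample(self):
        is_sample = self.launched % self.resample_every == 0
        self.launched += 1
        return is_sample

    def record(self, sample):
        self.latest = sample
```

Replacing `latest` on every re-sample means a single unrepresentative task only influences estimates until the next sample arrives.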

Design Details 
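• Task Processing Time Estimation

$t_{estimate} = t_{e\text{-}cpu} + t_{e\text{-}disk} + t_{e\text{-}network}$

$t_{estimate} = t_{s\text{-}cpu} \times \frac{c_{s\text{-}cpu}}{c_{cpu}} + t_{e\text{-}disk\text{-}in} + t_{e\text{-}disk\text{-}out} + t_{e\text{-}disk\text{-}spill} + t_{e\text{-}network\text{-}in} + t_{e\text{-}network\text{-}out}$

$t_{e\text{-}disk\text{-}in} = t_{s\text{-}disk\text{-}in} \times \frac{c_{s\text{-}disk\text{-}read}}{c_{disk\text{-}read}} \times \frac{s_{disk\text{-}in}}{s_{s\text{-}disk\text{-}in}}$

$s_{disk\text{-}spill} = \frac{s_{s\text{-}disk\text{-}spill}}{s_{s\text{-}in}} \times s_{in}$

$s_{network\text{-}out} = \frac{s_{out}}{N_{total\text{-}reduce}} = \frac{\beta_{s\text{-}oi\text{-}ratio} \times s_{in}}{N_{total\text{-}reduce}}$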
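The estimation formulas can be read as scaling each sampled time by two ratios: the resource score at sampling time over the current score, and the new data size over the sampled size. The sketch below implements only the terms the slide spells out. Reading `t` as time, `c` as resource score, `s` as data size and the `s-` subscript as "sample" is an interpretation of the slide, and all Python names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MapSample:
    """Measurements taken from one sample task."""
    t_cpu: float         # t_{s-cpu}
    t_disk_in: float     # t_{s-disk-in}
    c_cpu: float         # c_{s-cpu}: CPU score when the sample ran
    c_disk_read: float   # c_{s-disk-read}: disk read score when the sample ran
    s_in: float          # s_{s-in}: input size of the sample task
    s_disk_in: float     # s_{s-disk-in}: bytes the sample read from disk
    s_disk_spill: float  # s_{s-disk-spill}: bytes the sample spilled to disk
    oi_ratio: float      # beta_{s-oi-ratio}: output/input size ratio


def cpu_time(sample, c_cpu):
    """t_{e-cpu} = t_{s-cpu} * c_{s-cpu} / c_{cpu}"""
    return sample.t_cpu * sample.c_cpu / c_cpu


def disk_in_time(sample, c_disk_read, s_disk_in):
    """t_{e-disk-in}: scale by the score ratio and by the data size ratio."""
    return (sample.t_disk_in
            * (sample.c_disk_read / c_disk_read)
            * (s_disk_in / sample.s_disk_in))


def spill_size(sample, s_in):
    """s_{disk-spill} = s_{s-disk-spill} / s_{s-in} * s_{in}"""
    return sample.s_disk_spill / sample.s_in * s_in


def network_out_size(sample, s_in, n_total_reduce):
    """s_{network-out} = beta_{s-oi-ratio} * s_{in} / N_{total-reduce}"""
    return sample.oi_ratio * s_in / n_total_reduce


def total_estimate(t_cpu, t_disk_in, t_disk_out, t_disk_spill,
                   t_network_in, t_network_out):
    """t_{estimate}: sum of the six per-resource estimates."""
    return (t_cpu + t_disk_in + t_disk_out + t_disk_spill
            + t_network_in + t_network_out)
```

For example, a sample that took 1000 time units of CPU at a CPU score of 0.5 is estimated at 500 on a node whose current CPU score is 1.0. The slide does not give the time formulas for disk-out, disk-spill and the network terms, so `total_estimate` takes them as inputs rather than guessing them.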

Design Details 
• Scheduling policies
– Map tasks
• Shortest Job First (SJF)
• Starvation of long-running jobs: addressed by periodic re-sampling
– Reduce tasks
• Naïve I/O biasing
– Do not schedule an I/O-intensive job on an I/O-busy node when there are other reduce slots with higher disk I/O availability
– I/O-intensive job: judged using the map-phase sample
– I/O-busy node: disk I/O score below the cluster average
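Both policies reduce to short decision functions. The sketch below is one reading of the rules on this slide; the function names, the dictionary inputs and the way free reduce slots are represented are hypothetical, and the project's scheduler may structure this differently.

```python
from statistics import mean


def pick_map_job(estimates):
    """SJF for map tasks.

    `estimates` maps job id -> estimated task time on the requesting
    TaskTracker. Returns the job with the shortest estimate, or None.
    """
    if not estimates:
        return None
    return min(estimates, key=estimates.get)


def should_assign_reduce(node, job_is_io_intensive, disk_scores, free_reduce_slots):
    """Naive I/O biasing for reduce tasks.

    `disk_scores` maps node -> disk I/O score (higher = more available).
    `free_reduce_slots` maps node -> number of free reduce slots.
    """
    if not job_is_io_intensive:
        return True
    # An I/O-busy node is one whose disk score is below the cluster average.
    if disk_scores[node] >= mean(disk_scores.values()):
        return True
    # Hold the task back only if a better-placed free reduce slot exists.
    better_slot_exists = any(
        free_reduce_slots.get(other, 0) > 0 and disk_scores[other] > disk_scores[node]
        for other in disk_scores
        if other != node
    )
    return not better_slot_exists
```

The final check matters: without it, an I/O-intensive job could wait indefinitely when every free reduce slot sits on a busy node.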

Implementation
[Figure: implementation architecture diagram]
• Components: TaskTracker, JobTracker, MapTaskFinishTimeEstimator, Resource Scheduler
• Classes shown: TaskTrackerStatus, ResourceStatus, ResourceCalculatorPlugin, Task, TaskStatus, SampleTaskStatus, TaskInProgress, JobInProgress, MyJobInProgress, MapSampleReport Logger
• Stored data: Resource Profiles; Job profiles as HashMap<JobID, MapSampleReport>
• Data flows: resource scores; sample task processing time & data sizes; estimated task processing time
https://github.com/weilu/Hadoop-Resource-Aware-Scheduler

Evaluation & Results

Estimation Accuracy
• Cluster Configuration I
– Shared with other users and other applications
– 1 master, 10 slave nodes
– 1 Gbps network, same rack
– Each node:
• 4 processors: Intel Xeon E5607 Quad Core CPU (2.26 GHz)
• 32 GB memory
• 1 TB hard disk
• Hadoop Configuration
– HDFS block size: 64 MB
– Data replication: 1
– Each node:
• Map slots: 1
• Reduce slots: 2
– Speculative map & reduce tasks: off
– Completed maps required before scheduling reduce: 1 out of 1000 total maps

• Workload description:
– I/O workload: word count
• Counts the occurrences of each word in the given input files
• Mapper: scans through the input; outputs each word with the word itself as the key and 1 as the value, sorted by key
• Reducer: collects the pairs with the same key and adds up their values; outputs the key and its total number of occurrences
– CPU workload: pi estimation
• Approximates the value of pi by counting the number of points that fall within the unit quarter circle
• Mapper: reads the coordinates of points; counts the points inside/outside the inscribed circle of the square
• Reducer: accumulates the inside/outside point counts from the mappers
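The pi estimation workload can be sketched without Hadoop. The version below uses the quarter-circle form and seeded pseudo-random points; how the benchmark actually generates its points is not stated on the slide, so the point generator is an assumption, and the names `map_points`, `reduce_pi` and `estimate_pi` are hypothetical.

```python
import random


def map_points(points):
    """Mapper: count points inside and outside the unit quarter circle."""
    inside = sum(1 for x, y in points if x * x + y * y <= 1.0)
    return inside, len(points) - inside


def reduce_pi(partials):
    """Reducer: accumulate the inside/outside counts from all mappers."""
    inside = sum(p[0] for p in partials)
    outside = sum(p[1] for p in partials)
    return inside, outside


def estimate_pi(num_maps, points_per_map, seed=0):
    """Run `num_maps` mappers over random points in the unit square."""
    rng = random.Random(seed)
    partials = []
    for _ in range(num_maps):
        points = [(rng.random(), rng.random()) for _ in range(points_per_map)]
        partials.append(map_points(points))
    inside, outside = reduce_pi(partials)
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / (inside + outside)
```

Each mapper does arithmetic on its points and emits only two numbers, which is why this workload is CPU-bound with very little disk or network traffic.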

• I/O Workload 1 (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job)
[Figure: Estimated vs. Actual Task Execution Time; series: estimate, actual; y-axis from 0 to 160000]

• I/O Workload 2 (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job)
[Figure: Estimated vs. Actual Task Execution Time; series: estimate, actual; y-axis from 20000 to 45000]

• CPU Workload 1 (Resource Scheduler, pi, 10 nodes, 100 maps, 10^8 points each, single job)
[Figure: estimated vs. actual task execution time; series: estimated, actual; y-axis from 0 to 6000]

• CPU Workload 2 (Resource Scheduler, pi, 10 nodes, 100 maps, 10^9 points each, single job)
[Figure: estimated vs. actual task execution time; series: estimated, actual; y-axis from 0 to 50000]

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler
• Cluster Configuration II (differences from Configuration I)
– Reserved and unshared
– 1 master, 5 slave nodes
• Workload Description
– Single I/O job: word count (overhead evaluation)
– Single CPU job: pi estimation (overhead evaluation)
– Simultaneous submission of I/O job and CPU job (baseline establishment: reality test)

Resource‐Homogeneous Environment 
• Overhead Evaluation

Table 9 – evaluation and results: word count in resource-homogeneous environment, 3 runs (summary)

Table 10 – evaluation and results: pi estimation in resource-homogeneous environment, 3 runs (summary)

• FIFO vs Resource Scheduler in a Resource-Homogeneous Environment

• Analysis: FIFO vs Resource Scheduler in a Resource-Homogeneous Environment (simultaneous submission of an I/O job and a CPU job)
– Negligible overhead
– Resource Scheduler performs worse: slowdown in all measured dimensions and cases
– Reason: Resource Scheduler has more concurrently running reducers competing for resources
– Expect: same performance in a busy cluster (all reduce slots are constantly filled with running tasks)
[Figure: best, average and worst total map time (sec) and total job time (sec), FIFO vs Resource Scheduler; y-axis from 1200 to 1700]

Resource‐Heterogeneous Environment 
• Environment Simulation
– CPU intervention: non-MapReduce pi estimation
– Disk I/O intervention: dd 50G write-read

• Simulated Environment
– 3 CPU-busy nodes + 2 disk-I/O-busy nodes

• FIFO vs Resource Scheduler in a Resource-Heterogeneous Environment (sequential submission of 2 jobs)

• FIFO vs Resource Scheduler in a Resource-Heterogeneous Environment (concurrent submission of 2 jobs)

FIFO vs Resource Scheduler in a Resource-Heterogeneous Environment (simultaneous submission of an I/O job and a CPU job)
[Figure: best, average and worst total map time (sec) and total job time (sec), FIFO vs Resource Scheduler; y-axis from 1200 to 2700]
[Figure: total map time, percentage slowdown of Resource Scheduler relative to FIFO Scheduler (best, average, worst), homogeneous vs heterogeneous environment; y-axis from 0% to 16%]
[Figure: total job time, percentage slowdown of Resource Scheduler relative to FIFO Scheduler (best, average, worst), homogeneous vs heterogeneous environment; y-axis from -4% to 20%]
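The slides do not define "percentage slowdown". A natural reading, used in the sketch below, is the relative difference between the Resource Scheduler's time and the FIFO time for the same run category. This definition is an assumption, and the function name is hypothetical.

```python
def percentage_slowdown(t_resource, t_fifo):
    """Slowdown of the Resource Scheduler relative to FIFO, in percent.

    A positive value means the Resource Scheduler was slower;
    a negative value means it was faster (a speedup).
    """
    return (t_resource - t_fifo) * 100.0 / t_fifo
```

Under this reading, a bar below 0% on the total job time chart corresponds to the Resource Scheduler finishing before FIFO.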

Conclusion 
• Resource-based map task processing time estimation is satisfactory
• Resource Scheduler did not manage to outperform FIFO Scheduler in the resource-homogeneous environment or in most cases of the resource-heterogeneous environment, due to extra concurrent reduce tasks
• However, we verified that Resource Scheduler is indeed resource-aware. It performs better when moved from a resource-homogeneous environment to a resource-heterogeneous environment:
– Smaller percentage slowdown compared to FIFO in all cases and all measured dimensions
– Observed speedup compared to FIFO in the worst cases, due to I/O biasing during the reduce stage

Recommenda>ons for Future Work 
• Evaluation
– Heavier workload & busy cluster
• Observe overhead
• Benchmark performance
• Scheduling policy
– Map task
• Highest Response Ratio Next (HRRN)
$priority = \frac{t_{estimated} + t_{waiting}}{t_{estimated}} = 1 + \frac{t_{waiting}}{t_{estimated}}$
– Reduce task
• CPU biasing for CPU-intensive jobs
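The HRRN priority above translates directly into code. A minimal sketch, assuming each job carries an estimated task time and the time it has waited so far; the function names and the dictionary input are hypothetical.

```python
def response_ratio(t_estimated, t_waiting):
    """priority = (t_estimated + t_waiting) / t_estimated = 1 + t_waiting / t_estimated"""
    return 1.0 + t_waiting / t_estimated


def pick_next(jobs):
    """Return the job id with the highest response ratio.

    `jobs` maps job id -> (t_estimated, t_waiting).
    """
    return max(jobs, key=lambda job_id: response_ratio(*jobs[job_id]))
```

Unlike SJF, a long job's priority grows the longer it waits, so with `{"long": (400.0, 1200.0), "short": (50.0, 50.0)}` the long job (ratio 4.0) is picked ahead of the short one (ratio 2.0).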
