Scheduling
Hadoop as a batch processing system
•Hadoop was designed mainly for running large batch jobs such as web indexing and log mining.
•Users submitted jobs to a queue, and the cluster ran them in order.
•Soon, another use case became attractive:
–Sharing a MapReduce cluster between multiple users.
The benefits of sharing
•With all the data in one place, users can run queries that they could never have executed otherwise.
•Costs go down because a shared cluster is better utilized than a separate Hadoop cluster for each group.
•However, sharing requires support from the Hadoop job scheduler to
–provide guaranteed capacity to production jobs and
–give good response times to interactive jobs while allocating resources fairly between users.
Approaches to Sharing 
•FIFO: The JobTracker pulls jobs from a work queue, oldest job first. This scheduler has no concept of job priority or size.
•Fair: Resources are assigned so that, on average over time, each job gets an equal share of the available resources. Jobs that need less time can therefore get resources and finish while longer jobs are still running. This allows some interactivity among Hadoop jobs and makes the Hadoop cluster more responsive to the variety of job types submitted.
•Capacity: Instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (the overall capacity of the cluster is the sum of the queues' capacities). Queues are monitored; if a queue is not consuming its allocated capacity, the excess can be temporarily allocated to other queues.
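The difference between the FIFO and fair approaches can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not in the slides: a single shared resource, jobs that all arrive at time zero, and job sizes measured in arbitrary time units. The function names are hypothetical.

```python
def fifo_completion_times(work):
    """Run jobs one after another in submission order."""
    clock = 0.0
    finished = []
    for units in work:
        clock += units
        finished.append(clock)
    return finished


def fair_completion_times(work):
    """Give every unfinished job an equal share of the resource at all times."""
    remaining = {i: float(units) for i, units in enumerate(work)}
    finished = [0.0] * len(work)
    clock = 0.0
    while remaining:
        active = len(remaining)
        # The next job to finish is the one with the least work left.
        step = min(remaining.values())
        clock += step * active  # each job progresses at rate 1/active
        for i in list(remaining):
            remaining[i] -= step
            if remaining[i] <= 1e-12:
                finished[i] = clock
                del remaining[i]
    return finished


jobs = [100, 5, 5]  # one long job submitted first, then two short ones
print(fifo_completion_times(jobs))
print(fair_completion_times(jobs))
```

In this model the two short jobs finish far earlier under fair sharing, while the long job finishes only slightly later than it would under FIFO.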
FIFO Scheduling 
Job Queue
Hadoop default scheduler (FIFO) 
–Problem: short jobs get stuck behind long ones
•Separate clusters
–Problem 1: poor utilization
–Problem 2: costly data replication
•Full replication across clusters nearly infeasible at Facebook/Yahoo! scale 
•Partial replication prevents cross-dataset queries
Fair Scheduling 
Job Queue
Fair Scheduler Basics 
•Group jobs into “pools” 
•Assign each pool a guaranteed minimum share 
•Divide excess capacity evenly between pools
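The three rules above can be sketched in a few lines. This is a simplified illustration, not the scheduler's actual code: it assumes each pool reports a slot demand, that a pool never receives more than it demands, that the minimum shares together fit in the cluster, and that excess slots are handed out one at a time in round-robin order. The pool names and demand figures are hypothetical.

```python
def allocate_slots(total_slots, pools):
    """pools maps a pool name to (min_share, demand), both in slots."""
    # Step 1: every pool gets its guaranteed minimum, capped by its demand.
    alloc = {name: min(min_share, demand)
             for name, (min_share, demand) in pools.items()}
    free = total_slots - sum(alloc.values())

    # Step 2: divide the excess evenly among pools that still want slots.
    while free > 0:
        hungry = [name for name, (_, demand) in pools.items()
                  if alloc[name] < demand]
        if not hungry:
            break
        for name in hungry:
            if free == 0:
                break
            alloc[name] += 1
            free -= 1
    return alloc


pools = {
    "emp1": (0, 60),
    "emp2": (0, 60),
    "finance": (40, 40),
    "emp6": (30, 0),   # has a minimum share but no running jobs
}
print(allocate_slots(100, pools))
```

Because `emp6` has no demand in this example, its unused minimum share flows back into the excess that the other pools divide.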
Pools 
•Determined from a configurable job property 
–Default in 0.20: user.name (one pool per user) 
•Pools have properties: 
–Minimum map slots 
–Minimum reduce slots 
–Limit on # of running jobs
Example Pool Allocations 
[Figure: an entire cluster of 100 slots divided among the pools emp1, emp2, finance (min share = 40) and emp6 (min share = 30), running job 1 (30 slots), job 2 (15 slots), job 3 (15 slots) and job 4 (40 slots).]
Scheduling Algorithm 
•Split each pool’s min share among its jobs 
•Split each pool’s total share among its jobs 
•When a slot needs to be assigned: 
–If there is any job below its min share, schedule it 
–Else schedule the job that we’ve been most unfair to (based on “deficit”)
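The slot-assignment rule can be sketched as follows. This is a minimal illustration that treats each job's min share and "deficit" as numbers already computed elsewhere; how the deficit is accumulated is not shown in the slides and is not modeled here. The `Job` class, its fields, and the tie-break among starved jobs are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    running: int      # tasks currently running
    min_share: int    # this job's portion of its pool's min share
    deficit: float    # how far behind its fair share the job has fallen


def pick_job(jobs):
    """Choose which job receives the next free slot."""
    if not jobs:
        return None
    starved = [j for j in jobs if j.running < j.min_share]
    if starved:
        # Assumption: among starved jobs, serve the one furthest below its min share.
        return max(starved, key=lambda j: j.min_share - j.running)
    # Otherwise serve the job that has been treated most unfairly so far.
    return max(jobs, key=lambda j: j.deficit)


jobs = [
    Job("job1", running=12, min_share=10, deficit=4.0),
    Job("job2", running=3, min_share=5, deficit=1.0),
    Job("job3", running=8, min_share=0, deficit=9.5),
]
print(pick_job(jobs).name)
```

Here `job2` is chosen even though `job3` has the larger deficit, because min-share guarantees take precedence.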
Scheduler Dashboard 
Change priority 
Change pool 
FIFO mode (for testing)
Additional Features 
•Weights for unequal sharing: 
–Job weights based on priority (each level = 2x) 
–Job weights based on size 
–Pool weights 
•Limits for # of running jobs: 
–Per user 
–Per pool
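The "each level = 2x" rule for priority weights can be sketched as below. The five level names follow Hadoop's job priority names; using NORMAL as the baseline with weight 1 is an assumption of this example, as are the function names and the proportional split.

```python
PRIORITY_LEVELS = ["VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH"]


def priority_weight(priority):
    """Each step up in priority doubles the weight; NORMAL is taken as 1.0."""
    steps = PRIORITY_LEVELS.index(priority) - PRIORITY_LEVELS.index("NORMAL")
    return 2.0 ** steps


def weighted_shares(total_slots, weights):
    """Split slots in proportion to weight instead of equally."""
    total_weight = sum(weights.values())
    return {name: total_slots * w / total_weight for name, w in weights.items()}


weights = {
    "report": priority_weight("NORMAL"),
    "urgent": priority_weight("HIGH"),
}
print(weighted_shares(90, weights))
```

With these weights the HIGH-priority job is entitled to twice as many slots as the NORMAL-priority one.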
Installing the Fair Scheduler 
•Build it: 
–ant package 
•Place it on the classpath: 
–cp build/contrib/fairscheduler/*.jar lib
Configuration Files 
•Hadoop config (conf/mapred-site.xml)
–Contains scheduler options, pointer to pools file 
•Pools file (pools.xml) 
–Contains min share allocations and limits on pools 
–Reloaded every 15 seconds at runtime
Minimal mapred-site.xml
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/path/to/pools.xml</value>
</property>
Minimal pools.xml 
<?xml version="1.0"?> 
<allocations> 
</allocations>
Configuring a Pool 
<?xml version="1.0"?>
<allocations>
  <pool name="emp4">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
  </pool>
</allocations>
Setting Running Job Limits 
<?xml version="1.0"?>
<allocations>
  <pool name="emp4">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <user name="emp1">
    <maxRunningJobs>1</maxRunningJobs>
  </user>
</allocations>
Default Per-User Running Job Limit 
<?xml version="1.0"?>
<allocations>
  <pool name="emp4">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <user name="emp1">
    <maxRunningJobs>1</maxRunningJobs>
  </user>
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>
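Because the pools file is reloaded at runtime, it can be worth checking that it is well-formed XML before deploying it. The following is a small sketch, not part of Hadoop, that parses an allocations file with Python's standard library and summarizes the settings shown above. The `summarize` helper is hypothetical and only handles the elements used in these slides.

```python
import xml.etree.ElementTree as ET

POOLS_XML = """<?xml version="1.0"?>
<allocations>
  <pool name="emp4">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <user name="emp1">
    <maxRunningJobs>1</maxRunningJobs>
  </user>
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>
"""


def summarize(xml_text):
    """Return (pools, users, default_user_limit) from an allocations file."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    pools = {
        pool.get("name"): {child.tag: int(child.text) for child in pool}
        for pool in root.findall("pool")
    }
    users = {
        user.get("name"): int(user.findtext("maxRunningJobs"))
        for user in root.findall("user")
    }
    default = root.findtext("userMaxJobsDefault")
    return pools, users, (int(default) if default is not None else None)


pools, users, default_limit = summarize(POOLS_XML)
print(pools)
print(users, default_limit)
```

A file containing typographic quotes around attribute values, for example, would fail at `ET.fromstring` rather than being silently misread.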
Other Parameters 
mapred.fairscheduler.assignmultiple: 
•Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true
Other Parameters 
mapred.fairscheduler.poolnameproperty: 
•Which JobConf property determines the pool a job is placed in
-Default: user.name (one pool per user)
-You can define your own, e.g. "pool.name", and set it in the JobConf with conf.set("pool.name", "mypool")
Useful Setting 
<property> 
<name>mapred.fairscheduler.poolnameproperty</name> 
<value>pool.name</value> 
</property> 
<property> 
<name>pool.name</name> 
<value>${user.name}</value> 
</property> 
Make pool.name default to user.name
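The effect of this setting can be modeled with plain dictionaries. This is only a sketch of the lookup logic, with job settings overriding cluster settings and a single pass of `${...}` substitution; it does not use Hadoop's configuration classes, and the function names are hypothetical.

```python
import re


def expand(value, conf):
    """Replace ${name} with conf[name]; unknown names are left untouched."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: str(conf.get(m.group(1), m.group(0))),
                  value)


def pool_for_job(job_conf, cluster_conf):
    """Work out which pool a job lands in."""
    merged = {**cluster_conf, **job_conf}  # job settings win
    prop = merged.get("mapred.fairscheduler.poolnameproperty", "user.name")
    raw = merged.get(prop)
    return None if raw is None else expand(raw, merged)


cluster_conf = {
    "mapred.fairscheduler.poolnameproperty": "pool.name",
    "pool.name": "${user.name}",
}

# No explicit pool: the job falls back to a per-user pool.
print(pool_for_job({"user.name": "alice"}, cluster_conf))
# Explicit pool set by the job.
print(pool_for_job({"user.name": "alice", "pool.name": "mypool"}, cluster_conf))
```

The design gives users a per-user pool by default while still letting a job opt into a named pool.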
Issues with Fair Scheduler 
•Fair scheduling provides:
–Fine-grained sharing at the level of map and reduce tasks
–Predictable response times and user isolation
•Problem: data locality
–For efficiency, tasks must run near their input data
–Strictly following any job queuing policy hurts locality: the job picked by the policy may not have data on the free nodes
•Solution: delay scheduling
–Relax the queuing policy for a limited time to achieve locality
The Problem
[Figure: the master, following a scheduling order of Job 1 and Job 2, assigns tasks to six slaves that hold the numbered blocks of File 1 and File 2.]
The Problem
[Figure: the same master and six slaves, now with tasks from both Job 1 and Job 2 assigned across the blocks of File 1 and File 2.]
Problem: Fair decision hurts locality
Especially bad for jobs with small input files
Solution: Delay Scheduling 
•Relax queuing policy to make jobs wait for a limited time if they cannot launch local tasks 
•Result: Very short wait time (1-5s) is enough to get nearly 100% locality
Delay Scheduling Example
[Figure: the same master and six slaves; a job that has no local data on the free slave is told to wait ("Wait!") instead of launching a non-local task.]
Idea: Wait a short time to get data-local scheduling opportunities
Delay Scheduling Details 
•Scan jobs in the order given by the queuing policy, picking the first one that is permitted to launch a task
•Jobs must wait before being permitted to launch non-local tasks
–If wait < T1, only allow node-local tasks
–If T1 < wait < T2, also allow rack-local tasks
–If wait > T2, also allow off-rack tasks
•Increase a job’s time waited when it is skipped
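The rules above can be sketched as a single scheduling step. This is an illustrative model only: the `Job` class, the fixed heartbeat increment, the handling of the exact boundaries at T1 and T2, and resetting the wait after a node-local launch are all assumptions made for the example rather than details given in the slides.

```python
from dataclasses import dataclass

NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


@dataclass
class Job:
    name: str
    input_nodes: set    # nodes holding this job's pending input blocks
    wait: float = 0.0   # time the job has been skipped so far


def allowed_level(wait, t1, t2):
    """Worst locality level a job may accept after waiting this long."""
    if wait < t1:
        return NODE_LOCAL
    if wait < t2:
        return RACK_LOCAL
    return OFF_RACK


def best_locality(job, node, rack_of):
    """Best locality the job could get by running a task on this node."""
    if node in job.input_nodes:
        return NODE_LOCAL
    if rack_of[node] in {rack_of[n] for n in job.input_nodes}:
        return RACK_LOCAL
    return OFF_RACK


def assign(jobs_in_policy_order, free_node, rack_of, t1, t2, heartbeat=1.0):
    """Return the first job permitted to launch on free_node, or None."""
    for job in jobs_in_policy_order:
        level = best_locality(job, free_node, rack_of)
        if level <= allowed_level(job.wait, t1, t2):
            if level == NODE_LOCAL:
                job.wait = 0.0      # assumption: a local launch resets the wait
            return job
        job.wait += heartbeat       # skipped: the job has waited a bit longer
    return None


rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
job_a = Job("A", input_nodes={"n1"})
job_b = Job("B", input_nodes={"n3"})
chosen = assign([job_a, job_b], "n3", rack_of, t1=3, t2=6)
print(chosen.name, job_a.wait)
```

Job A is first in policy order but has no data on `n3`, so it is skipped and its wait grows, while job B launches a node-local task.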
Capacity Scheduler 
•Organizes jobs into queues 
•Queue shares are expressed as percentages of the cluster
•FIFO scheduling within each queue 
•Supports preemption 
•Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
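The lending of unused queue capacity can be sketched as below. This is a simplified model rather than the Capacity Scheduler's real logic: queue shares are whole percentages, each queue reports a slot demand, and spare slots are lent to queues in declaration order (an arbitrary choice for the example). The queue names are hypothetical.

```python
def capacity_allocation(total_slots, queues):
    """queues maps a queue name to (capacity_percent, demand_in_slots)."""
    guaranteed = {name: total_slots * pct // 100
                  for name, (pct, _) in queues.items()}
    # Each queue first uses its own guaranteed capacity, up to its demand.
    alloc = {name: min(guaranteed[name], demand)
             for name, (_, demand) in queues.items()}
    spare = total_slots - sum(alloc.values())

    # Capacity a queue is not consuming is lent to queues that want more.
    for name, (_, demand) in queues.items():
        extra = min(spare, demand - alloc[name])
        alloc[name] += extra
        spare -= extra
    return alloc


queues = {"prod": (60, 20), "adhoc": (40, 90)}
print(capacity_allocation(100, queues))
```

In this example `prod` uses only 20 of its 60 guaranteed slots, so `adhoc` temporarily runs on 80 slots instead of 40. The loan is temporary: recomputing with a higher `prod` demand gives `prod` its capacity back.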
End of session 
Day –1: Scheduling
