1. Seminar On: MAPREDUCE MODEL
Presented By:
Kalyani Wankhede (Roll No. 606008)
Guided by:
Prof. Himangi Pande
2. Outline…
What is MapReduce?
What is MapReduce used for?
MapReduce Runtime
MapReduce Programming Model
Example: Word Count
Fault Tolerance in MapReduce
6-Nov-14
3. What is MapReduce?
Simple data-parallel programming model designed for scalability and fault tolerance
Pioneered by Google
Used at Google to process 20 petabytes of data per day
Popularized by the open-source Hadoop project
4. What is MapReduce used for?
At Google:
Index construction for Google Search
Statistical machine translation
At Facebook:
Data mining
Ad optimization
Spam detection
In research:
Bioinformatics
Natural language processing
5. MapReduce “Runtime”
Handles
Scheduling
Data distribution
Synchronization
Errors and faults
Speculative execution
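As one illustration of the "data distribution" item, the sketch below shows hash partitioning, a common way for a runtime to decide which reducer receives each intermediate key. It is a minimal sketch, not the code of any particular framework: the names `partition` and `shuffle` are hypothetical, and `zlib.crc32` is used only because it gives a hash that is stable across Python runs.

```python
import zlib

def partition(key, num_reducers):
    """Pick a reducer index for a key using a stable hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = [dict() for _ in range(num_reducers)]
    for key, value in pairs:
        bucket = buckets[partition(key, num_reducers)]
        bucket.setdefault(key, []).append(value)
    return buckets

pairs = [("the", 1), ("fox", 1), ("the", 1), ("cow", 1)]
buckets = shuffle(pairs, num_reducers=2)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Because the reducer index depends only on the key, every value for a given word ends up at the same reducer, which is what lets each reducer compute a complete total.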
6. MapReduce Programming Model
Consists of two components:
Job Tracker (master node):
Accepts job requests
Splits the input data
Assigns tasks to be executed in parallel
Monitors progress and handles failures
Many Task Trackers (slave nodes):
Execute tasks
A task can be either map or reduce (running in parallel)
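The division of work described above can be sketched with a toy master that splits the input and hands each split to a worker. This is a single-machine illustration under stated assumptions: `ToyJobTracker`, `split_input`, and `count_words_in_split` are hypothetical names, threads stand in for Task Tracker nodes, and none of this is Hadoop's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def split_input(records, num_splits):
    """Divide the input into roughly equal contiguous splits."""
    size = max(1, -(-len(records) // num_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

class ToyJobTracker:
    """Hypothetical master: splits the input and assigns one task per split."""

    def __init__(self, num_workers):
        self.num_workers = num_workers

    def run_map_phase(self, records, map_task):
        splits = split_input(records, self.num_workers)
        # Each thread plays the role of a Task Tracker running one map task.
        with ThreadPoolExecutor(max_workers=self.num_workers) as workers:
            return list(workers.map(map_task, splits))

def count_words_in_split(lines):
    """Example map task: count the words in one split."""
    return sum(len(line.split()) for line in lines)

tracker = ToyJobTracker(num_workers=2)
counts = tracker.run_map_phase(["a b", "c", "d e f"], count_words_in_split)
print(counts)
```

Failure monitoring is left out here to keep the sketch short; only the splitting and parallel assignment steps are shown.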
8. Example: Word Count
def mapper(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Sum all the counts collected for one word.
    yield (key, sum(values))
9. Word Count Execution
Stages: Input -> Map -> Shuffle & Sort -> Reduce -> Output
Input splits and Map output:
Map 1: "the quick brown fox" -> the, 1; brown, 1; fox, 1; quick, 1
Map 2: "the fox ate the mouse" -> the, 1; fox, 1; the, 1; ate, 1; mouse, 1
Map 3: "how now brown cow" -> how, 1; now, 1; brown, 1; cow, 1
Reduce output:
Reduce 1: brown, 2; fox, 2; how, 1; now, 1; the, 3
Reduce 2: ate, 1; cow, 1; mouse, 1; quick, 1
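The flow on this slide can be simulated in a few lines of Python. This is a minimal single-process sketch, assuming one reducer and in-memory data; `run_job` is a hypothetical helper, and a real runtime would run the map and reduce tasks on different nodes.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in the line."""
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    """Emit the total count for one word."""
    yield (key, sum(values))

def run_job(splits):
    # Map phase: every input split is processed independently.
    intermediate = []
    for split in splits:
        intermediate.extend(mapper(split))
    # Shuffle & sort: group the values by key, then visit keys in sorted order.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

splits = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
result = run_job(splits)
for word, count in result:
    print(word, count)
```

The totals match the slide (for example `the` is counted 3 times and `fox` twice), although the slide spreads the keys over two reducers while this sketch uses one.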
10. An optimization: Combiner
Works for associative and commutative functions like sum, count, max
Decreases the size of the intermediate data sent from mappers to reducers
Example: map-side aggregation for Word Count:
def combiner(key, values):
    # Runs on the map side: pre-sum the counts for one word before the shuffle.
    yield (key, sum(values))
11. Word Count with Combiner
Stages: Input -> Map & Combine -> Shuffle & Sort -> Reduce -> Output
Input splits and Map & Combine output:
Map 1: "the quick brown fox" -> the, 1; brown, 1; fox, 1; quick, 1
Map 2: "the fox ate the mouse" -> the, 2; fox, 1; ate, 1; mouse, 1
Map 3: "how now brown cow" -> how, 1; now, 1; brown, 1; cow, 1
Reduce output:
Reduce 1: brown, 2; fox, 2; how, 1; now, 1; the, 3
Reduce 2: ate, 1; cow, 1; mouse, 1; quick, 1
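To see the effect of the combiner on this slide's data, the sketch below counts the intermediate records with and without map-side aggregation. It is an illustrative sketch only: `map_split` and `combine` are hypothetical helper names, and everything runs in one process.

```python
from collections import Counter

def map_split(text):
    """Map task: emit (word, 1) for each word in one input split."""
    return [(word, 1) for word in text.split()]

def combine(pairs):
    """Map-side aggregation: sum the counts per word within one map task."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

splits = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
map_outputs = [map_split(text) for text in splits]
combined_outputs = [combine(pairs) for pairs in map_outputs]

records_without = sum(len(pairs) for pairs in map_outputs)
records_with = sum(len(pairs) for pairs in combined_outputs)

# The reduce step gives the same totals either way.
final = Counter()
for pairs in combined_outputs:
    for word, count in pairs:
        final[word] += count

print(records_without, records_with, final["the"])
```

On this tiny input the saving is a single record, because only the second split repeats a word. On real text with many repeated words per split the reduction is typically much larger.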
12. Fault Tolerance in MapReduce
1. If a task crashes:
Retry on another node
2. If a node crashes:
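Re-launch its current tasks on other nodes
Also re-run any map tasks the node had already completed, since their output on the node's local disk is lost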
3. If a task is going slowly:
Launch a second copy of the task on another node (speculative execution)
Take the output of whichever copy finishes first, and kill the other
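The three cases above can be sketched on a single machine. This is a simplified illustration, assuming that threads stand in for nodes and that a crash shows up as a `RuntimeError`; `run_with_retry`, `run_speculatively`, and the two demo tasks are hypothetical names, and the "kill" is cooperative (a stop flag) rather than a real process kill.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_with_retry(task, nodes):
    """Cases 1 and 2: if an attempt fails, retry the task on the next node."""
    last_error = None
    for node in nodes:
        try:
            return task(node)
        except RuntimeError as err:  # stands in for a task or node crash
            last_error = err
    raise RuntimeError("task failed on every node") from last_error

def run_speculatively(task, nodes):
    """Case 3: run one copy per node, keep the first result, stop the others."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        copies = [pool.submit(task, node, stop) for node in nodes]
        done, _ = wait(copies, return_when=FIRST_COMPLETED)
        stop.set()  # cooperative "kill" for the copies still running
        return next(iter(done)).result()

# Hypothetical tasks used only for this demonstration.
def flaky_task(node):
    if node == "node-1":
        raise RuntimeError("node-1 crashed")
    return f"done on {node}"

def slow_or_fast_task(node, stop):
    delay = 2.0 if node == "node-1" else 0.01  # node-1 is the straggler
    if stop.wait(timeout=delay):  # True means this copy was told to stop
        return None
    return f"done on {node}"

retry_result = run_with_retry(flaky_task, ["node-1", "node-2"])
speculative_result = run_speculatively(slow_or_fast_task, ["node-1", "node-2"])
print(retry_result)
print(speculative_result)
```

Re-running a task elsewhere is safe in this model because map and reduce tasks are deterministic functions of their input, so a second attempt produces the same output as the first.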