Topic 5: MapReduce Theory and Implementation

5: MapReduce Theory and Implementation

Zubair Nabi

zubair.nabi@itu.edu.pk

April 18, 2013

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 1 / 34

Outline

1 Introduction

2 Programming Model

3 Implementation

4 Reﬁnements

5 Hadoop


Common computations at Google

Process large amounts of data generated from crawled documents,
web request logs, etc.



Compute inverted index, graph structure of web documents,
summaries of pages crawled per host, etc.



Common properties:
1 Computation is conceptually simple and is distributed across hundreds
or thousands of machines to leverage parallelism



Common properties:
2 Input data is large



Common properties:
2 Input data is large
3 The original simple computation is made complex by system-level code
to deal with issues of work assignment and distribution, and
fault-tolerance


Enter MapReduce

Based on the insights mentioned in the previous slide, 2 Google
Engineers, Jeff Dean and Sanjay Ghemawat, in 2004 designed
MapReduce


Enter MapReduce

MapReduce
Abstraction that helps the programmer express simple computations


Enter MapReduce

MapReduce
Hides the gory details of parallelization, fault-tolerance, data distribution,
and load balancing


Enter MapReduce

MapReduce
and load balancing
Relies on user-provided map and reduce primitives present in functional
languages


Enter MapReduce

MapReduce
and load balancing
Relies on user-provided map and reduce primitives present in functional
languages
Leverages one key insight: Most of the computation at Google involved
applying a map operator to each logical record in the input dataset to
obtain a set of intermediate key/value pairs and then applying a reduce
operation to all values with the same key, for aggregation


Outline

1 Introduction

2 Programming Model

3 Implementation

4 Reﬁnements

5 Hadoop


Programming Model

Input: Set of key/value pairs


Programming Model

Output: Set of key/value pairs


Programming Model

Output: Set of key/value pairs
The user provides the entire computation in the form of two functions:
map and reduce


User-deﬁned functions

1 Map
Takes an input pair and produces a set of intermediate key/value pairs



1 Map
The framework groups together the intermediate values by key for
consumption by the Reduce



1 Map
2 Reduce
Takes as input a key and a list of associated values



1 Map
2 Reduce
Takes as input a key and a list of associated values
In the common case, it merges these values to result in a smaller set of
values


Example: Word Count

Counting the occurrence of each word in a large collection of documents


Example: Word Count

1 Map
Emits each word and the value 1


Example: Word Count

1 Map
Emits each word and the value 1
2 Reduce
Sums together all counts emitted for a particular word


Example: Word Count(2)

1 map( String key , String value ):

2 // key: document name

3 // value : document contents

4 for each word w in value:

5 EmitIntermediate (w, "1");

6
7 reduce ( String key , Iterator values ):

8 // key: a word

9 // values : a list of counts

10 int result = 0;

11 for each v in values :

12 result += ParseInt (v);

13 Emit( AsString ( result ));


Types

User-supplied map and reduce functions have associated types
1 Map
map(k1, v1) → list(k2, v2)


Types

User-supplied map and reduce functions have associated types
1 Map
map(k1, v1) → list(k2, v2)
2 Reduce
reduce(k2, list(v2)) → list(v2)


More applications

Distributed Grep
1 Map
Emits a line if its matches a user-provided pattern
2 Reduce
Identity function


More applications

Distributed Grep
1 Map
Emits a line if its matches a user-provided pattern
2 Reduce
Identity function
Count of URL Access Frequency
1 Map
Similar to Word Count map. Instead of words we have URLs
2 Reduce
Similar to Word Count reduce


More applications (2)

Inverted Index
1 Map
Emits a sequence of < word, document_ID >
2 Reduce
Emits < word, list(document_ID) >


More applications (2)

Inverted Index
1 Map
Emits a sequence of < word, document_ID >
2 Reduce
Emits < word, list(document_ID) >
Distributed Sort
1 Map
Identity
2 Reduce
Identity


Outline

1 Introduction

2 Programming Model

3 Implementation

4 Reﬁnements

5 Hadoop


Cluster architecture

A large cluster of shared-nothing commodity machines connected via
Ethernet
Each node is an x86 system running Linux with local memory



Ethernet
Commodity networking hardware connected in the form of a tree
topology



Ethernet
topology
As clusters consist of hundreds or thousands of machines, failure is
pretty common



Ethernet
topology
pretty common
Each machine consists of local hard-drives



Ethernet
topology
pretty common
Google Filesystem runs atop of these disks which employs replication to
ensure availability and reliability



Ethernet
topology
pretty common
Google Filesystem runs atop of these disks which employs replication to
ensure availability and reliability
Jobs are submitted to a scheduler, which maps tasks within that job to
available machines within the cluster


MapReduce architecture

1 Master: In charge of all meta data, work scheduling and distribution,
and job orchestration


MapReduce architecture

1 Master: In charge of all meta data, work scheduling and distribution,
and job orchestration
2 Workers: Contain slots to execute map or reduce functions


Execution

1 The user writes map and reduce functions and stitches together a
MapReduce speciﬁcation with the location of the input dataset, number
of reduce tasks, and other attributes


Execution

2 The master logically splits the input dataset into M splits, where
M = (Input_dataset_size)/(GFS_block _size)


Execution

The GFS block size is typically a multiple of 64MB


Execution

3 It then earmarks M map tasks and assigns them to workers. Each
worker has a conﬁgurable number of task slots. Each time a worker
completes a task, the master assigns it more pending map tasks


Execution

3 It then earmarks M map tasks and assigns them to workers. Each
worker has a conﬁgurable number of task slots. Each time a worker
completes a task, the master assigns it more pending map tasks
4 Once all map tasks have completed, the master assigns R reduce
tasks to worker nodes


Mappers

1 A map worker reads the contents of the input split that it has been
assigned


Mappers

assigned
2 It parses the ﬁle and converts it to key/value pairs and invokes the
user-deﬁned map function for each pair


Mappers

assigned
3 The intermediate key/value pairs after the application of the map logic
are collected (buffered) in memory


Mappers

assigned
3 The intermediate key/value pairs after the application of the map logic
are collected (buffered) in memory
4 Once the buffered key/value pairs exceed a threshold they are written
to local disk and partitioned (using a partitioning function) into R
partitions. The location of each partition is passed to the master


Reducers

1 A reduce worker gets locations of its input partitions from the master
and uses HTTP requests to retrieve them


Reducers

2 Once it has read all its input, it sorts it by key to group together all
occurrences of the same key


Reducers

3 It then invokes the user-deﬁned reduce for each key and passes it the
key and its associated values


Reducers

3 It then invokes the user-defined reduce for each key and passes it the
key and its associated values
4 The key/value pairs generated after the application of the reduce logic
are then written to a final output file, which is subsequently written to
the distributed filesystem


Book-keeping by the Master

The master contains meta-data for all jobs running in the cluster



For each map and reduce tasks, it stores the state (pending,
in-progress, or completed) and the ID of the worker on which it is
executing (in-progress state)



For each map and reduce tasks, it stores the state (pending,
in-progress, or completed) and the ID of the worker on which it is
executing (in-progress state)
It stores the locations and sizes of partitions for each map task


Fault-tolerance

For large compute clusters, failures are the norm rather than the exception


Fault-tolerance

1 Worker:
Each worker sends a periodic heartbeat signal to the master


Fault-tolerance

1 Worker:
If the master does not receive a heartbeat from a worker in a certain
amount of time, it marks the worker as failed


Fault-tolerance

1 Worker:
In-progress map and reduce tasks are simply re-executed on other
nodes. Same goes for completed map tasks (as their output is lost on
machine failure)


Fault-tolerance

1 Worker:
machine failure)
Completed reduce tasks are not re-executed as their output resides on


Fault-tolerance

1 Worker:
machine failure)
2 Master:
The entire computation is marked as failed


Fault-tolerance

1 Worker:
machine failure)
2 Master:
The entire computation is marked as failed
But simple to keep the master soft state and re-spawn


Locality

Network bandwidth is a scare resource in typical clusters


Locality

GFS slices ﬁles into 64MB blocks and stores 3 replicas across the
cluster


Locality

GFS slices ﬁles into 64MB blocks and stores 3 replicas across the
cluster
The master exploits this information by scheduling a map task near its
input data. Preference is in the order, node-local, rack/switch-local, and
any


Speculative re-execution

Every now and then the entire computation is held-up by a “straggler”
task



task
Stragglers can arise due to a number of reasons, such as machine
load, network trafﬁc, software/hardware bugs, etc.



task
To deal with stragglers, the master speculatively re-executes slow tasks
on other machines



task
To deal with stragglers, the master speculatively re-executes slow tasks
on other machines
The task is marked as completed whenever the primary or the backup
ﬁnishes its execution


Scalability

Possible to run on multiple scales: from single nodes to data centers
with tens of thousands of nodes


Scalability

Possible to run on multiple scales: from single nodes to data centers
with tens of thousands of nodes
Nodes can be added/removed on the ﬂy to scale up/down


Outline

1 Introduction

2 Programming Model

3 Implementation

4 Reﬁnements

5 Hadoop


Partitioning

By default MapReduce uses hash partitioning to partition the key
space
hash(key) % R


Partitioning

By default MapReduce uses hash partitioning to partition the key
space
hash(key) % R
Optionally, the user can provide a custom partitioning function to say,
negate skew or to ensure that certain keys always end up at a
particular reduce worker


Combiner function

For reduce functions which are commutative and associative, the user
can additionally provide a combiner function which is applied to the
output of the map for local merging


Combiner function

For reduce functions which are commutative and associative, the user
can additionally provide a combiner function which is applied to the
output of the map for local merging
Typically, the same reduce function is used as a combiner


Input/output formats

By default, the library supports a number of input/output formats



For instance, text as input and key/value pairs as output



Optionally, the user can specify custom input readers and output
writers



Optionally, the user can specify custom input readers and output
writers
For instance, to read/write from/to a database


Outline

1 Introduction

2 Programming Model

3 Implementation

4 Reﬁnements

5 Hadoop


Hadoop

Open-source implementation of MapReduce, developed by Doug
Cutting originally at Yahoo! in 2004


Hadoop

Now a top-level Apache open-source project


Hadoop

Implemented in Java (Google’s in-house implementation is in C++)


Hadoop

Implemented in Java (Google’s in-house implementation is in C++)
Comes with an associated distributed ﬁlesystem, HDFS (clone of GFS)


References

Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simpliﬁed
data processing on large clusters. In Proceedings of the 6th
Symposium on Operating Systems Design & Implementation -
(OSDI’04), Vol. 6. USENIX Association, Berkeley, CA, USA.


Topic 5: MapReduce Theory and Implementation

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Ähnlich wie Topic 5: MapReduce Theory and Implementation

Ähnlich wie Topic 5: MapReduce Theory and Implementation (20)

Mehr von Zubair Nabi

Mehr von Zubair Nabi (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Topic 5: MapReduce Theory and Implementation