MapReduce Algorithm Design	

Jimmy Lin	

University of Maryland	

Monday, May 13, 2013	

WWW 2013 Tutorial, Rio de Janeiro	

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Source: Wikipedia (All Souls College, Oxford)
From the Ivory Tower…
Source: Wikipedia (Factory)
… to building sh*t that works
Source: Wikipedia (All Souls College, Oxford)
… and back.
More about me…	

¢  Past MapReduce teaching experience:	

l  Numerous tutorials	

l  Several semester-long MapReduce courses	

¢  Lin & Dyer MapReduce textbook

http://mapreduce.cc/
Follow me at @lintool	

http://lintool.github.io/MapReduce-course-2013s/
What we’ll cover	

¢  Big data	

¢  MapReduce overview	

¢  Importance of local aggregation	

¢  Sequencing computations	

¢  Iterative graph algorithms	

¢  MapReduce and abstract algebra	

Focus on design patterns and general principles
What we won’t cover	

¢  MapReduce for machine learning (supervised and unsupervised)	

¢  MapReduce for similar item detection	

¢  MapReduce for information retrieval	

¢  Hadoop for data warehousing	

¢  Extensions and alternatives to MapReduce
Source: Wikipedia (Hard disk drive)
Big Data
How much data?	

10 PB data, 75B DB
calls per day (6/2012)	

processes 20 PB a day (2008)	

crawls 20B web pages a day (2012)	

100 PB of user data + 
500 TB/day (8/2012)	

Wayback Machine: 240B web
pages archived, 5 PB (1/2013)	

LHC: ~15 PB a year
	

LSST: 6-10 PB a year 
(~2015)	

640K ought to be
enough for anybody.	

150 PB on 50k+ servers 
running 15k apps (6/2011)	

S3: 449B objects, peak 290k
request/second (7/2011)	

1T objects (6/2012)	

SKA: 0.3 – 1.5 EB 
per year (~2020)
Source: Wikipedia (Everest)
Why big data?	

 Science	

Engineering	

Commerce
Emergence of the 4th Paradigm	

Data-intensive e-Science	

Maximilien Brice, © CERN
Science
Engineering	

The unreasonable effectiveness of data	

Count and normalize!	

Source: Wikipedia (Three Gorges Dam)
No data like more data!	

(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)
s/knowledge/data/g;
Commerce	

Know thy customers	

Data → Insights → Competitive advantages 	

Source: Wikipedia (Shinjuku, Tokyo)
How big data?	

Why big data?	

Source: Wikipedia (Noctilucent cloud)
Source: Google
MapReduce
Typical Big Data Problem	

¢  Iterate over a large number of records	

¢  Extract something of interest from each	

¢  Shuffle and sort intermediate results	

¢  Aggregate intermediate results	

¢  Generate final output	

Key idea: provide a functional
abstraction for these two operations	

Map	

Reduce	

(Dean and Ghemawat, OSDI 2004)
Roots in Functional Programming

[Figure: Map applies a function f to every element of a list independently; Fold aggregates the list by repeatedly applying a function g to an accumulator and the next element]
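To make the functional roots concrete, here is a minimal Python sketch (not part of the original slides): map applies a function f to every element independently, while fold (reduce) aggregates the results with a function g and an initial value.

from functools import reduce

xs = [1, 2, 3, 4, 5]

# map: apply f to each element independently (embarrassingly parallel)
squared = list(map(lambda x: x * x, xs))            # [1, 4, 9, 16, 25]

# fold: aggregate with g and an initial value (this is the "reduce" side)
total = reduce(lambda acc, x: acc + x, squared, 0)  # 55

print(squared, total)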
MapReduce	

¢  Programmers specify two functions:	

map (k1, v1) → [k2, v2]	

reduce (k2, [v2]) → [k3, v3]	

l  All values with the same key are sent to the same reducer	

¢  The execution framework handles everything else…
[Figure: MapReduce dataflow — mappers consume input pairs (k1, v1) … (k6, v6) and emit intermediate pairs such as a→1, b→2, c→3, c→6, a→5, c→2, b→7, c→8; shuffle and sort aggregates values by key (a → [1, 5], b → [2, 7], c → [2, 3, 6, 8]); reducers then produce the final outputs (r1, s1), (r2, s2), (r3, s3)]
MapReduce	

¢  Programmers specify two functions:	

map (k, v) → <k’, v’>*

reduce (k’, v’) → <k’, v’>*

l  All values with the same key are sent to the same reducer	

¢  The execution framework handles everything else…	

What’s “everything else”?
MapReduce “Runtime”	

¢  Handles scheduling	

l  Assigns workers to map and reduce tasks	

¢  Handles “data distribution”	

l  Moves processes to data	

¢  Handles synchronization	

l  Gathers, sorts, and shuffles intermediate data	

¢  Handles errors and faults	

l  Detects worker failures and restarts	

¢  Everything happens on top of a distributed filesystem
MapReduce	

¢  Programmers specify two functions:	

map (k, v) → <k’, v’>*

reduce (k’, v’) → <k’, v’>*

l  All values with the same key are reduced together	

¢  The execution framework handles everything else…	

¢  Not quite…usually, programmers also specify:	

partition (k’, number of partitions) → partition for k’	

l  Often a simple hash of the key, e.g., hash(k’) mod n	

l  Divides up key space for parallel reduce operations	

combine (k’, v’) → <k’, v’>*

l  Mini-reducers that run in memory after the map phase	

l  Used as an optimization to reduce network traffic
[Figure: complete MapReduce dataflow with combiners and partitioners — mappers emit intermediate pairs; combiners perform local aggregation (e.g., c→3 and c→6 from one mapper become c→9); partitioners assign intermediate keys to reducers; after shuffle and sort, reducers aggregate values by key (a → [1, 5], b → [2, 7], c → [2, 9, 8]) and produce (r1, s1), (r2, s2), (r3, s3)]
Two more details…	

¢  Barrier between map and reduce phases	

l  But intermediate data can be copied over as soon as mappers finish	

¢  Keys arrive at each reducer in sorted order	

l  No enforced ordering across reducers
What’s the big deal?	

¢  Developers need the right level of abstraction	

l  Moving beyond the von Neumann architecture	

l  We need better programming models	

¢  Abstractions hide low-level details from the developers	

l  No more race conditions, lock contention, etc.	

¢  MapReduce separates the what from the how

l  Developer specifies the computation that needs to be performed	

l  Execution framework (“runtime”) handles actual execution
Source: Google
The datacenter is the computer!
Source: Google
MapReduce can refer to…	

¢  The programming model	

¢  The execution framework (aka “runtime”)	

¢  The specific implementation	

Usage is usually clear from context!
MapReduce Implementations	

¢  Google has a proprietary implementation in C++	

l  Bindings in Java, Python	

¢  Hadoop is an open-source implementation in Java	

l  Development led by Yahoo, now an Apache project	

l  Used in production at Yahoo, Facebook, Twitter, LinkedIn, Netflix, …	

l  The de facto big data processing platform	

l  Rapidly expanding software ecosystem	

¢  Lots of custom research implementations	

l  For GPUs, cell processors, etc.
MapReduce algorithm design	

¢  The execution framework handles “everything else”…	

l  Scheduling: assigns workers to map and reduce tasks	

l  “Data distribution”: moves processes to data	

l  Synchronization: gathers, sorts, and shuffles intermediate data	

l  Errors and faults: detects worker failures and restarts	

¢  Limited control over data and execution flow	

l  All algorithms must expressed in m, r, c, p	

¢  You don’t know:	

l  Where mappers and reducers run	

l  When a mapper or reducer begins or finishes	

l  Which input a particular mapper is processing	

l  Which intermediate key a particular reducer is processing
Implementation Details	

Source: www.flickr.com/photos/8773361@N05/2524173778/
HDFS Architecture (adapted from Ghemawat et al., SOSP 2003)

[Figure: the application uses the HDFS client; the client sends (file name, block id) requests to the HDFS namenode and receives (block id, block location); the namenode maintains the file namespace (e.g., /foo/bar → block 3df2) and exchanges instructions and state with the datanodes; the client then requests (block id, byte range) from the appropriate HDFS datanode and receives block data, which each datanode stores in its local Linux file system]
Putting everything together…

[Figure: Hadoop cluster architecture — the namenode runs the namenode daemon, the job submission node runs the jobtracker, and each slave node runs a tasktracker and a datanode daemon on top of its local Linux file system]
Shuffle and Sort

[Figure: inside a mapper, output is collected in an in-memory circular buffer and spilled to disk; spills are merged (the combiner may run here) into intermediate files on disk; each reducer copies intermediate data from this mapper and from other mappers, merges it (the combiner may run again), and feeds the merged spills to the reduce phase; other reducers do the same with their own partitions]
Preserving State

[Figure: each Mapper object (one object per task) carries state across calls — the API initialization hook setup is called once, map is called once per input key-value pair, and the API cleanup hook cleanup is called once; likewise, each Reducer object carries state — setup is called once, reduce is called once per intermediate key, and close is called once]
Implementation Don’ts	

¢  Don’t unnecessarily create objects	

l  Object creation is costly	

l  Garbage collection is costly	

¢  Don’t buffer objects	

l  Processes have limited heap size (remember, commodity machines)	

l  May work for small datasets, but won’t scale!
Secondary Sorting	

¢  MapReduce sorts input to reducers by key	

l  Values may be arbitrarily ordered	

¢  What if we want to sort values as well?

l  E.g., k → (v1, r), (v3, r), (v4, r), (v8, r)…
Secondary Sorting: Solutions	

¢  Solution 1:	

l  Buffer values in memory, then sort	

l  Why is this a bad idea?	

¢  Solution 2:	

l  “Value-to-key conversion” design pattern: form composite intermediate
key, (k, v1)	

l  Let execution framework do the sorting	

l  Preserve state across multiple key-value pairs to handle processing	

l  Anything else we need to do?
Local Aggregation	

Source: www.flickr.com/photos/bunnieswithsharpteeth/490935152/
Importance of Local Aggregation	

¢  Ideal scaling characteristics:	

l  Twice the data, twice the running time	

l  Twice the resources, half the running time	

¢  Why can’t we achieve this?	

l  Synchronization requires communication	

l  Communication kills performance (network is slow!)	

¢  Thus… avoid communication!	

l  Reduce intermediate data via local aggregation	

l  Combiners can help
Word Count: Baseline	

What’s the impact of combiners?
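The pseudo-code on this slide is an image and does not survive text extraction; as a minimal sketch of the baseline algorithm (plain Python functions, not the Hadoop API), the mapper emits one (word, 1) pair per token and the reducer sums the counts.

# Baseline word count: one (word, 1) pair per token; the reducer sums counts.
def map_fn(doc_id, text):
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    yield (word, sum(counts))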
Word Count: Version 1	

Are combiners still needed?
Word Count: Version 2	

Are combiners still needed?
Design Pattern for Local Aggregation	

¢  “In-mapper combining”	

l  Fold the functionality of the combiner into the mapper by preserving
state across multiple map calls	

¢  Advantages	

l  Speed	

l  Why is this faster than actual combiners?	

¢  Disadvantages	

l  Explicit memory management required	

l  Potential for order-dependent bugs
Combiner Design	

¢  Combiners and reducers share same method signature	

l  Sometimes, reducers can serve as combiners	

l  Often, not…	

¢  Remember: combiners are optional optimizations

l  Should not affect algorithm correctness	

l  May be run 0, 1, or multiple times	

¢  Example: find average of integers associated with the same key
Computing the Mean: Version 1	

Why can’t we use reducer as combiner?
Computing the Mean: Version 2	

Why doesn’t this work?
Computing the Mean: Version 3	

Fixed?
Computing the Mean: Version 4	

Are combiners still needed?
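The pseudo-code figures for the four versions are images; as a sketch of the correct design (corresponding to the working version), mappers and combiners pass along (sum, count) pairs, which combine associatively, and only the reducer divides:

# Computing the mean correctly: pass (sum, count) pairs, divide only at the end.
def map_fn(key, value):
    yield (key, (value, 1))                         # a partial (sum, count) pair

def combine_fn(key, pairs):
    s = sum(p[0] for p in pairs)
    c = sum(p[1] for p in pairs)
    yield (key, (s, c))                             # still a (sum, count) pair: safe to combine again

def reduce_fn(key, pairs):
    s = sum(p[0] for p in pairs)
    c = sum(p[1] for p in pairs)
    yield (key, s / c)                              # divide only once, at the very end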
Sequencing Computations	

Source: www.flickr.com/photos/richardandgill/565921252/
Sequencing Computations	

1.  Turn synchronization into a sorting problem	

l  Leverage the fact that keys arrive at reducers in sorted order	

l  Manipulate the sort order and partitioning scheme to deliver partial
results at appropriate junctures	

2.  Create appropriate algebraic structures to capture computation	

l  Build custom data structures to accumulate partial results
Algorithm Design: Running Example	

¢  Term co-occurrence matrix for a text collection	

l  M = N x N matrix (N = vocabulary size)	

l  Mij: number of times i and j co-occur in some context 
(for concreteness, let’s say context = sentence)	

¢  Why?	

l  Distributional profiles as a way of measuring semantic distance	

l  Semantic distance useful for many language processing tasks	

l  Basis for large classes of more sophisticated algorithms
MapReduce: Large Counting Problems	

¢  Term co-occurrence matrix for a text collection
= specific instance of a large counting problem	

l  A large event space (number of terms)	

l  A large number of observations (the collection itself)	

l  Goal: keep track of interesting statistics about the events	

¢  Basic approach	

l  Mappers generate partial counts	

l  Reducers aggregate partial counts	

How do we aggregate partial counts efficiently?
First Try: “Pairs”	

¢  Each mapper takes a sentence:	

l  Generate all co-occurring term pairs	

l  For all pairs, emit (a, b) → count	

¢  Reducers sum up counts associated with these pairs	

¢  Use combiners!
Pairs: Pseudo-Code
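The pseudo-code slide is an image; a minimal Python sketch of the pairs approach (taking context = sentence) might look like this:

# "Pairs" approach to the co-occurrence matrix.
def map_fn(doc_id, sentence):
    words = sentence.split()
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j:
                yield ((u, v), 1)                   # one pair per co-occurring (u, v)

def reduce_fn(pair, counts):                        # summing is associative, so this also works as a combiner
    yield (pair, sum(counts))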
“Pairs” Analysis	

¢  Advantages	

l  Easy to implement, easy to understand	

¢  Disadvantages	

l  Lots of pairs to sort and shuffle around (upper bound?)	

l  Not many opportunities for combiners to work
Another Try: “Stripes”	

¢  Idea: group together pairs into an associative array	

¢  Each mapper takes a sentence:	

l  Generate all co-occurring term pairs	

l  For each term, emit a → { b: countb, c: countc, d: countd … }	

¢  Reducers perform element-wise sum of associative arrays	

(a, b) → 1
(a, c) → 2
(a, d) → 5
(a, e) → 3
(a, f) → 2
a → { b: 1, c: 2, d: 5, e: 3, f: 2 }
a → { b: 1, d: 5, e: 3 }
a → { b: 1, c: 2, d: 2, f: 2 }
a → { b: 2, c: 2, d: 7, e: 3, f: 2 }
+
Key idea: cleverly-constructed data structure	

for aggregating partial results
Stripes: Pseudo-Code
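The pseudo-code slide is an image; a corresponding Python sketch of the stripes approach:

# "Stripes" approach: one associative array per term, element-wise sum in the reducer.
def map_fn(doc_id, sentence):
    words = sentence.split()
    for i, u in enumerate(words):
        stripe = {}
        for j, v in enumerate(words):
            if i != j:
                stripe[v] = stripe.get(v, 0) + 1    # counts of terms co-occurring with u
        yield (u, stripe)

def reduce_fn(term, stripes):
    total = {}
    for stripe in stripes:                          # element-wise sum of associative arrays
        for v, count in stripe.items():
            total[v] = total.get(v, 0) + count
    yield (term, total)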
“Stripes” Analysis	

¢  Advantages	

l  Far less sorting and shuffling of key-value pairs	

l  Can make better use of combiners	

¢  Disadvantages	

l  More difficult to implement	

l  Underlying object more heavyweight	

l  Fundamental limitation in terms of size of event space
Cluster size: 38 cores
Data Source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3),
which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)
Relative Frequencies	

¢  How do we estimate relative frequencies from counts?	

¢  Why do we want to do this?	

¢  How do we do this with MapReduce?	

$f(B|A) = \dfrac{N(A, B)}{N(A)} = \dfrac{N(A, B)}{\sum_{B'} N(A, B')}$
f(B|A): “Stripes” 	

¢  Easy!	

l  One pass to compute (a, *)	

l  Another pass to directly compute f(B|A)	

a → {b1:3, b2 :12, b3 :7, b4 :1, … }
f(B|A): “Pairs” 	

¢  What’s the issue?	

l  Computing relative frequencies requires marginal counts	

l  But the marginal cannot be computed until you see all counts	

l  Buffering is a bad idea!	

¢  Solution:	

l  What if we could get the marginal count to arrive at the reducer first?
f(B|A): “Pairs” 	

¢  For this to work:	

l  Must emit extra (a, *) for every bn in mapper	

l  Must make sure all a’s get sent to same reducer (use partitioner)	

l  Must make sure (a, *) comes first (define sort order)	

l  Must hold state in reducer across different key-value pairs	

(a, b1) → 3
(a, b2) → 12
(a, b3) → 7
(a, b4) → 1
…
(a, *) → 32
(a, b1) → 3 / 32
(a, b2) → 12 / 32
(a, b3) → 7 / 32
(a, b4) → 1 / 32
…
Reducer holds this value in memory
“Order Inversion”	

¢  Common design pattern:	

l  Take advantage of sorted key order at reducer to sequence
computations	

l  Get the marginal counts to arrive at the reducer before the joint counts	

¢  Optimization:	

l  Apply in-memory combining pattern to accumulate marginal counts
Synchronization: Pairs vs. Stripes	

¢  Approach 1: turn synchronization into an ordering problem	

l  Sort keys into correct order of computation	

l  Partition key space so that each reducer gets the appropriate set of
partial results	

l  Hold state in reducer across multiple key-value pairs to perform
computation	

l  Illustrated by the “pairs” approach	

¢  Approach 2: construct data structures to accumulate partial
results	

l  Each reducer receives all the data it needs to complete the computation	

l  Illustrated by the “stripes” approach
Issues and Tradeoffs	

¢  Number of key-value pairs	

l  Object creation overhead	

l  Time for sorting and shuffling pairs across the network	

¢  Size of each key-value pair	

l  De/serialization overhead
Lots of algorithms are just fancy conditional counts!

Source: http://www.flickr.com/photos/guvnah/7861418602/
Hidden Markov Models	

An HMM is characterized by:	

l  N states:	

l  N x N Transition probability matrix	

l  V observation symbols:	

l  N x |V| Emission probability matrix

	

l  Prior probabilities vector	

aij = p(qj|qi)
X
j
aij = 1 8i
A = [aij]
NX
i=1
⇡i = 1
= (A, B, ⇧)
Q = {q1, q2, . . . qN }
O = {o1, o2, . . . oV }
B = [biv]
biv = bi(ov) = p(ov|qi)
⇧ = [⇡i, ⇡2, . . . ⇡N ]
Forward-Backward

[Figure: trellis around state $q_j$ at time $t$, with observations $o_{t-1}, o_t, o_{t+1}$ — the forward probability $\alpha_t(j)$ covers the prefix of the observation sequence and the backward probability $\beta_t(j)$ covers the suffix]

$\alpha_t(j) = P(o_1, o_2, \ldots, o_t, q_t = j \mid \lambda)$

$\beta_t(j) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = j, \lambda)$
Estimating Emissions Probabilities	

¢  Basic idea:

$b_j(v_k) = \dfrac{\text{expected number of times in state } j \text{ and observing symbol } v_k}{\text{expected number of times in state } j}$

¢  Let’s define:

$\gamma_t(j) = \dfrac{P(q_t = j, O \mid \lambda)}{P(O \mid \lambda)} = \dfrac{\alpha_t(j)\,\beta_t(j)}{P(O \mid \lambda)}$

¢  Thus:

$\hat{b}_j(v_k) = \dfrac{\sum_{t=1,\,O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
Forward-Backward

[Figure: trellis showing a transition from state $q_i$ at time $t$ to state $q_j$ at time $t+1$ — the path probability combines $\alpha_t(i)$, the transition $a_{ij}$, the emission $b_j(o_{t+1})$, and $\beta_{t+1}(j)$]
Estimating Transition Probabilities	

¢  Basic idea:

$a_{ij} = \dfrac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i}$

¢  Let’s define:

$\xi_t(i, j) = \dfrac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O \mid \lambda)}$

¢  Thus:

$\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \xi_t(i, j)}$
MapReduce Implementation: Mapper	

class Mapper
  method Initialize(integer iteration)
    ⟨S, O⟩ ← ReadModel
    θ = ⟨A, B, π⟩ ← ReadModelParams(iteration)
  method Map(sample id, sequence x)
    α ← Forward(x, θ)                              ▷ cf. Section 6.2.2
    β ← Backward(x, θ)                             ▷ cf. Section 6.2.4
    I ← new AssociativeArray                       ▷ Initial state expectations
    for all q ∈ S do                               ▷ Loop over states
      I{q} ← α_1(q) · β_1(q)
    O ← new AssociativeArray of AssociativeArray   ▷ Emissions
    for t = 1 to |x| do                            ▷ Loop over observations
      for all q ∈ S do                             ▷ Loop over states
        O{q}{x_t} ← O{q}{x_t} + α_t(q) · β_t(q)
      t ← t + 1
    T ← new AssociativeArray of AssociativeArray   ▷ Transitions
    for t = 1 to |x| − 1 do                        ▷ Loop over observations
      for all q ∈ S do                             ▷ Loop over states
        for all r ∈ S do                           ▷ Loop over states
          T{q}{r} ← T{q}{r} + α_t(q) · A_q(r) · B_r(x_{t+1}) · β_{t+1}(r)
      t ← t + 1
    Emit(string ‘initial’, stripe I)
    for all q ∈ S do                               ▷ Loop over states
      Emit(string ‘emit from ’ + q, stripe O{q})
      Emit(string ‘transit from ’ + q, stripe T{q})

The quantities accumulated here correspond to the estimation formulas:

$\gamma_t(j) = \dfrac{\alpha_t(j)\,\beta_t(j)}{P(O \mid \lambda)}$, $\;\xi_t(i, j) = \dfrac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O \mid \lambda)}$, $\;\hat{b}_j(v_k) = \dfrac{\sum_{t=1,\,O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$, $\;\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \xi_t(i, j)}$
MapReduce Implementation: Reducer	

class Combiner
  method Combine(string t, stripes [C1, C2, …])
    C_f ← new AssociativeArray
    for all stripe C ∈ stripes [C1, C2, …] do
      Sum(C_f, C)
    Emit(string t, stripe C_f)

class Reducer
  method Reduce(string t, stripes [C1, C2, …])
    C_f ← new AssociativeArray
    for all stripe C ∈ stripes [C1, C2, …] do
      Sum(C_f, C)
    z ← 0
    for all ⟨k, v⟩ ∈ C_f do
      z ← z + v
    P_f ← new AssociativeArray                     ▷ Final parameters vector
    for all ⟨k, v⟩ ∈ C_f do
      P_f{k} ← v / z
    Emit(string t, stripe P_f)

Figure 6.9: Combiner and reducer pseudo-code for training hidden Markov models using EM. The HMMs considered in this book are fully parameterized by multinomial distributions, so reducers do not require special logic to handle different types of model parameters (since they are all of the same type).

(The margin of the slide repeats the estimation formulas for $\gamma_t(j)$, $\xi_t(i, j)$, $\hat{b}_j(v_k)$, and $\hat{a}_{ij}$ given above.)
Iterative Algorithms: Graphs	

Source: Wikipedia (Water wheel)
What’s a graph?	

¢  G = (V,E), where	

l  V represents the set of vertices (nodes)	

l  E represents the set of edges (links)	

l  Both vertices and edges may contain additional information	

¢  Different types of graphs:	

l  Directed vs. undirected edges	

l  Presence or absence of cycles	

¢  Graphs are everywhere:	

l  Hyperlink structure of the web	

l  Physical structure of computers on the Internet	

l  Interstate highway system	

l  Social networks
Source: Wikipedia (Königsberg)
Source: Wikipedia (Kaliningrad)
Some Graph Problems	

¢  Finding shortest paths	

l  Routing Internet traffic and UPS trucks	

¢  Finding minimum spanning trees	

l  Telco laying down fiber	

¢  Finding Max Flow	

l  Airline scheduling	

¢  Identify “special” nodes and communities	

l  Breaking up terrorist cells, spread of avian flu	

¢  Bipartite matching	

l  Monster.com, Match.com	

¢  And of course... PageRank
Graphs and MapReduce	

¢  A large class of graph algorithms involve:	

l  Performing computations at each node: based on node features, edge
features, and local link structure	

l  Propagating computations: “traversing” the graph	

¢  Key questions:	

l  How do you represent graph data in MapReduce?	

l  How do you traverse a graph in MapReduce?	

In reality: graph algorithms 
in MapReduce suck!
Representing Graphs	

¢  G = (V, E)	

¢  Two common representations	

l  Adjacency matrix	

l  Adjacency list
Adjacency Matrices	

Represent a graph as an n x n square matrix M	

l  n = |V|	

l  Mij = 1 means a link from node i to j	

     1  2  3  4
1    0  1  0  1
2    1  0  1  1
3    1  0  0  0
4    1  0  1  0

[Figure: the corresponding directed graph on nodes 1–4]
Adjacency Matrices: Critique	

¢  Advantages:	

l  Amenable to mathematical manipulation	

l  Iteration over rows and columns corresponds to computations on
outlinks and inlinks	

¢  Disadvantages:	

l  Lots of zeros for sparse matrices	

l  Lots of wasted space
Adjacency Lists	

Take adjacency matrices… and throw away all the zeros	

1: 2, 4	

2: 1, 3, 4	

3: 1	

4: 1, 3	

1 2 3 4
1 0 1 0 1
2 1 0 1 1
3 1 0 0 0
4 1 0 1 0
Adjacency Lists: Critique	

¢  Advantages:	

l  Much more compact representation	

l  Easy to compute over outlinks	

¢  Disadvantages:	

l  Much more difficult to compute over inlinks
Single-Source Shortest Path	

¢  Problem: find shortest path from a source node to one or
more target nodes	

l  Shortest might also mean lowest weight or cost	

¢  Single processor machine: Dijkstra’s Algorithm	

¢  MapReduce: parallel breadth-first search (BFS)
Finding the Shortest Path	

¢  Consider simple case of equal edge weights	

¢  Solution to the problem can be defined inductively	

¢  Here’s the intuition:

l  Define: b is reachable from a if b is on adjacency list of a

l  DISTANCETO(s) = 0

l  For all nodes p reachable from s, DISTANCETO(p) = 1

l  For all nodes n reachable from some other set of nodes M, DISTANCETO(n) = 1 + min(DISTANCETO(m), m ∈ M)

[Figure: node n is reachable from nodes m1, m2, m3 via edges of cost d1, d2, d3, and each mi is in turn reachable from the source s]
Source: Wikipedia (Wave)
Visualizing Parallel BFS	

[Figure: example graph with nodes n0 through n9, used to visualize how parallel BFS expands the frontier hop by hop]
From Intuition to Algorithm	

¢  Data representation:	

l  Key: node n	

l  Value: d (distance from start), adjacency list (nodes reachable from n)	

l  Initialization: for all nodes except for start node, d = ∞	

¢  Mapper:	

l  ∀m ∈ adjacency list: emit (m, d + 1)	

¢  Sort/Shuffle	

l  Groups distances by reachable nodes	

¢  Reducer:	

l  Selects minimum distance path for each reachable node	

l  Additional bookkeeping needed to keep track of actual path
Multiple Iterations Needed	

¢  Each MapReduce iteration advances the “frontier” by one hop	

l  Subsequent iterations include more and more reachable nodes as
frontier expands	

l  Multiple iterations are needed to explore entire graph	

¢  Preserving graph structure:	

l  Problem: Where did the adjacency list go?	

l  Solution: mapper emits (n, adjacency list) as well
BFS Pseudo-Code
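The pseudo-code slide is an image; a minimal Python sketch of one parallel BFS iteration (equal edge weights), with the graph structure passed along as described above:

# One iteration of parallel BFS, sketched as plain functions (not the Hadoop API).
INF = float('inf')

def map_fn(node_id, node):                          # node = (distance, adjacency_list)
    d, adjacency = node
    yield (node_id, ('GRAPH', adjacency))           # pass the graph structure along
    if d != INF:
        for m in adjacency:
            yield (m, ('DIST', d + 1))              # tentative distance to each neighbor

def reduce_fn(node_id, values):
    adjacency, best = [], INF
    for tag, payload in values:
        if tag == 'GRAPH':
            adjacency = payload
        else:
            best = min(best, payload)               # keep the minimum distance seen so far
    yield (node_id, (best, adjacency))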
Stopping Criterion	

¢  When a node is first discovered, we’ve found the shortest path	

l  Maximum number of iterations is equal to the diameter of the graph	

¢  Practicalities of implementation in MapReduce
Comparison to Dijkstra	

¢  Dijkstra’s algorithm is more efficient 	

l  At each step, only pursues edges from minimum-cost path inside frontier	

¢  MapReduce explores all paths in parallel	

l  Lots of “waste”	

l  Useful work is only done at the “frontier”	

¢  Why can’t we do better using MapReduce?
Single Source: Weighted Edges	

¢  Now add positive weights to the edges	

l  Why can’t edge weights be negative?	

¢  Simple change: add weight w for each edge in adjacency list	

l  In mapper, emit (m, d + wp) instead of (m, d + 1) for each node m	

¢  That’s it?
Stopping Criterion	

¢  How many iterations are needed in parallel BFS (positive edge
weight case)?	

¢  When a node is first discovered, we’ve found the shortest path	

Not true!
Additional Complexities	

[Figure: a search frontier around source s containing nodes p, q, r — a direct edge from s has weight 10, while a longer chain of unit-weight edges through n1 … n9 reaches the same region at lower total cost, so the shortest path can pass outside the current frontier]
Stopping Criterion	

¢  How many iterations are needed in parallel BFS (positive edge
weight case)?	

¢  Practicalities of implementation in MapReduce
All-Pairs?	

¢  Floyd-Warshall Algorithm: difficult to MapReduce-ify…	

¢  Multiple-source shortest paths in MapReduce: run multiple
parallel BFS simultaneously	

l  Assume source nodes {s0, s1, … sn}	

l  Instead of emitting a single distance, emit an array of distances, with
respect to each source	

l  Reducer selects minimum for each element in array	

¢  Does this scale?
Application: Social Search	

Source: Wikipedia (Crowd)
Social Search	

¢  When searching, how to rank friends named “John”?	

l  Assume undirected graphs	

l  Rank matches by distance to user	

¢  Naïve implementations:	

l  Precompute all-pairs distances	

l  Compute distances at query time	

¢  Can we do better?
Landmark Approach (aka sketches)	

¢  Select n seeds {s0, s1, … sn}	

¢  Compute distances from seeds to every node:	

l  What can we conclude about distances?	

l  Insight: landmarks bound the maximum path length	

¢  Lots of details:	

l  How to more tightly bound distances	

l  How to select landmarks (random isn’t the best…)	

¢  Use multi-source parallel BFS implementation in MapReduce!	

A = [2, 1, 1]    B = [1, 1, 2]    C = [4, 3, 1]    D = [1, 2, 4]
Source: Wikipedia (Wave)
pause/
Graphs and MapReduce	

¢  A large class of graph algorithms involve:	

l  Performing computations at each node: based on node features, edge
features, and local link structure	

l  Propagating computations: “traversing” the graph	

¢  Generic recipe:	

l  Represent graphs as adjacency lists	

l  Perform local computations in mapper	

l  Pass along partial results via outlinks, keyed by destination node	

l  Perform aggregation in reducer on inlinks to a node	

l  Iterate until convergence: controlled by external “driver”	

l  Don’t forget to pass the graph structure between iterations
Given page x with inlinks t1…tn, where	

l  C(t) is the out-degree of t	

l  α is probability of random jump	

l  N is the total number of nodes in the graph	

PageRank	

[Figure: page x with inlinks from pages t1, t2, …, tn]

$PR(x) = \alpha \left( \dfrac{1}{N} \right) + (1 - \alpha) \sum_{i=1}^{n} \dfrac{PR(t_i)}{C(t_i)}$
Computing PageRank	

¢  Properties of PageRank	

l  Can be computed iteratively	

l  Effects at each iteration are local	

¢  Sketch of algorithm:	

l  Start with seed PRi values	

l  Each page distributes PRi “credit” to all pages it links to	

l  Each target page adds up “credit” from multiple in-bound links to
compute PRi+1	

l  Iterate until values converge
Simplified PageRank	

¢  First, tackle the simple case:	

l  No random jump factor	

l  No dangling nodes	

¢  Then, factor in these complexities…	

l  Why do we need the random jump?	

l  Where do dangling nodes come from?
Sample PageRank Iteration (1)	

[Figure: Iteration 1 — all five nodes start with PageRank 0.2; each node splits its mass evenly across its outlinks (contributions of 0.2, 0.1, or 0.066), and after aggregation the values are n1 = 0.066, n2 = 0.166, n3 = 0.166, n4 = 0.3, n5 = 0.3]
Sample PageRank Iteration (2)	

[Figure: Iteration 2 — starting from the iteration-1 values, another round of distribution and aggregation yields n1 = 0.1, n2 = 0.133, n3 = 0.183, n4 = 0.2, n5 = 0.383]
PageRank in MapReduce	

[Figure: one PageRank iteration in MapReduce — mappers take each node with its adjacency list (n1 [n2, n4], n2 [n3, n5], n3 [n4], n4 [n5], n5 [n1, n2, n3]) and emit PageRank mass keyed by each outlink, along with the graph structure itself; after shuffle and sort, reducers aggregate the incoming mass for each node and re-emit the node with its adjacency list]
PageRank Pseudo-Code
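The pseudo-code slide is an image; a minimal Python sketch of one iteration of simplified PageRank (no random jump, no dangling nodes; plain functions, not the Hadoop API):

def map_fn(node_id, node):                          # node = (pagerank, adjacency_list)
    p, adjacency = node
    yield (node_id, ('GRAPH', adjacency))           # don't forget the graph structure
    if adjacency:
        share = p / len(adjacency)
        for m in adjacency:
            yield (m, ('MASS', share))              # distribute credit along outlinks

def reduce_fn(node_id, values):
    adjacency, p = [], 0.0
    for tag, payload in values:
        if tag == 'GRAPH':
            adjacency = payload
        else:
            p += payload                            # sum incoming PageRank mass
    yield (node_id, (p, adjacency))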
Complete PageRank	

¢  Two additional complexities	

l  What is the proper treatment of dangling nodes?	

l  How do we factor in the random jump factor?	

¢  Solution: 	

l  Second pass to redistribute “missing PageRank mass” and account for
random jumps	

l  p is PageRank value from before, p' is updated PageRank value	

l  N is the number of nodes in the graph	

l  m is the missing PageRank mass	

¢  Additional optimization: make it a single pass!	

$p' = \alpha \left( \dfrac{1}{N} \right) + (1 - \alpha) \left( \dfrac{m}{N} + p \right)$
PageRank Convergence	

¢  Alternative convergence criteria	

l  Iterate until PageRank values don’t change	

l  Iterate until PageRank rankings don’t change	

l  Fixed number of iterations	

¢  Convergence for web graphs?	

l  Not a straightforward question	

¢  Watch out for link spam:	

l  Link farms	

l  Spider traps	

l  …
Beyond PageRank	

¢  Variations of PageRank	

l  Weighted edges	

l  Personalized PageRank	

¢  Variants on graph random walks	

l  Hubs and authorities (HITS)	

l  SALSA
Other Classes of Graph Algorithms	

¢  Subgraph pattern matching	

¢  Computing simple graph statistics	

l  Degree vertex distributions	

¢  Computing more complex graph statistics	

l  Clustering coefficients	

l  Counting triangles
Batch Gradient Descent in MapReduce

[Figure: mappers compute partial gradients over their share of the training examples; a single reducer sums the partial gradients and updates the model; the updated model is sent back to the mappers, and the process iterates until convergence]

$\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \dfrac{1}{n} \sum_{i=0}^{n} \nabla \ell(f(x_i; \theta^{(t)}), y_i)$
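As an illustrative sketch (not from the slides), one such iteration for a one-parameter least-squares model: mappers emit partial gradients, a single reducer sums them and updates the model; theta and learning_rate are hypothetical stand-ins for the broadcast model parameter and step size.

# Sketch of one batch gradient descent step expressed as map and reduce.
theta, learning_rate = 0.0, 0.1                     # hypothetical broadcast values

def map_fn(_, example):                             # example = (x, y); each mapper sees a slice of the data
    x, y = example
    grad = 2.0 * (theta * x - y) * x                # partial gradient of squared loss for one example
    yield ('gradient', (grad, 1))

def reduce_fn(_, partials):                         # a single reducer aggregates and updates the model
    g = sum(p[0] for p in partials)
    n = sum(p[1] for p in partials)
    yield ('theta', theta - learning_rate * g / n)  # the driver re-broadcasts this and iterates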
Source: http://www.flickr.com/photos/fusedforces/4324320625/
MapReduce sucks at iterative algorithms	

¢  Hadoop task startup time	

¢  Stragglers	

¢  Needless graph shuffling	

¢  Checkpointing at each iteration
In-Mapper Combining	

¢  Use combiners	

l  Perform local aggregation on map output	

l  Downside: intermediate data is still materialized	

¢  Better: in-mapper combining	

l  Preserve state across multiple map calls, aggregate messages in buffer,
emit buffer contents at end	

l  Downside: requires memory management	

[Figure: mapper lifecycle for in-mapper combining — setup initializes a buffer, each map call aggregates messages into the buffer, and cleanup emits all key-value pairs at once]
Better Partitioning	

¢  Default: hash partitioning	

l  Randomly assign nodes to partitions	

¢  Observation: many graphs exhibit local structure	

l  E.g., communities in social networks	

l  Better partitioning creates more opportunities for local aggregation	

¢  Unfortunately, partitioning is hard!	

l  Sometimes, chick-and-egg… 	

l  But cheap heuristics sometimes available	

l  For webgraphs: range partition on domain-sorted URLs
Schimmy Design Pattern	

¢  Basic implementation contains two dataflows:	

l  Messages (actual computations)	

l  Graph structure (“bookkeeping”)	

¢  Schimmy: separate the two dataflows, shuffle only the messages	

l  Basic idea: merge join between graph structure and messages	

[Figure: merge join between relations S and T — when both relations are sorted by the join key they can be joined in a single scan; when both are also consistently partitioned by the join key, corresponding partitions (S1, T1), (S2, T2), (S3, T3) can be joined independently]
Do the Schimmy!	

¢  Schimmy = reduce side parallel merge join between graph
structure and messages	

l  Consistent partitioning between input and intermediate data	

l  Mappers emit only messages (actual computation)	

l  Reducers read graph structure directly from HDFS	

[Figure: each reducer performs a merge join between its partition of the intermediate data (messages) arriving from the shuffle and the corresponding partition of the graph structure read directly from HDFS]
Experiments	

¢  Cluster setup:	

l  10 workers, each 2 cores (3.2 GHz Xeon), 4GB RAM, 367 GB disk	

l  Hadoop 0.20.0 on RHELS 5.3	

¢  Dataset:	

l  First English segment of ClueWeb09 collection	

l  50.2m web pages (1.53 TB uncompressed, 247 GB compressed)	

l  Extracted webgraph: 1.4 billion edges, 7.0 GB	

l  Dataset arranged in crawl order	

¢  Setup:	

l  Measured per-iteration running time (5 iterations)	

l  100 partitions
Results

[Figure: per-iteration PageRank running time, starting from the “best practices” baseline — successive variants change running time by +18%, −15%, −60%, and −69%, while the volume of intermediate data shuffled drops from 1.4b to 674m to 86m]
Sequencing Computations	

Source: www.flickr.com/photos/richardandgill/565921252/
Sequencing Computations	

1.  Turn synchronization into a sorting problem	

l  Leverage the fact that keys arrive at reducers in sorted order	

l  Manipulate the sort order and partitioning scheme to deliver partial
results at appropriate junctures	

2.  Create appropriate algebraic structures to capture computation	

l  Build custom data structures to accumulate partial results	

Monoids!
Monoids!	

¢  What’s a monoid?	

¢  An algebraic structure with	

l  A single associative binary operation	

l  An identity	

¢  Examples:	

l  Natural numbers form a commutative monoid under + with identity 0	

l  Natural numbers form a commutative monoid under × with identity 1	

l  Finite strings form a monoid under concatenation with identity “”	

l  …
Monoids and MapReduce	

¢  Recall averaging example: why does it work?	

l  AVG is non-associative	

l  Tuple of (sum, count) forms a monoid under element-wise addition	

l  Destroy the monoid at end to compute average	

l  Also explains the various failed algorithms	

¢  “Stripes” pattern works in the same way!	

l  Associate arrays form a monoid under element-wise addition 	

Go forth and monoidify!
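As a tiny illustration (not from the slides) of “monoidifying” the averaging example: (sum, count) pairs form a monoid under element-wise addition with identity (0, 0), and the monoid is destroyed only at the end.

# The (sum, count) monoid behind the averaging example.
IDENTITY = (0, 0)                                   # identity element

def combine(a, b):                                  # associative binary operation
    return (a[0] + b[0], a[1] + b[1])

def mean(values):
    acc = IDENTITY
    for v in values:
        acc = combine(acc, (v, 1))                  # safe in any order or grouping
    return acc[0] / acc[1]                          # "destroy" the monoid only at the end

print(mean([1, 2, 3, 4]))                           # 2.5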
Abstract Algebra and MapReduce	

¢  Create appropriate algebraic structures to capture computation	

¢  Algebraic properties	

l  Associative: order doesn’t matter!	

l  Commutative: grouping doesn’t matter!	

l  Idempotent: duplicates don’t matter!	

l  Identity: this value doesn’t matter!	

l  Zero: other values don’t matter!	

l  …	

¢  Different combinations lead to monoids, groups, rings, lattices,
etc.	

Source: Guy Steele
Recent thoughts, see: Jimmy Lin. Monoidify! Monoids as a Design Principle
for Efficient MapReduce Algorithms. arXiv:1304.7544, April 2013.
Source: Google
Questions?

  • 11. Emergence of the 4th Paradigm Data-intensive e-Science Maximilien Brice, © CERN Science
  • 12. Engineering The unreasonable effectiveness of data Count and normalize! Source: Wikipedia (Three Gorges Dam)
  • 13. No data like more data! (Banko and Brill, ACL 2001) (Brants et al., EMNLP 2007) s/knowledge/data/g;
  • 14. Commerce Know thy customers Data → Insights → Competitive advantages Source: Wikiedia (Shinjuku, Tokyo)
  • 15. How big data? Why big data? Source: Wikipedia (Noctilucent cloud)
  • 17. Typical Big Data Problem ¢  Iterate over a large number of records ¢  Extract something of interest from each ¢  Shuffle and sort intermediate results ¢  Aggregate intermediate results ¢  Generate final output Key idea: provide a functional abstraction for these two operations Map Reduce (Dean and Ghemawat, OSDI 2004)
  • 18. g g g g g f f f f fMap Fold Roots in Functional Programming
  • 19. MapReduce ¢  Programmers specify two functions: map (k1, v1) → [k2, v2] reduce (k2, [v2]) → [k3, v3] l  All values with the same key are sent to the same reducer ¢  The execution framework handles everything else…
  • 20. mapmap map map Shuffle and Sort: aggregate values by keys reduce reduce reduce k1 k2 k3 k4 k5 k6v1 v2 v3 v4 v5 v6 ba 1 2 c c3 6 a c5 2 b c7 8 a 1 5 b 2 7 c 2 3 6 8 r1 s1 r2 s2 r3 s3
  • 21. MapReduce ¢  Programmers specify two functions: map (k, v) → k’, v’* reduce (k’, v’) → k’, v’* l  All values with the same key are sent to the same reducer ¢  The execution framework handles everything else… What’s “everything else”?
  • 22. MapReduce “Runtime” ¢  Handles scheduling l  Assigns workers to map and reduce tasks ¢  Handles “data distribution” l  Moves processes to data ¢  Handles synchronization l  Gathers, sorts, and shuffles intermediate data ¢  Handles errors and faults l  Detects worker failures and restarts ¢  Everything happens on top of a distributed filesystem
  • 23. MapReduce ¢  Programmers specify two functions: map (k, v) → k’, v’* reduce (k’, v’) → k’, v’* l  All values with the same key are reduced together ¢  The execution framework handles everything else… ¢  Not quite…usually, programmers also specify: partition (k’, number of partitions) → partition for k’ l  Often a simple hash of the key, e.g., hash(k’) mod n l  Divides up key space for parallel reduce operations combine (k’, v’) → k’, v’* l  Mini-reducers that run in memory after the map phase l  Used as an optimization to reduce network traffic
  • 24. [Diagram: the slide 20 dataflow extended with combiners and partitioners: each mapper's output is first combined locally (e.g., c 3 and c 6 collapse to c 9), then partitioned by key before the shuffle and sort delivers grouped values to the reducers.]
  • 25. Two more details… ¢  Barrier between map and reduce phases l  But intermediate data can be copied over as soon as mappers finish ¢  Keys arrive at each reducer in sorted order l  No enforced ordering across reducers
  • 26. What’s the big deal? ¢  Developers need the right level of abstraction l  Moving beyond the von Neumann architecture l  We need better programming models ¢  Abstractions hide low-level details from the developers l  No more race conditions, lock contention, etc. ¢  MapReduce separating the what from how l  Developer specifies the computation that needs to be performed l  Execution framework (“runtime”) handles actual execution
  • 27. Source: Google The datacenter is the computer!
  • 29. MapReduce can refer to… ¢  The programming model ¢  The execution framework (aka “runtime”) ¢  The specific implementation Usage is usually clear from context!
  • 30. MapReduce Implementations ¢  Google has a proprietary implementation in C++ l  Bindings in Java, Python ¢  Hadoop is an open-source implementation in Java l  Development led by Yahoo, now an Apache project l  Used in production at Yahoo, Facebook, Twitter, LinkedIn, Netflix, … l  The de facto big data processing platform l  Rapidly expanding software ecosystem ¢  Lots of custom research implementations l  For GPUs, cell processors, etc.
  • 31. MapReduce algorithm design ¢  The execution framework handles “everything else”… l  Scheduling: assigns workers to map and reduce tasks l  “Data distribution”: moves processes to data l  Synchronization: gathers, sorts, and shuffles intermediate data l  Errors and faults: detects worker failures and restarts ¢  Limited control over data and execution flow l  All algorithms must be expressed in m, r, c, p ¢  You don’t know: l  Where mappers and reducers run l  When a mapper or reducer begins or finishes l  Which input a particular mapper is processing l  Which intermediate key a particular reducer is processing
  • 33. HDFS Architecture [Diagram, adapted from (Ghemawat et al., SOSP 2003): the application's HDFS client asks the HDFS namenode to resolve (file name, block id) to (block id, block location); the namenode maintains the file namespace (e.g., /foo/bar → block 3df2), sends instructions to datanodes, and receives datanode state; the client then requests (block id, byte range) and reads block data directly from HDFS datanodes, each running on top of a local Linux file system.]
  • 34. Putting everything together… [Diagram: each slave node runs a datanode daemon and a tasktracker on top of the local Linux file system; the namenode daemon runs on the namenode, and the jobtracker runs on the job submission node.]
  • 35. Shuffle and Sort [Diagram: map output goes into an in-memory circular buffer, which spills to disk; spills are merged (the combiner may be applied) into intermediate files on disk; reducers pull intermediate data from this mapper and other mappers and merge it before the reduce runs.]
  • 36. Preserving State [Diagram: one mapper object and one reducer object per task; setup is the API initialization hook, map is called once per input key-value pair, reduce once per intermediate key, and cleanup/close is the API cleanup hook; state held in the object persists across calls.]
  • 37. Implementation Don’ts ¢  Don’t unnecessarily create objects l  Object creation is costly l  Garbage collection is costly ¢  Don’t buffer objects l  Processes have limited heap size (remember, commodity machines) l  May work for small datasets, but won’t scale!
  • 38. Secondary Sorting ¢  MapReduce sorts input to reducers by key l  Values may be arbitrarily ordered ¢  What if we want to sort the values as well? l  E.g., k → (v1, r), (v3, r), (v4, r), (v8, r)…
  • 39. Secondary Sorting: Solutions ¢  Solution 1: l  Buffer values in memory, then sort l  Why is this a bad idea? ¢  Solution 2: l  “Value-to-key conversion” design pattern: form composite intermediate key, (k, v1) l  Let execution framework do the sorting l  Preserve state across multiple key-value pairs to handle processing l  Anything else we need to do?
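A minimal sketch of the value-to-key conversion pattern in Hadoop, assuming the mapper emits a composite key of the form naturalKey + tab + zero-padded value (all class and variable names here are illustrative, not from the slides): a custom partitioner routes on the natural key only, so the framework's sort delivers each natural key's values already in order.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Partition only on the natural key part of the composite key "naturalKey\tvalue",
// so every composite key for the same natural key reaches the same reducer,
// while the framework sorts the full composite key (and hence the values).
public class NaturalKeyPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text compositeKey, Text value, int numPartitions) {
    String naturalKey = compositeKey.toString().split("\t", 2)[0];
    return (naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

The remaining piece the slide hints at: the reducer (or a grouping comparator registered via job.setGroupingComparatorClass) still has to recognize where one natural key ends and the next begins, which is where state is preserved across key-value pairs.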
  • 41. Importance of Local Aggregation ¢  Ideal scaling characteristics: l  Twice the data, twice the running time l  Twice the resources, half the running time ¢  Why can’t we achieve this? l  Synchronization requires communication l  Communication kills performance (network is slow!) ¢  Thus… avoid communication! l  Reduce intermediate data via local aggregation l  Combiners can help
  • 42. Word Count: Baseline What’s the impact of combiners?
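The baseline word count code on this slide is shown as an image and is not reproduced in the transcript; the following is a minimal sketch of what such a baseline mapper and reducer look like in the Hadoop Java API (class and field names are illustrative, not taken from the slides).

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountBaseline {
  // Emits (term, 1) for every token in the input line.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        context.write(word, ONE);  // one key-value pair per token: lots of intermediate data
      }
    }
  }

  // Sums the partial counts for each term.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}

Because integer addition is associative and commutative, the same SumReducer can also be registered as the combiner (via job.setCombinerClass), which is exactly the knob the question on this slide is probing.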
  • 43. Word Count: Version 1 Are combiners still needed?
  • 44. Word Count: Version 2 Are combiners still needed?
  • 45. Design Pattern for Local Aggregation ¢  “In-mapper combining” l  Fold the functionality of the combiner into the mapper by preserving state across multiple map calls ¢  Advantages l  Speed l  Why is this faster than actual combiners? ¢  Disadvantages l  Explicit memory management required l  Potential for order-dependent bugs
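As a rough illustration of the pattern (not the code from the slides), an in-mapper combining word count might look like this in Hadoop, with the counts map preserved across map() calls and flushed once in cleanup(); names are illustrative.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: accumulate counts in a per-task map, emit everything in cleanup().
public class InMapperCombiningMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private Map<String, Integer> counts;

  @Override
  protected void setup(Context context) {
    counts = new HashMap<>();  // state preserved across multiple map() calls
  }

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        counts.merge(token, 1, Integer::sum);
      }
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}

The explicit memory management the slide warns about shows up here: if the counts map can outgrow the heap, it has to be flushed periodically from within map() as well.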
  • 46. Combiner Design ¢  Combiners and reducers share same method signature l  Sometimes, reducers can serve as combiners l  Often, not… ¢  Remember: combiners are optional optimizations l  Should not affect algorithm correctness l  May be run 0, 1, or multiple times ¢  Example: find average of integers associated with the same key
  • 47. Computing the Mean: Version 1 Why can’t we use reducer as combiner?
  • 48. Computing the Mean: Version 2 Why doesn’t this work?
  • 49. Computing the Mean: Version 3 Fixed?
  • 50. Computing the Mean: Version 4 Are combiners still needed?
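The four versions on the preceding slides are shown as code images and are not reproduced in this transcript; the sketch below captures the working idea under the simplifying assumption that each input line is key<TAB>integer (names and the text encoding of the pair are illustrative): mappers emit (sum, count) pairs, the combiner and reducer sum them element-wise, and only the reducer divides.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MeanWithCombiner {
  // Mapper emits (key, "value,1"): a (sum, count) pair encoded as text.
  // A custom Writable would be the idiomatic choice; a string keeps the sketch short.
  public static class MeanMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");  // assumed input: key <TAB> integer
      context.write(new Text(parts[0]), new Text(parts[1] + ",1"));
    }
  }

  // Combiner sums (sum, count) pairs element-wise; this is associative,
  // so it is safe to run zero, one, or many times.
  public static class SumCountCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0, count = 0;
      for (Text v : values) {
        String[] p = v.toString().split(",");
        sum += Long.parseLong(p[0]);
        count += Long.parseLong(p[1]);
      }
      context.write(key, new Text(sum + "," + count));
    }
  }

  // Reducer does the same element-wise sum, then divides only at the very end.
  public static class MeanReducer extends Reducer<Text, Text, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0, count = 0;
      for (Text v : values) {
        String[] p = v.toString().split(",");
        sum += Long.parseLong(p[0]);
        count += Long.parseLong(p[1]);
      }
      context.write(key, new DoubleWritable((double) sum / count));
    }
  }
}

Summing (sum, count) pairs is associative, so the combiner may run any number of times without changing the result, unlike a combiner that tries to average averages.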
  • 52. Sequencing Computations 1.  Turn synchronization into a sorting problem l  Leverage the fact that keys arrive at reducers in sorted order l  Manipulate the sort order and partitioning scheme to deliver partial results at appropriate junctures 2.  Create appropriate algebraic structures to capture computation l  Build custom data structures to accumulate partial results
  • 53. Algorithm Design: Running Example ¢  Term co-occurrence matrix for a text collection l  M = N x N matrix (N = vocabulary size) l  Mij: number of times i and j co-occur in some context (for concreteness, let’s say context = sentence) ¢  Why? l  Distributional profiles as a way of measuring semantic distance l  Semantic distance useful for many language processing tasks l  Basis for large classes of more sophisticated algorithms
  • 54. MapReduce: Large Counting Problems ¢  Term co-occurrence matrix for a text collection = specific instance of a large counting problem l  A large event space (number of terms) l  A large number of observations (the collection itself) l  Goal: keep track of interesting statistics about the events ¢  Basic approach l  Mappers generate partial counts l  Reducers aggregate partial counts How do we aggregate partial counts efficiently?
  • 55. First Try: “Pairs” ¢  Each mapper takes a sentence: l  Generate all co-occurring term pairs l  For all pairs, emit (a, b) → count ¢  Reducers sum up counts associated with these pairs ¢  Use combiners!
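A minimal sketch of a pairs mapper, treating each input line as one sentence and encoding the pair as a tab-separated Text key (a production implementation would typically use a custom pair Writable; names here are illustrative). The reducer is the same integer-sum reducer used for word count.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// "Pairs": for every ordered pair of co-occurring terms in a sentence, emit ((a, b), 1).
public class PairsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text sentence, Context context)
      throws IOException, InterruptedException {
    String[] terms = sentence.toString().split("\\s+");
    for (int i = 0; i < terms.length; i++) {
      for (int j = 0; j < terms.length; j++) {
        if (i == j || terms[i].isEmpty() || terms[j].isEmpty()) continue;
        context.write(new Text(terms[i] + "\t" + terms[j]), ONE);
      }
    }
  }
}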
  • 57. “Pairs” Analysis ¢  Advantages l  Easy to implement, easy to understand ¢  Disadvantages l  Lots of pairs to sort and shuffle around (upper bound?) l  Not many opportunities for combiners to work
  • 58. Another Try: “Stripes” ¢  Idea: group together pairs into an associative array ¢  Each mapper takes a sentence: l  Generate all co-occurring term pairs l  For each term, emit a → { b: count_b, c: count_c, d: count_d … } ¢  Reducers perform element-wise sum of associative arrays Example: the pairs (a, b) → 1, (a, c) → 2, (a, d) → 5, (a, e) → 3, (a, f) → 2 become the stripe a → { b: 1, c: 2, d: 5, e: 3, f: 2 }; summing stripes element-wise: a → { b: 1, d: 5, e: 3 } + a → { b: 1, c: 2, d: 2, f: 2 } = a → { b: 2, c: 2, d: 7, e: 3, f: 2 } Key idea: cleverly-constructed data structure for aggregating partial results
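A corresponding sketch of the stripes approach, again treating each input line as a sentence and using Hadoop's MapWritable as the associative array (class names are illustrative, not from the slides).

import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Stripes {
  // "Stripes": for each term a, emit a -> { b: count, c: count, ... } for the sentence.
  public static class StripesMapper extends Mapper<LongWritable, Text, Text, MapWritable> {
    @Override
    protected void map(LongWritable key, Text sentence, Context context)
        throws IOException, InterruptedException {
      String[] terms = sentence.toString().split("\\s+");
      for (int i = 0; i < terms.length; i++) {
        if (terms[i].isEmpty()) continue;
        MapWritable stripe = new MapWritable();
        for (int j = 0; j < terms.length; j++) {
          if (i == j || terms[j].isEmpty()) continue;
          Text neighbor = new Text(terms[j]);
          IntWritable count = (IntWritable) stripe.get(neighbor);
          stripe.put(neighbor, new IntWritable(count == null ? 1 : count.get() + 1));
        }
        context.write(new Text(terms[i]), stripe);
      }
    }
  }

  // Reducer (usable as a combiner too): element-wise sum of the associative arrays.
  public static class StripesReducer extends Reducer<Text, MapWritable, Text, MapWritable> {
    @Override
    protected void reduce(Text key, Iterable<MapWritable> stripes, Context context)
        throws IOException, InterruptedException {
      MapWritable sum = new MapWritable();
      for (MapWritable stripe : stripes) {
        for (Map.Entry<Writable, Writable> e : stripe.entrySet()) {
          Text term = new Text((Text) e.getKey());  // defensive copy of the key
          IntWritable existing = (IntWritable) sum.get(term);
          int add = ((IntWritable) e.getValue()).get();
          sum.put(term, new IntWritable(existing == null ? add : existing.get() + add));
        }
      }
      context.write(key, sum);
    }
  }
}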
  • 60. “Stripes” Analysis ¢  Advantages l  Far less sorting and shuffling of key-value pairs l  Can make better use of combiners ¢  Disadvantages l  More difficult to implement l  Underlying object more heavyweight l  Fundamental limitation in terms of size of event space
  • 61. Cluster size: 38 cores Data Source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)
  • 62.
  • 63. Relative Frequencies ¢  How do we estimate relative frequencies from counts? ¢  Why do we want to do this? ¢  How do we do this with MapReduce? $f(B|A) = \frac{N(A, B)}{N(A)} = \frac{N(A, B)}{\sum_{B'} N(A, B')}$
  • 64. f(B|A): “Stripes” ¢  Easy! l  One pass to compute (a, *) l  Another pass to directly compute f(B|A) a → {b1:3, b2 :12, b3 :7, b4 :1, … }
  • 65. f(B|A): “Pairs” ¢  What’s the issue? l  Computing relative frequencies requires marginal counts l  But the marginal cannot be computed until you see all counts l  Buffering is a bad idea! ¢  Solution: l  What if we could get the marginal count to arrive at the reducer first?
  • 66. f(B|A): “Pairs” ¢  For this to work: l  Must emit extra (a, *) for every bn in mapper l  Must make sure all a’s get sent to same reducer (use partitioner) l  Must make sure (a, *) comes first (define sort order) l  Must hold state in reducer across different key-value pairs (a, b1) → 3 (a, b2) → 12 (a, b3) → 7 (a, b4) → 1 … (a, *) → 32 (a, b1) → 3 / 32 (a, b2) → 12 / 32 (a, b3) → 7 / 32 (a, b4) → 1 / 32 … Reducer holds this value in memory
  • 67. “Order Inversion” ¢  Common design pattern: l  Take advantage of sorted key order at reducer to sequence computations l  Get the marginal counts to arrive at the reducer before the joint counts ¢  Optimization: l  Apply in-memory combining pattern to accumulate marginal counts
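A minimal sketch of the order-inversion machinery in Hadoop, assuming pairs are encoded as tab-separated Text keys and the mapper also emits an (a, *) count for every pair it generates (all class names are illustrative).

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

public class OrderInversion {
  // Partition on the left element only, so (a, *) and every (a, b) reach the same reducer.
  public static class LeftElementPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text pair, IntWritable value, int numPartitions) {
      String left = pair.toString().split("\t", 2)[0];
      return (left.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }

  // Keys arrive sorted; "a\t*" sorts before "a\tb..." under the default Text ordering
  // (assuming terms do not begin with characters that sort below '*'),
  // so the marginal arrives at the reducer before the joint counts.
  public static class RelativeFrequencyReducer
      extends Reducer<Text, IntWritable, Text, DoubleWritable> {
    private long marginal = 0;  // state held across key-value pairs

    @Override
    protected void reduce(Text pair, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (IntWritable c : counts) sum += c.get();
      if (pair.toString().endsWith("\t*")) {
        marginal = sum;
      } else {
        context.write(pair, new DoubleWritable((double) sum / marginal));
      }
    }
  }
}

The partitioner guarantees the marginal and the joint counts meet at the same reducer, and the sort order delivers the marginal first, so the reducer only needs to hold a single running marginal in memory.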
  • 68. Synchronization: Pairs vs. Stripes ¢  Approach 1: turn synchronization into an ordering problem l  Sort keys into correct order of computation l  Partition key space so that each reducer gets the appropriate set of partial results l  Hold state in reducer across multiple key-value pairs to perform computation l  Illustrated by the “pairs” approach ¢  Approach 2: construct data structures to accumulate partial results l  Each reducer receives all the data it needs to complete the computation l  Illustrated by the “stripes” approach
  • 69. Issues and Tradeoffs ¢  Number of key-value pairs l  Object creation overhead l  Time for sorting and shuffling pairs across the network ¢  Size of each key-value pair l  De/serialization overhead
  • 70. Lots of algorithms are just fancy conditional counts! Source: http://www.flickr.com/photos/guvnah/7861418602/
  • 71. Hidden Markov Models An HMM $\lambda = (A, B, \Pi)$ is characterized by: l  N states: $Q = \{q_1, q_2, \ldots, q_N\}$ l  An $N \times N$ transition probability matrix $A = [a_{ij}]$, with $a_{ij} = p(q_j \mid q_i)$ and $\sum_j a_{ij} = 1$ for all $i$ l  V observation symbols: $O = \{o_1, o_2, \ldots, o_V\}$ l  An $N \times |V|$ emission probability matrix $B = [b_{iv}]$, with $b_{iv} = b_i(o_v) = p(o_v \mid q_i)$ l  A prior probability vector $\Pi = [\pi_1, \pi_2, \ldots, \pi_N]$, with $\sum_{i=1}^{N} \pi_i = 1$
  • 72. Forward-Backward [Trellis diagram around state $q_j$ at time $t$] $\alpha_t(j) = P(o_1, o_2, \ldots, o_t, q_t = j \mid \lambda)$ and $\beta_t(j) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = j, \lambda)$
  • 73. Estimating Emission Probabilities ¢  Basic idea: $b_j(v_k)$ = (expected number of times in state $j$ and observing symbol $v_k$) / (expected number of times in state $j$) ¢  Let’s define: $\gamma_t(j) = \frac{P(q_t = j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(j)\,\beta_t(j)}{P(O \mid \lambda)}$ ¢  Thus: $\hat{b}_j(v_k) = \frac{\sum_{t=1,\,O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
  • 74. Forward-Backward [Trellis diagram: a transition from state $q_i$ at time $t$ to state $q_j$ at time $t+1$ contributes $\alpha_t(i) \cdot a_{ij} b_j(o_{t+1}) \cdot \beta_{t+1}(j)$]
  • 75. Estimating Transition Probabilities ¢  Basic idea: $a_{ij}$ = (expected number of transitions from state $i$ to state $j$) / (expected number of transitions from state $i$) ¢  Let’s define: $\xi_t(i, j) = \frac{\alpha_t(i)\,a_{ij} b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O \mid \lambda)}$ ¢  Thus: $\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \xi_t(i, j)}$
  • 76. MapReduce Implementation: Mapper
    class Mapper
      method Initialize(integer iteration)
        ⟨S, O⟩ ← ReadModel
        θ = ⟨A, B, π⟩ ← ReadModelParams(iteration)
      method Map(sample id, sequence x)
        α ← Forward(x, θ)                              ▷ cf. Section 6.2.2
        β ← Backward(x, θ)                             ▷ cf. Section 6.2.4
        I ← new AssociativeArray                       ▷ Initial state expectations
        for all q ∈ S do                               ▷ Loop over states
          I{q} ← α_1(q) · β_1(q)
        O ← new AssociativeArray of AssociativeArray   ▷ Emissions
        for t = 1 to |x| do                            ▷ Loop over observations
          for all q ∈ S do                             ▷ Loop over states
            O{q}{x_t} ← O{q}{x_t} + α_t(q) · β_t(q)
        T ← new AssociativeArray of AssociativeArray   ▷ Transitions
        for t = 1 to |x| − 1 do                        ▷ Loop over observations
          for all q ∈ S do                             ▷ Loop over states
            for all r ∈ S do                           ▷ Loop over states
              T{q}{r} ← T{q}{r} + α_t(q) · A_q(r) · B_r(x_{t+1}) · β_{t+1}(r)
        Emit(string ‘initial’, stripe I)
        for all q ∈ S do                               ▷ Loop over states
          Emit(string ‘emit from ’ + q, stripe O{q})
          Emit(string ‘transit from ’ + q, stripe T{q})
  • 77. MapReduce Implementation: Reducer
    class Combiner
      method Combine(string t, stripes [C_1, C_2, …])
        C_f ← new AssociativeArray
        for all stripe C ∈ stripes [C_1, C_2, …] do
          Sum(C_f, C)
        Emit(string t, stripe C_f)
    class Reducer
      method Reduce(string t, stripes [C_1, C_2, …])
        C_f ← new AssociativeArray
        for all stripe C ∈ stripes [C_1, C_2, …] do
          Sum(C_f, C)
        z ← 0
        for all ⟨k, v⟩ ∈ C_f do
          z ← z + v
        P_f ← new AssociativeArray                     ▷ Final parameters vector
        for all ⟨k, v⟩ ∈ C_f do
          P_f{k} ← v / z
        Emit(string t, stripe P_f)
    (Figure 6.9 from the textbook: combiner and reducer pseudo-code for training hidden Markov models using EM. The HMMs considered in the book are fully parameterized by multinomial distributions, so reducers do not require special logic to handle different types of model parameters, since they are all of the same type.)
  • 78. Iterative Algorithms: Graphs Source: Wikipedia (Water wheel)
  • 79. What’s a graph? ¢  G = (V,E), where l  V represents the set of vertices (nodes) l  E represents the set of edges (links) l  Both vertices and edges may contain additional information ¢  Different types of graphs: l  Directed vs. undirected edges l  Presence or absence of cycles ¢  Graphs are everywhere: l  Hyperlink structure of the web l  Physical structure of computers on the Internet l  Interstate highway system l  Social networks
  • 82. Some Graph Problems ¢  Finding shortest paths l  Routing Internet traffic and UPS trucks ¢  Finding minimum spanning trees l  Telco laying down fiber ¢  Finding Max Flow l  Airline scheduling ¢  Identify “special” nodes and communities l  Breaking up terrorist cells, spread of avian flu ¢  Bipartite matching l  Monster.com, Match.com ¢  And of course... PageRank
  • 83. Graphs and MapReduce ¢  A large class of graph algorithms involve: l  Performing computations at each node: based on node features, edge features, and local link structure l  Propagating computations: “traversing” the graph ¢  Key questions: l  How do you represent graph data in MapReduce? l  How do you traverse a graph in MapReduce? In reality: graph algorithms in MapReduce suck!
  • 84. Representing Graphs ¢  G = (V, E) ¢  Two common representations l  Adjacency matrix l  Adjacency list
  • 85. Adjacency Matrices Represent a graph as an n × n square matrix M l  n = |V| l  M_ij = 1 means a link from node i to j Example (nodes 1–4): row 1 → 0 1 0 1; row 2 → 1 0 1 1; row 3 → 1 0 0 0; row 4 → 1 0 1 0
  • 86. Adjacency Matrices: Critique ¢  Advantages: l  Amenable to mathematical manipulation l  Iteration over rows and columns corresponds to computations on outlinks and inlinks ¢  Disadvantages: l  Lots of zeros for sparse matrices l  Lots of wasted space
  • 87. Adjacency Lists Take adjacency matrices… and throw away all the zeros For the matrix on slide 85: 1: 2, 4; 2: 1, 3, 4; 3: 1; 4: 1, 3
  • 88. Adjacency Lists: Critique ¢  Advantages: l  Much more compact representation l  Easy to compute over outlinks ¢  Disadvantages: l  Much more difficult to compute over inlinks
  • 89. Single-Source Shortest Path ¢  Problem: find shortest path from a source node to one or more target nodes l  Shortest might also mean lowest weight or cost ¢  Single processor machine: Dijkstra’s Algorithm ¢  MapReduce: parallel breadth-first search (BFS)
  • 90. Finding the Shortest Path ¢  Consider simple case of equal edge weights ¢  Solution to the problem can be defined inductively ¢  Here’s the intuition: l  Define: b is reachable from a if b is on adjacency list of a l  DISTANCETO(s) = 0 l  For all nodes p reachable from s, DISTANCETO(p) = 1 l  For all nodes n reachable from some other set of nodes M, DISTANCETO(n) = 1 + min(DISTANCETO(m), m ∈ M) [Diagram: source s reaches node n through intermediate nodes m1, m2, m3 at distances d1, d2, d3]
  • 93. From Intuition to Algorithm ¢  Data representation: l  Key: node n l  Value: d (distance from start), adjacency list (nodes reachable from n) l  Initialization: for all nodes except for start node, d = ∞ ¢  Mapper: l  ∀m ∈ adjacency list: emit (m, d + 1) ¢  Sort/Shuffle l  Groups distances by reachable nodes ¢  Reducer: l  Selects minimum distance path for each reachable node l  Additional bookkeeping needed to keep track of actual path
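Putting the intuition above into code, a rough parallel BFS sketch in Hadoop might look as follows, with node records serialized as nodeId<TAB>distance<TAB>adjacencyList and "INF" standing in for infinity (the record format and class names are assumptions for illustration).

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ParallelBFS {
  public static class BFSMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] f = line.toString().split("\t");
      String nodeId = f[0], dist = f[1], adjacency = f.length > 2 ? f[2] : "";
      // Pass the graph structure along so it survives the iteration.
      context.write(new Text(nodeId), new Text("NODE\t" + dist + "\t" + adjacency));
      if (!dist.equals("INF")) {
        long d = Long.parseLong(dist);
        for (String m : adjacency.split(",")) {
          if (!m.isEmpty()) {
            context.write(new Text(m), new Text("DIST\t" + (d + 1)));  // tentative distance
          }
        }
      }
    }
  }

  public static class BFSReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text nodeId, Iterable<Text> messages, Context context)
        throws IOException, InterruptedException {
      long best = Long.MAX_VALUE;
      String adjacency = "";
      for (Text msg : messages) {
        String[] f = msg.toString().split("\t");
        if (f[0].equals("NODE")) {
          adjacency = f.length > 2 ? f[2] : "";
          if (!f[1].equals("INF")) best = Math.min(best, Long.parseLong(f[1]));
        } else {
          best = Math.min(best, Long.parseLong(f[1]));  // select minimum tentative distance
        }
      }
      String dist = (best == Long.MAX_VALUE) ? "INF" : Long.toString(best);
      context.write(nodeId, new Text(dist + "\t" + adjacency));
    }
  }
}

The reducer writes records in the same format the mapper reads, so an external driver can resubmit the job, feeding each iteration's output to the next, until distances stop changing.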
  • 94. Multiple Iterations Needed ¢  Each MapReduce iteration advances the “frontier” by one hop l  Subsequent iterations include more and more reachable nodes as frontier expands l  Multiple iterations are needed to explore entire graph ¢  Preserving graph structure: l  Problem: Where did the adjacency list go? l  Solution: mapper emits (n, adjacency list) as well
  • 96. Stopping Criterion ¢  When a node is first discovered, we’ve found the shortest path l  Maximum number of iterations is equal to the diameter of the graph ¢  Practicalities of implementation in MapReduce
  • 97. Comparison to Dijkstra ¢  Dijkstra’s algorithm is more efficient l  At each step, only pursues edges from minimum-cost path inside frontier ¢  MapReduce explores all paths in parallel l  Lots of “waste” l  Useful work is only done at the “frontier” ¢  Why can’t we do better using MapReduce?
  • 98. Single Source: Weighted Edges ¢  Now add positive weights to the edges l  Why can’t edge weights be negative? ¢  Simple change: add weight w for each edge in adjacency list l  In mapper, emit (m, d + w) instead of (m, d + 1) for each node m, where w is the weight of the edge to m ¢  That’s it?
  • 99. Stopping Criterion ¢  How many iterations are needed in parallel BFS (positive edge weight case)? ¢  When a node is first discovered, we’ve found the shortest path Not true!
  • 101. Stopping Criterion ¢  How many iterations are needed in parallel BFS (positive edge weight case)? ¢  Practicalities of implementation in MapReduce
  • 102. All-Pairs? ¢  Floyd-Warshall Algorithm: difficult to MapReduce-ify… ¢  Multiple-source shortest paths in MapReduce: run multiple parallel BFS simultaneously l  Assume source nodes {s0, s1, … sn} l  Instead of emitting a single distance, emit an array of distances, with respect to each source l  Reducer selects minimum for each element in array ¢  Does this scale?
  • 104. Social Search ¢  When searching, how to rank friends named “John”? l  Assume undirected graphs l  Rank matches by distance to user ¢  Naïve implementations: l  Precompute all-pairs distances l  Compute distances at query time ¢  Can we do better?
  • 105. Landmark Approach (aka sketches) ¢  Select n seeds {s0, s1, … sn} ¢  Compute distances from seeds to every node: l  What can we conclude about distances? l  Insight: landmarks bound the maximum path length ¢  Lots of details: l  How to more tightly bound distances l  How to select landmarks (random isn’t the best…) ¢  Use multi-source parallel BFS implementation in MapReduce! A = [2, 1, 1] B = [1, 1, 2] C = [4, 3, 1] D = [1, 2, 4]
  • 107. Graphs and MapReduce ¢  A large class of graph algorithms involve: l  Performing computations at each node: based on node features, edge features, and local link structure l  Propagating computations: “traversing” the graph ¢  Generic recipe: l  Represent graphs as adjacency lists l  Perform local computations in mapper l  Pass along partial results via outlinks, keyed by destination node l  Perform aggregation in reducer on inlinks to a node l  Iterate until convergence: controlled by external “driver” l  Don’t forget to pass the graph structure between iterations
  • 108. PageRank Given page x with inlinks $t_1 \ldots t_n$, where l  C(t) is the out-degree of t l  α is the probability of a random jump l  N is the total number of nodes in the graph: $PR(x) = \alpha \left( \frac{1}{N} \right) + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$
  • 109. Computing PageRank ¢  Properties of PageRank l  Can be computed iteratively l  Effects at each iteration are local ¢  Sketch of algorithm: l  Start with seed PRi values l  Each page distributes PRi “credit” to all pages it links to l  Each target page adds up “credit” from multiple in-bound links to compute PRi+1 l  Iterate until values converge
  • 110. Simplified PageRank ¢  First, tackle the simple case: l  No random jump factor l  No dangling nodes ¢  Then, factor in these complexities… l  Why do we need the random jump? l  Where do dangling nodes come from?
  • 111. Sample PageRank Iteration (1) [Diagram: five nodes n1–n5 each start with PR = 0.2; after one iteration, n1 = 0.066, n2 = 0.166, n3 = 0.166, n4 = 0.3, n5 = 0.3]
  • 112. Sample PageRank Iteration (2) [Diagram: starting from the values above, after a second iteration n1 = 0.1, n2 = 0.133, n3 = 0.183, n4 = 0.2, n5 = 0.383]
  • 113. PageRank in MapReduce [Diagram: the map phase takes each node with its adjacency list (n1 [n2, n4], n2 [n3, n5], n3 [n4], n4 [n5], n5 [n1, n2, n3]) and emits PageRank mass to its out-neighbors; the shuffle groups contributions by destination node; the reduce phase sums contributions and re-emits each node with its adjacency list.]
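In the same spirit, a sketch of one iteration of the simplified PageRank job (no random jump, no dangling nodes), with node records serialized as nodeId<TAB>rank<TAB>adjacencyList; the record format and class names are illustrative.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SimplePageRank {
  public static class PRMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] f = line.toString().split("\t");
      String nodeId = f[0];
      double rank = Double.parseDouble(f[1]);
      String adjacency = f.length > 2 ? f[2] : "";
      String[] links = adjacency.isEmpty() ? new String[0] : adjacency.split(",");
      // Pass along the graph structure so it survives the iteration.
      context.write(new Text(nodeId), new Text("GRAPH\t" + adjacency));
      double share = links.length > 0 ? rank / links.length : 0.0;
      for (String target : links) {
        context.write(new Text(target), new Text("MASS\t" + share));  // distribute credit
      }
    }
  }

  public static class PRReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text nodeId, Iterable<Text> messages, Context context)
        throws IOException, InterruptedException {
      double rank = 0.0;
      String adjacency = "";
      for (Text msg : messages) {
        String[] f = msg.toString().split("\t");
        if (f[0].equals("GRAPH")) {
          adjacency = f.length > 1 ? f[1] : "";
        } else {
          rank += Double.parseDouble(f[1]);  // sum in-bound credit
        }
      }
      context.write(nodeId, new Text(rank + "\t" + adjacency));
    }
  }
}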
  • 115. Complete PageRank ¢  Two additional complexities l  What is the proper treatment of dangling nodes? l  How do we factor in the random jump factor? ¢  Solution: l  Second pass to redistribute “missing PageRank mass” and account for random jumps l  p is PageRank value from before, p' is updated PageRank value l  N is the number of nodes in the graph l  m is the missing PageRank mass ¢  Additional optimization: make it a single pass! $p' = \alpha \left( \frac{1}{N} \right) + (1 - \alpha) \left( \frac{m}{N} + p \right)$
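As a small sketch, the second, map-only pass applies something like the following to every node, where missingMass is assumed to have been collected from the first pass (for example via a counter or a side file); the helper name is hypothetical.

// p' = alpha * (1/N) + (1 - alpha) * (m/N + p)
final class PageRankSecondPass {
  static double complete(double p, double missingMass, long numNodes, double alpha) {
    return alpha * (1.0 / numNodes) + (1.0 - alpha) * (missingMass / numNodes + p);
  }
}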
  • 116. PageRank Convergence ¢  Alternative convergence criteria l  Iterate until PageRank values don’t change l  Iterate until PageRank rankings don’t change l  Fixed number of iterations ¢  Convergence for web graphs? l  Not a straightforward question ¢  Watch out for link spam: l  Link farms l  Spider traps l  …
  • 117. Beyond PageRank ¢  Variations of PageRank l  Weighted edges l  Personalized PageRank ¢  Variants on graph random walks l  Hubs and authorities (HITS) l  SALSA
  • 118. Other Classes of Graph Algorithms ¢  Subgraph pattern matching ¢  Computing simple graph statistics l  Vertex degree distributions ¢  Computing more complex graph statistics l  Clustering coefficients l  Counting triangles
  • 119. Batch Gradient Descent in MapReduce [Diagram: mappers compute partial gradients over their input splits; a single reducer sums them and updates the model; iterate until convergence] $\theta^{(t+1)} \leftarrow \theta^{(t)} - \gamma^{(t)} \frac{1}{n} \sum_{i=1}^{n} \nabla \ell(f(x_i; \theta^{(t)}), y_i)$
  • 121. MapReduce sucks at iterative algorithms ¢  Hadoop task startup time ¢  Stragglers ¢  Needless graph shuffling ¢  Checkpointing at each iteration
  • 122. In-Mapper Combining ¢  Use combiners l  Perform local aggregation on map output l  Downside: intermediate data is still materialized ¢  Better: in-mapper combining l  Preserve state across multiple map calls, aggregate messages in buffer, emit buffer contents at end l  Downside: requires memory management [Diagram: setup → map (aggregate into buffer) → cleanup (emit all key-value pairs at once)]
  • 123. Better Partitioning ¢  Default: hash partitioning l  Randomly assign nodes to partitions ¢  Observation: many graphs exhibit local structure l  E.g., communities in social networks l  Better partitioning creates more opportunities for local aggregation ¢  Unfortunately, partitioning is hard! l  Sometimes, chicken-and-egg… l  But cheap heuristics sometimes available l  For webgraphs: range partition on domain-sorted URLs
  • 124. Schimmy Design Pattern ¢  Basic implementation contains two dataflows: l  Messages (actual computations) l  Graph structure (“bookkeeping”) ¢  Schimmy: separate the two dataflows, shuffle only the messages l  Basic idea: merge join between graph structure and messages S T both relations sorted by join key S1 T1 S2 T2 S3 T3 both relations consistently partitioned and sorted by join key
  • 125. S1 T1 Do the Schimmy! ¢  Schimmy = reduce side parallel merge join between graph structure and messages l  Consistent partitioning between input and intermediate data l  Mappers emit only messages (actual computation) l  Reducers read graph structure directly from HDFS S2 T2 S3 T3 ReducerReducerReducer intermediate data (messages) intermediate data (messages) intermediate data (messages) from HDFS (graph structure) from HDFS (graph structure) from HDFS (graph structure)
  • 126. Experiments ¢  Cluster setup: l  10 workers, each 2 cores (3.2 GHz Xeon), 4GB RAM, 367 GB disk l  Hadoop 0.20.0 on RHELS 5.3 ¢  Dataset: l  First English segment of ClueWeb09 collection l  50.2m web pages (1.53 TB uncompressed, 247 GB compressed) l  Extracted webgraph: 1.4 billion edges, 7.0 GB l  Dataset arranged in crawl order ¢  Setup: l  Measured per-iteration running time (5 iterations) l  100 partitions
  • 133. Sequencing Computations 1.  Turn synchronization into a sorting problem l  Leverage the fact that keys arrive at reducers in sorted order l  Manipulate the sort order and partitioning scheme to deliver partial results at appropriate junctures 2.  Create appropriate algebraic structures to capture computation l  Build custom data structures to accumulate partial results Monoids!
  • 134. Monoids! ¢  What’s a monoid? ¢  An algebraic structure with l  A single associative binary operation l  An identity ¢  Examples: l  Natural numbers form a commutative monoid under + with identity 0 l  Natural numbers form a commutative monoid under × with identity 1 l  Finite strings form a monoid under concatenation with identity “” l  …
  • 135. Monoids and MapReduce ¢  Recall averaging example: why does it work? l  AVG is non-associative l  Tuple of (sum, count) forms a monoid under element-wise addition l  Destroy the monoid at end to compute average l  Also explains the various failed algorithms ¢  “Stripes” pattern works in the same way! l  Associative arrays form a monoid under element-wise addition Go forth and monoidify!
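A tiny sketch of the (sum, count) structure the slide describes, written as a plain Java value class (names are illustrative).

// (sum, count) forms a monoid under element-wise addition; the identity is (0, 0).
final class SumCount {
  final long sum;
  final long count;

  SumCount(long sum, long count) { this.sum = sum; this.count = count; }

  static final SumCount IDENTITY = new SumCount(0, 0);

  // Associative binary operation: safe to apply in combiners, in any grouping, any number of times.
  SumCount plus(SumCount other) {
    return new SumCount(sum + other.sum, count + other.count);
  }

  // "Destroy the monoid" only at the very end to obtain the non-associative average.
  double average() {
    return (double) sum / count;
  }
}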
  • 136. Abstract Algebra and MapReduce ¢  Create appropriate algebraic structures to capture computation ¢  Algebraic properties l  Associative: order doesn’t matter! l  Commutative: grouping doesn’t matter! l  Idempotent: duplicates don’t matter! l  Identity: this value doesn’t matter! l  Zero: other values don’t matter! l  … ¢  Different combinations lead to monoids, groups, rings, lattices, etc. Source: Guy Steele Recent thoughts, see: Jimmy Lin. Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms. arXiv:1304.7544, April 2013.