1. A hands-on introduction
to scientific data analysis
with Hadoop
A matrix computations perspective
DAVID F. GLEICH, PURDUE UNIVERSITY
ICME MAPREDUCE WORKSHOP @ STANFORD
David Gleich · Purdue
MRWorkshop
2. Who is this for?
workshop project groups
those curious about
“MapReduce” and “Hadoop”
those who think about
problems as matrices
3. What should you get out of it?
1. understand some problems that
MapReduce solves effectively.
2. techniques to solve them using
Hadoop and dumbo
3. learn some Hadoop words
4. What you won’t learn …
latest and greatest in
MapReduce algorithms
how to improve the
performance of your Hadoop job
how to write wordcount
in Hadoop
5. Slides will be online soon.
Code samples and short tutorials at
github.com/dgleich/mrmatrix
6. 1. HPC vs. Data (redux)
2. MapReduce vs. Hadoop
3. Dive into Hadoop with
Hadoop streaming
4. Sparse matrix methods
with Hadoop
10. MapReduce is designed to
solve a different set of problems
11. Supercomputer, data computing cluster, engineer
Each multi-day HPC simulation generates gigabytes of data. A data cluster can hold hundreds or thousands of old simulations, enabling engineers to query and analyze months of simulation data for all sorts of neat purposes.
13. The MapReduce
programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Reduce apply a function g to
all values with key k (for all k)
Output a list of (key, value) pairs
14. The MapReduce
programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Reduce apply a function g to
all values with key k (for all k)
Output a list of (key, value) pairs
Map function f must be side-effect free
Reduce function g must be side-effect free
15. The MapReduce
programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Reduce apply a function g to
all values with key k (for all k)
Output a list of (key, value) pairs
All map functions can be done in parallel
All reduce functions (for key k) can be done
in parallel
16. The MapReduce
programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Reduce apply a function g to
all values with key k (for all k)
Output a list of (key, value) pairs
Shuffle group all pairs with key k together
(sorting suffices)
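The model is small enough to mimic in plain Python. The sketch below is a toy single-machine driver (nothing Hadoop-specific), with the shuffle done by sorting, as the slide notes:

```python
from itertools import groupby

def mapreduce(pairs, f, g):
    """Toy single-machine MapReduce: map, shuffle (sort + group), reduce."""
    mapped = [out for kv in pairs for out in f(*kv)]   # Map: apply f to every pair
    mapped.sort(key=lambda kv: kv[0])                  # Shuffle: sorting suffices
    return [out
            for k, grp in groupby(mapped, key=lambda kv: kv[0])
            for out in g(k, [v for _, v in grp])]      # Reduce: apply g per key

# example: sum values per key
f = lambda k, v: [(k, v)]
g = lambda k, vals: [(k, sum(vals))]
print(mapreduce([('a', 1), ('b', 2), ('a', 3)], f, g))  # [('a', 4), ('b', 2)]
```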
17. Mesh point variance in MapReduce
[Figure: three simulation runs (Run 1, Run 2, Run 3), each with snapshots at T=1, T=2, T=3.]
18. Mesh point variance in MapReduce
[Figure: the snapshots from each run (T=1, T=2, T=3) feed a mapper M; the shuffle routes each mesh point to a reducer R.]
1. Each mapper outputs the mesh points with the same key.
2. Shuffle moves all values from the same mesh point to the same reducer.
3. Reducers just compute a numerical variance.
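In dumbo style, the variance step might look like the sketch below. The record layout (one line of text per sample: run, time step, mesh point id, value) is an assumption for illustration, not the workshop's actual data format; hook it up with dumbo.run(mapper, reducer) as in the dumbo example later in the deck.

```python
# Sketch only: assumes each input line reads "run timestep meshpoint value".
def mapper(key, value):
    run, t, point, val = value.split()
    yield (point, t), float(val)     # same mesh point and time step -> same key

def reducer(key, values):
    vals = list(values)
    mean = sum(vals) / len(vals)
    # population variance across runs for this mesh point and time step
    yield key, sum((v - mean) ** 2 for v in vals) / len(vals)
```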
19. MapReduce vs. Hadoop.
MapReduce
A computation model with:
Map, a local data transform
Shuffle, a grouping function
Reduce, an aggregation
Hadoop
An implementation of MapReduce using the HDFS parallel file-system.
Others
Phoenix++, Twister, Google MapReduce, Spark, …
20. Why so many limitations?
21. Data scalability
[Figure: map tasks M1 through M5 run where the data lives; a shuffle routes their output to reducers R.]
The idea
Bring the computations to the data.
MR can schedule map functions without moving data.
22. Mesh point variance in MapReduce
[Figure: the snapshots from each run (T=1, T=2, T=3) feed a mapper M; the shuffle routes each mesh point to a reducer R.]
1. Each mapper outputs the mesh points with the same key.
2. Shuffle moves all values from the same mesh point to the same reducer.
3. Reducers just compute a numerical variance.
Bring the computations to the data!
23. heartbreak on node rs252
After waiting in the queue for a month and
after 24 hours of finding eigenvalues, one node randomly hiccups.
24. Fault tolerant
[Figure: mappers M feed reducers R through the shuffle.]
Input stored in triplicate.
Map output persisted to disk before shuffle.
Reduce input/output on disk.
Redundant input helps make maps data-local.
Just one type of communication: shuffle.
25. Fault injection
[Figure: time to completion (sec) vs. 1/Prob(failure), the mean number of successes per failure, for a 200M-by-200 and an 800M-by-10 matrix, each with and without injected faults.]
With 1/5 of tasks failing, the job only takes twice as long.
27. Tools I like
hadoop streaming
dumbo
mrjob
hadoopy
C++
28. Tools I don’t use but other
people seem to like …
pig
java
hbase
Eclipse
Cassandra
29. hadoop streaming
the map function is a program:
(key,value) pairs are sent via stdin;
output (key,value) pairs go to stdout
the reduce function is a program:
(key,value) pairs are sent via stdin;
keys are grouped;
output (key,value) pairs go to stdout
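Concretely, a streaming map program for row sums can be an ordinary script; the file name and the line-number key below are made up for illustration:

```python
#!/usr/bin/env python
# rowsum_map.py (hypothetical name): a Hadoop streaming mapper is just a
# program that reads lines from stdin and writes "key<tab>value" to stdout.
import sys

def run(infile=sys.stdin, outfile=sys.stdout):
    for lineno, line in enumerate(infile):
        vals = [float(v) for v in line.split()]
        outfile.write('%d\t%s\n' % (lineno, sum(vals)))

if __name__ == '__main__':
    run()
```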
30. dumbo
a wrapper around hadoop streaming for
map and reduce functions in python
#!/usr/bin/env dumbo
def mapper(key,value):
    """ Each record is a line of text.
    key=<byte that the line starts in the file>
    value=<line of text>
    """
    valarray = [float(v) for v in value.split()]
    yield key, sum(valarray)

if __name__=='__main__':
    import dumbo
    import dumbo.lib
    dumbo.run(mapper,dumbo.lib.identityreducer)
31. Synthetic data test
100,000,000-by-500 matrix (~500 GB)
Computing the R in a QR factorization.
(New matrix, 500 GB too.)

How can Hadoop streaming possibly be fast?
Codes implemented in MapReduce streaming.
Matrix stored as TypedBytes lists of doubles.
Python frameworks use Numpy+Atlas.
Custom C++ TypedBytes reader/writer with Atlas.
The Java implementation is non-streaming.

          Iter 1      Iter 1         Iter 2         Overall
          QR (secs.)  Total (secs.)  Total (secs.)  Total (secs.)
Dumbo     67725       960            217            1177
Hadoopy   70909       612            118            730
C++       15809       350            37             387
Java      n/a         436            66             502

C++ in streaming beats a native Java implementation.
All timing results from the Hadoop job tracker.
David Gleich (Sandia), MapReduce 2011
32. Demo 1
1. generate data
2. get data to hadoop
3. run row sums
4. see row sums!
33. How does Hadoop know
key = <byte offset in file>,
value = <line of text>?
InputFormat
Maps a file on HDFS to (key,value) pairs
TextInputFormat
Maps a text file to (<byte offset>, <line>) pairs
34. The Hadoop Distributed File System (HDFS)
and a big text file
HDFS stores files in 64MB chunks
Each chunk is a FileSplit
FileSplits are stored in parallel
An InputFormat converts FileSplits
into a sequence of key-value records.
FileSplits can cross record borders
(a small bit of communication).
35. Tall-and-skinny matrix
storage in MapReduce
A : m x n, m ≫ n
[Figure: A split into stacked submatrices A1, A2, A3, A4.]
Key is an arbitrary row-id.
Value is the 1 x n array for a row.
Each submatrix Ai is an InputSplit (the input to a map task).
36. hadoop vs. MPI
hadoop:
output row-sum for all local rows
MPI:
parallel load
for my-batch-of-rows, compute row-sum
parallel save
37. Isn’t reading and writing text
files rather inefficient?
38. Sequence Files and OutputFormat
SequenceFile
An internal Hadoop file format to store (key, value) pairs efficiently. Used between map and reduce steps.
OutputFormat
Maps (key, value) pairs to output on disk
TextOutputFormat
Maps (key, value) pairs to key<tab>value strings
39. typedbytes
A simple binary serialization scheme.
[<1-byte-type-flag> <binary-value>]*
Roughly equivalent to JSON
(Optionally) used to communicate to and
from Hadoop streaming.
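The flag-then-value pattern is easy to sketch. The fragment below handles just two of the typedbytes type flags (3 = 32-bit int, 6 = double) and is a toy round-trip, not the library's implementation:

```python
import struct

# Toy typedbytes-style serialization: each value is a 1-byte type flag
# followed by its big-endian binary encoding (3 = 32-bit int, 6 = double).
def write_tb(value):
    if isinstance(value, int):
        return struct.pack('>bi', 3, value)
    if isinstance(value, float):
        return struct.pack('>bd', 6, value)
    raise TypeError(type(value))

def read_tb(buf, pos=0):
    flag = buf[pos]
    if flag == 3:
        return struct.unpack_from('>i', buf, pos + 1)[0], pos + 5
    if flag == 6:
        return struct.unpack_from('>d', buf, pos + 1)[0], pos + 9
    raise ValueError(flag)

data = write_tb(42) + write_tb(3.5)
v1, pos = read_tb(data)
v2, _ = read_tb(data, pos)
# v1 == 42, v2 == 3.5
```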
40. typedbytes example
def _read(self):
    t = unpack_type(self.file.read(1))[0]
    self.t = t
    return self.handler_table[t](self)

def read_vector(self):
    r = self._read
    count = unpack_int(self.file.read(4))[0]
    return tuple(r() for i in xrange(count))
42. Column sums in dumbo
#!/usr/bin/env dumbo
def mapper(key,value):
    """ Each record is a line of text. """
    valarray = [float(v) for v in value.split()]
    for col,val in enumerate(valarray):
        yield col, val

def reducer(col,values):
    yield col, sum(values)

if __name__=='__main__':
    import dumbo
    import dumbo.lib
    dumbo.run(mapper,reducer)
43. Isn’t this just moving the data
to the computation?
Yes. It seems much worse than MPI.
MPI:
parallel load
for my-batch-of-rows, update sum of each column
parallel reduce partial column sums
parallel save
44. The MapReduce
programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Combine apply g to local values with key k
Shuffle group all pairs with key k together
Reduce apply a function g to
all values with key k
Output a list of (key, value) pairs
45. Column sums in dumbo
#!/usr/bin/env dumbo
def mapper(key,value):
    """ Each record is a line of text. """
    valarray = [float(v) for v in value.split()]
    for col,val in enumerate(valarray):
        yield col, val

def reducer(col,values):
    yield col, sum(values)

if __name__=='__main__':
    import dumbo
    import dumbo.lib
    dumbo.run(mapper,reducer,combiner=reducer)
46. How many mappers and
reducers?
The number of maps is the number of
InputSplits.
You choose how many reducers.
Each reducer outputs to a separate file.
47. Demo 3
Column sums with multiple
reducers
48. Which reducer does my key
go to?
Partitioner
Maps a given key to a reducer
HashPartitioner
Distributes keys by hashing (roughly uniformly)
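In effect, a partitioner is a function from a key (and the number of reducers) to a reducer index. A hash partitioner, sketched in Python with a made-up function name; Hadoop's real HashPartitioner hashes the serialized key in Java, so the actual indices won't match this sketch:

```python
def hash_partition(key, num_reducers):
    # Same key -> same reducer index, and keys spread roughly evenly.
    return hash(key) % num_reducers

# every (key, value) pair for 'row-17' reaches the same reducer:
assert hash_partition('row-17', 4) == hash_partition('row-17', 4)
assert 0 <= hash_partition('row-17', 4) < 4
```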
50. Storing a matrix by rows
[Figure: a weighted directed graph and its 6×6 weighted adjacency matrix, shown in compressed sparse row (rp, ci, ai) and compressed sparse column (cp, ri, ai) storage, and as per-row lists of (column, value) pairs:
Row 1 (2,16.) (3,13.)
Row 2 (3,10.) (4,12.)
Row 3 (2,4.) (5,14.)
Row 4 (3,9.) (6,20.)
Row 5 (4,7.) (6,4.)
Row 6 (no nonzeros)]
51. Storing a matrix by rows in a text-file
[Figure: the same graph and compressed sparse row/column storage as the previous slide, now with each row of the matrix written as a line of (column, value) pairs:
Row 1 (2,16.) (3,13.)
Row 2 (3,10.) (4,12.)
Row 3 (2,4.) (5,14.)
Row 4 (3,9.) (6,20.)
Row 5 (4,7.) (6,4.)
Row 6 (no nonzeros)]
52. Sparse matrix-vector product
[Ax]_i = Σ_j A_{i,j} x_j
[Figure: the matrix stored by rows,
Row 1 (2,16.) (3,13.)
Row 2 (3,10.) (4,12.)
Row 3 (2,4.) (5,14.)
Row 4 (3,9.) (6,20.)
Row 5 (4,7.) (6,4.)
and the vector x = (2.1, -1.3, 0.5, 0.6, -1.2, 0.89).]
To make this work, we need to get the value of the vector to the same function as the column of the matrix.
53. Sparse matrix-vector product
[Ax]_i = Σ_j A_{i,j} x_j
[Figure: the same matrix-by-rows and vector as the previous slide.]
We need to “join” the matrix and vector based on the column.
54. Sparse matrix-vector product
takes two MR tasks
Task 1: form Aij xj for each nonzero.
Map (two types of records!):
if vector, emit (row, vecval)
if matrix, for each non-zero (row,col,val), emit (col, (row,val))
Reduce (one of these values is not like the others):
find vecval in the input
for each (col,(row,val)), emit (row, val*vecval)
Task 2: regroup data by rows, compute sums.
Map: identity
Reduce: on (row, [Aij xj, …]), emit (row, sum(Aij xj))
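A single-machine sketch of the two tasks. The record conventions are invented for illustration: vector entries arrive as (j, ('x', xj)) and matrix nonzeros as (anything, ('A', row, col, val)):

```python
def map1(key, value):
    if value[0] == 'x':                # vector entry, keyed by its index j
        yield key, value
    else:                              # matrix nonzero ('A', row, col, val)
        _, row, col, val = value
        yield col, ('A', row, val)     # key by column so it meets x_j

def reduce1(col, values):
    vals = list(values)                # note: buffers the whole column
    xj = next(v[1] for v in vals if v[0] == 'x')
    for v in vals:
        if v[0] == 'A':
            yield v[1], v[2] * xj      # emit (row, A_ij * x_j)

def reduce2(row, products):            # the map of task 2 is the identity
    yield row, sum(products)           # [Ax]_i = sum_j A_ij x_j
```

Running the two shuffles by hand on a 2-by-2 example reproduces Ax entry by entry.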
55. What about a “dense” row?
Map:
if vector, emit (row, vecval)
if matrix, for each non-zero (row,col,val), emit (col, (row,val))
Reduce:
find vecval in the input
for each (col,(row,val)), emit (row, val*vecval)
Form Aij xj for each nonzero.
How do we find vecval without looking through (and buffering) all the input?
56. Sparse matrix-vector product
takes two MR tasks
Map (two types of records!):
if vector, emit ((row,-1), vecval)
if matrix, for each non-zero (row,col,val), emit ((col,0), (row,val))
Reduce:
find vecval in the input keys
for each (col,(row,val)), emit (row, val*vecval)
Use a custom partitioner to make sure that (row,*) all get mapped to the same reducer, and that we always see (row,-1) before (row,0).
Form Aij xj for each nonzero; regroup data by rows, compute sums.
60. Algorithm
Data: rows of a matrix
Map: QR factorization of rows
Reduce: QR factorization of rows
[Figure: Mapper 1 runs a serial TSQR on A1 through A4, stacking rows and factoring as it goes (qr gives Q2 R2, then Q3 R3, then Q4 R4) and emits R4. Mapper 2 does the same on A5 through A8 and emits R8. Reducer 1 runs a serial TSQR on R4 and R8 (qr) and emits the final R.]
61. In hadoopy
Full code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self,blocksize,isreducer):
        self.bsize=blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(
            numpy.array(self.data),'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self,key,value):
        self.data.append(value)
        if len(self.data)>self.bsize*len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0,2000000000)
            yield key, row

    def mapper(self,key,value):
        self.collect(key,value)

    def reducer(self,key,values):
        for value in values: self.mapper(key,value)

if __name__=='__main__':
    mapper = SerialTSQR(blocksize=3,isreducer=False)
    reducer = SerialTSQR(blocksize=3,isreducer=True)
    hadoopy.run(mapper, reducer)
62. Related resources
Apache Mahout
Machine learning for Hadoop
… lots of matrices there …
Another fantastic tutorial
http://www.eurecom.fr/~michiard/
teaching/webtech/tutorial.pdf
63. Way too much stuff!
I hope to keep expanding this tutorial
over the week…
Keep checking the git repo.