Tall-and-skinny QRs and SVDs in MapReduce

David F. Gleich, Computer Science, Purdue University

With Yangyang Hou (Purdue, CS); Paul G. Constantine, Austin Benson, Joe Nichols (Stanford University); James Demmel (UC Berkeley); Joe Ruthruff, Jeremy Templeton (Sandia CA).

Mainly funded by the Sandia National Labs CSAR project, recently by NSF CAREER, and the Purdue Research Foundation.

ICML 2013
Tall-and-Skinny matrices (m ≫ n): many rows (like a billion), a few columns (under 10,000). (Example image matrix from the tinyimages collection.)

Used in: regression and general linear models with many samples, block iterative methods, panel factorizations, approximate kernel k-means, and big-data SVD/PCA. (A small least-squares sketch follows below.)
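A minimal sketch (my example, not from the slides) of the first use case above: solving a many-sample least-squares problem through a thin QR instead of the normal equations. All sizes and data here are illustrative.

    import numpy as np

    m, n = 100000, 50                     # tall and skinny: m >> n
    A = np.random.randn(m, n)
    b = np.random.randn(m)

    Q, R = np.linalg.qr(A)                # thin QR: Q is m x n, R is n x n
    x = np.linalg.solve(R, Q.T @ b)       # least-squares solution of min ||Ax - b||

    # The normal-equations residual A^T (A x - b) should be near zero.
    print(np.linalg.norm(A.T @ (A @ x - b)))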
A matrix A : m × n, m ≥ n, is tall and skinny when O(n²) work and storage is “cheap” compared to m. -- Austin Benson
Quick review of QR

Let A : m × n, m ≥ n, real. Then A = QR, where Q is m × n with orthonormal columns (QᵀQ = I) and R is n × n upper triangular.

QR is block normalization: “normalizing” a vector, x/‖x‖, usually generalizes to computing Q in the QR factorization.
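A small illustrative check of the definitions above (my example, not from the slides): for one column, QR reduces to normalization; for many columns, the same idea applies blockwise.

    import numpy as np

    # One column: QR reduces to normalization, q = x / ||x||, r = ||x|| (up to sign).
    x = np.random.randn(1000, 1)
    q, r = np.linalg.qr(x)
    print(np.allclose(np.abs(q), np.abs(x) / np.linalg.norm(x)))

    # Many columns: the blockwise analogue.
    A = np.random.randn(1000, 8)
    Q, R = np.linalg.qr(A)
    print(np.allclose(Q.T @ Q, np.eye(8)))   # orthonormal columns
    print(np.allclose(A, Q @ R))             # A = Q R with R upper triangular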
  
Tall-and-skinny SVD and RSVD

Let A : m × n, m ≥ n, real. Then A = UΣVᵀ, where U is m × n with orthonormal columns, Σ is n × n nonnegative and diagonal, and V is n × n orthogonal.

Pipeline: A → (TSQR) → R → (SVD of R) → Σ and V.
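A minimal in-memory sketch (my example; a single numpy QR stands in for TSQR) of the pipeline above: factor A once, then take the SVD of the small R.

    import numpy as np

    A = np.random.randn(100000, 20)

    # Stage 1: tall-and-skinny QR (a stand-in for TSQR here).
    Q, R = np.linalg.qr(A)

    # Stage 2: SVD of the small n x n matrix R.
    Ur, S, Vt = np.linalg.svd(R)

    # Singular values of A equal those of R; U = Q @ Ur recovers the left factor.
    U = Q @ Ur
    print(np.allclose(np.linalg.svd(A, compute_uv=False), S))
    print(np.allclose(A, U @ np.diag(S) @ Vt))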
There are good MPI implementations. What’s left to do?
Moving data to an MPI cluster may be infeasible or costly.
Data computers I’ve used

Nebula Cluster @ Sandia CA: 2TB/core storage, 64 nodes, 256 cores, GB ethernet. Cost $150k.
Student Cluster @ Stanford: 3TB/core storage, 11 nodes, 44 cores, GB ethernet. Cost $30k.
Magellan Cluster @ NERSC: 128GB/core storage, 80 nodes, 640 cores, InfiniBand.

These systems are good for working with enormous matrix data!
By 2013(?) all Fortune 500 companies will have a data computer.
How do you program them?
A graphical view of the MapReduce programming model

[Diagram: data blocks feed Map tasks, which emit (key, value) pairs; a Shuffle groups the values by key; Reduce tasks consume each key's values and emit data.]

Map tasks read batches of data in parallel and do some initial filtering. The shuffle is a global communication, like a group-by or MPI Alltoall. Reduce is often where the computation happens.
MapReduce looks like the first step of any MPI program that loads data: read data -> assign to process; receive data -> do compute.
Still, isn’t this easy to do?

Current MapReduce algorithms use the normal equations: form AᵀA, compute its Cholesky factorization AᵀA = RᵀR, and set Q = AR⁻¹.

Map: Aᵢ to AᵢᵀAᵢ
Reduce: RᵀR = Sum(AᵢᵀAᵢ)
Map 2: Aᵢ to AᵢR⁻¹

Two problems: R is inaccurate if A is ill-conditioned, and Q is not numerically orthogonal (Householder QR assures this).
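A minimal in-memory sketch (my example, not the MapReduce code) of the normal-equations / Cholesky approach just described, showing how Q loses orthogonality on an ill-conditioned matrix.

    import numpy as np

    def cholesky_qr(A):
        G = A.T @ A                          # "reduce": sum of local A_i^T A_i
        R = np.linalg.cholesky(G).T          # A^T A = R^T R, R upper triangular
        Q = A @ np.linalg.inv(R)             # "map 2": Q = A R^{-1}
        return Q, R

    # Build a test matrix with condition number ~1e5.
    U, _ = np.linalg.qr(np.random.randn(10000, 10))
    V, _ = np.linalg.qr(np.random.randn(10, 10))
    A = U @ np.diag(np.logspace(0, -5, 10)) @ V.T

    Q, R = cholesky_qr(A)
    # Noticeably larger than machine precision (roughly eps * cond(A)^2).
    print(np.linalg.norm(Q.T @ Q - np.eye(10)))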
Numerical stability was a problem for prior approaches

[Plot: norm(QᵀQ – I) versus condition number (10⁵ to 10²⁰) for the prior AR⁻¹ approach.]

Previous methods couldn’t ensure that the matrix Q was orthogonal.
Four things that are better

1. A simple algorithm to compute R accurately (but it doesn’t help get Q orthogonal).
2. A “fast algorithm” to get Q numerically orthogonal in most cases.
3. A multi-pass algorithm to get Q numerically orthogonal in virtually all cases.
4. A direct algorithm for a numerically orthogonal Q in all cases.

Constantine & Gleich, MapReduce 2011; Benson, Gleich & Demmel, arXiv 2013.
Numerical stability was a problem for prior approaches

[Plot: norm(QᵀQ – I) versus condition number (10⁵ to 10²⁰) comparing prior work (AR⁻¹) with AR⁻¹ + iterative refinement and Direct TSQR (Constantine & Gleich, MapReduce 2011; Benson, Gleich & Demmel, submitted).]

Previous methods couldn’t ensure that the matrix Q was orthogonal.
How to store tall-and-skinny matrices in Hadoop

A : m × n, m ≫ n. The key is an arbitrary row-id; the value is the 1 × n array for a row (or a b × n block). Each submatrix Aᵢ is the input to a map task. (A sketch of this record layout follows below.)
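A hypothetical sketch of the record layout described above: each record is a (row-id key, list-of-floats value) pair. The names and sizes here are illustrative, not the deck's code.

    import random

    def make_records(rows):
        # rows: an iterable of length-n lists of numbers (one matrix row each)
        for row in rows:
            key = random.randint(0, 2 * 10**9)   # arbitrary row-id
            value = [float(v) for v in row]      # the 1 x n array for the row
            yield key, value

    rows = [[i + j / 10.0 for j in range(5)] for i in range(3)]
    for key, value in make_records(rows):
        print(key, value)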
MapReduce is great for TSQR! You don’t need AᵀA.

Data: a tall-and-skinny (TS) matrix by rows.
Input: 500,000,000-by-50 matrix; each record is a 1-by-50 row; HDFS size 183.6 GB.

Time to read A: 253 sec.; write A: 848 sec.
Time to compute R in qr(A): 526 sec.; with Q = AR⁻¹: 1618 sec.
Time to compute Q in qr(A): 3090 sec. (numerically stable).
Communication-avoiding QR (Demmel et al. 2008)

Communication-avoiding TSQR: first, do QR factorizations of each local matrix Aᵢ; second, compute a QR factorization of the stacked “R” factors.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.
Serial QR factorizations (Demmel et al. 2008)

Fully serial TSQR: compute the QR of the first block, read the next block, update the QR, and so on.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.
Communication-avoiding QR (Demmel et al. 2008) on MapReduce (Constantine and Gleich, 2011)

[Diagram: each mapper runs a serial TSQR over its blocks of A (A1, A2, ... → qr → R factors) and emits its local R; a single reducer runs a serial TSQR on the collected R factors and emits the final R.]

Algorithm
Data: rows of a matrix
Map: QR factorization of rows
Reduce: QR factorization of rows

(A sketch of this two-stage reduction appears below.)
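A minimal in-memory sketch (my example, not the Hadoop implementation) of the two-stage TSQR reduction described above: local QRs in the "map", then a QR of the stacked R factors in the "reduce".

    import numpy as np

    def tsqr_R(A, num_mappers=4):
        blocks = np.array_split(A, num_mappers, axis=0)
        local_Rs = [np.linalg.qr(block, mode='r') for block in blocks]   # map
        R = np.linalg.qr(np.vstack(local_Rs), mode='r')                  # reduce
        return R

    A = np.random.randn(100000, 12)
    R = tsqr_R(A)
    # Up to signs of the rows, this matches the R from a direct QR of A.
    R_direct = np.linalg.qr(A, mode='r')
    print(np.allclose(np.abs(R), np.abs(R_direct)))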
Too many maps cause too much data to one reducer!

From the tinyimages collection: 80,000,000 images, 1000 pixels each. Each image is 5k, each HDFS block has 12,800 images, and there are 6,250 total blocks. Each map outputs a 1000-by-1000 matrix, so one reducer gets a 6.25M-by-1000 matrix (50GB).
[Diagram: a multi-iteration TSQR. Iteration 1: mappers run serial TSQR on blocks of A and emit local R factors; a shuffle sends them to iteration-1 reducers, which run serial TSQR and emit second-level R factors. Iteration 2: an identity map and a single reducer running serial TSQR emit the final R.]
Our robust algorithm to compute R.
Full TSQR code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        # QR of the locally buffered rows; keep only the R factor
        R = numpy.linalg.qr(
            numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values: self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)
Numerical stability was a problem for prior approaches

[Plot: norm(QᵀQ – I) versus condition number (10⁵ to 10²⁰) comparing prior work (AR⁻¹) with AR⁻¹ + iterative refinement and Direct TSQR (Constantine & Gleich, MapReduce 2011; Benson, Gleich & Demmel, submitted).]

Previous methods couldn’t ensure that the matrix Q was orthogonal.
Iterative refinement helps

[Diagram: first pass, TSQR computes R from the blocks Aᵢ; R is distributed and each mapper forms Qᵢ = AᵢR⁻¹. Second pass, TSQR of the blocks Qᵢ computes T; T is distributed and each mapper forms Qᵢ ← QᵢT⁻¹.]

Iterative refinement is like using Newton’s method to solve Ax = b. There’s a famous quote that “two iterations of iterative refinement are enough,” attributed to Parlett.
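A minimal in-memory sketch (my assumptions, not the Hadoop code) of the refinement pass above: Q1 = AR⁻¹ from a first TSQR, then a second TSQR of Q1 gives T and the refined Q = Q1 T⁻¹, with the overall factorization A = Q (T R).

    import numpy as np

    def tsqr_R(A, num_blocks=4):
        Rs = [np.linalg.qr(b, mode='r') for b in np.array_split(A, num_blocks)]
        return np.linalg.qr(np.vstack(Rs), mode='r')

    # An ill-conditioned test matrix (condition number ~1e7).
    U, _ = np.linalg.qr(np.random.randn(20000, 10))
    V, _ = np.linalg.qr(np.random.randn(10, 10))
    A = U @ np.diag(np.logspace(0, -7, 10)) @ V.T

    R = tsqr_R(A)
    Q1 = A @ np.linalg.inv(R)                # pass 1: A R^{-1}
    T = tsqr_R(Q1)
    Q = Q1 @ np.linalg.inv(T)                # pass 2: refinement

    I = np.eye(10)
    print(np.linalg.norm(Q1.T @ Q1 - I))     # loses orthogonality
    print(np.linalg.norm(Q.T @ Q - I))       # close to machine precision
    print(np.allclose(A, Q @ (T @ R)))       # A = Q (T R)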
Numerical stability was a problem for prior approaches

[Plot: norm(QᵀQ – I) versus condition number (10⁵ to 10²⁰) comparing prior work (AR⁻¹) with AR⁻¹ + iterative refinement and Direct TSQR (Constantine & Gleich, MapReduce 2011; Benson, Gleich & Demmel, submitted).]

Previous methods couldn’t ensure that the matrix Q was orthogonal.
What if iterative refinement is too slow?

[Diagram: first pass, sample rows of A into S, compute the QR of the sample, and distribute its R; each mapper forms Qᵢ = AᵢR⁻¹. Second pass, a TSQR over the blocks Qᵢ computes T, and TR is distributed so each mapper applies the correction.]

Estimate the “norm” by the sample S. Based on recent work by Ipsen et al. on approximating QR with a random subset of rows. Also assumes that you can get a subset of rows “cheaply” – possible, but nontrivial in Hadoop.
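A rough sketch (my assumptions, not the paper's exact scheme) of the sampling idea above: estimate R from a random subset of rows, form Q1 = A Rs⁻¹, and let one TSQR pass over Q1 supply the correction T, so that A = Q (T Rs).

    import numpy as np

    def tsqr_R(A, num_blocks=4):
        Rs = [np.linalg.qr(b, mode='r') for b in np.array_split(A, num_blocks)]
        return np.linalg.qr(np.vstack(Rs), mode='r')

    m, n = 50000, 10
    A = np.random.randn(m, n)
    idx = np.random.choice(m, size=20 * n, replace=False)   # a "cheap" row sample
    Rs = np.linalg.qr(A[idx], mode='r')                     # R estimated from the sample

    Q1 = A @ np.linalg.inv(Rs)               # approximate factor from the sampled R
    T = tsqr_R(Q1)                           # one TSQR pass gives the correction
    Q = Q1 @ np.linalg.inv(T)                # corrected factor

    print(np.linalg.norm(Q.T @ Q - np.eye(n)))
    print(np.allclose(A, Q @ (T @ Rs)))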
Orthogonality of the sampled solution

[Plot: norm(QᵀQ – I) versus condition number (10⁵ to 10²⁰) for prior work (AR⁻¹), the sampled solution, and Direct TSQR (Constantine & Gleich, MapReduce 2011; Benson, Gleich & Demmel, submitted).]
Four things that are better

1. A simple algorithm to compute R accurately (but it doesn’t help get Q orthogonal).
2. A fast algorithm to get Q numerically orthogonal in most cases.
3. A two-pass algorithm to get Q numerically orthogonal in virtually all cases.
4. A direct algorithm for a numerically orthogonal Q in all cases.

Constantine & Gleich, MapReduce 2011; Benson, Gleich & Demmel, arXiv 2013.
Recreate Q by storing the history of the factorization

[Diagram: mapper 1 factors each block Aᵢ into Qᵢ and Rᵢ; task 2 collects the Rᵢ, factors the stacked R factors into pieces Qᵢ₁ and the final R; mapper 3 multiplies each local Qᵢ by its piece Qᵢ₁ to form the true Q.]

1. Output local Q and R in separate files.
2. Collect R on one node, compute Qs for each piece.
3. Distribute the pieces of Q*1 and form the true Q.

(A sketch of this direct TSQR appears below.)
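A minimal in-memory sketch (my example, not the Hadoop implementation) of Direct TSQR: keep the local Q factors, factor the stacked R factors, and combine the pieces.

    import numpy as np

    def direct_tsqr(A, num_blocks=4):
        blocks = np.array_split(A, num_blocks)
        locals_ = [np.linalg.qr(b) for b in blocks]          # step 1: local Qi, Ri
        Rs = np.vstack([R for _, R in locals_])
        Q2, R = np.linalg.qr(Rs)                             # step 2: QR of stacked Rs
        Q2_pieces = np.split(Q2, num_blocks)                 # step 3: pieces of Q*1
        Q = np.vstack([Qi @ Q2i for (Qi, _), Q2i in zip(locals_, Q2_pieces)])
        return Q, R

    A = np.random.randn(100000, 8)
    Q, R = direct_tsqr(A)
    print(np.linalg.norm(Q.T @ Q - np.eye(8)))   # numerically orthogonal
    print(np.allclose(A, Q @ R))                 # A = Q R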
Theoretical lower bound on runtime for a few cases on our small cluster

Only two parameters are needed, the read and write bandwidth of the cluster, in order to derive a performance model of the algorithm. This simple model is almost within a factor of two of the true runtime (10-node cluster, 60 disks). All values are in seconds.

Model
Rows   Cols   Old    R-only + no IR   R-only + PIR   R-only + IR   Direct TSQR
4.0B   4      1803   1821             1821           2343          2525
2.5B   10     1645   1655             1655           2062          2464
0.6B   25     804    812              812            1000          1237
0.5B   50     1240   1250             1250           1517          2103

Actual
Rows   Cols   Old    R-only + no IR   R-only + PIR   R-only + IR   Direct TSQR
4.0B   4      2931   3460             3620           4741          6128
2.5B   10     2508   2509             3354           4034          4035
0.6B   25     1098   1104             1476           2006          1910
0.5B   50     921    1618             1960           2655          3090

(A sketch of the bandwidth-only performance model follows below.)
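A rough sketch of how such a bandwidth-only model could look (my reading of the slide; the function, pass counts, and bandwidth numbers below are assumptions, not the paper's model): runtime is approximately bytes read divided by read bandwidth plus bytes written divided by write bandwidth, per pass over the data.

    def model_runtime(rows, cols, read_bw_gbps, write_bw_gbps,
                      passes_read=1, passes_write=0):
        # Hypothetical model: 8-byte doubles, purely bandwidth-bound passes.
        data_gb = rows * cols * 8 / 1e9
        return (passes_read * data_gb / read_bw_gbps
                + passes_write * data_gb / write_bw_gbps)

    # Example with made-up aggregate bandwidths (GB/s) for a small cluster.
    print(model_runtime(4.0e9, 4, read_bw_gbps=0.07, write_bw_gbps=0.03))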
My application is big simulation data.
Nonlinear heat transfer model in random media

Each run takes 5 hours on 8 processors; we did 8192 runs. 4.2M nodes, 9 time-steps, 128 realizations, 64 size parameters. SVD of a 5B x 64 data matrix. 50x reduction in wall-clock time including all pre- and post-processing; “∞x” speedup for a particular study without pre-processing (< 60 sec on a laptop). Interpolating adaptive low-rank approximations.

Constantine, Gleich, Hou & Templeton, arXiv 2013.

[Figure from that paper (Fig. 4.5): error in the reduced-order model compared to the prediction standard deviation, for one realization of the bubble locations at the final time, for two values of the bubble radius: (a) error, s = 0.39 cm; (b) std, s = 0.39 cm; (c) error, s = 1.95 cm; (d) std, s = 1.95 cm.]

[Clipped page text, partially recoverable: working with the simulation data involved interpreting 4TB of Exodus II files from Aria, running the TSSVD, and computing predictions and errors, taking approximately 8-15 hours on a multi-tenant cluster, with jobs ranging between 100GB and 2TB.]
Dynamic Mode Decomposition

One simulation, ~10TB of data; compute the SVD of a space-by-time matrix. [DMD video]
Is double-precision sufficient?

Double-precision floating point was designed for the era where “big” was 1000-10000.

s = 0; for i = 1 to n: s = s + x[i]

A simple summation formula like this has error that is not always small if n is a billion:
fl(x + y) = (x + y)(1 + ε),  |fl(Σᵢ xᵢ) − Σᵢ xᵢ| ≤ n µ Σᵢ |xᵢ|,  µ ≈ 10⁻¹⁶.

Watch out for this issue in important computations. (A small demonstration follows below.)
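A small demonstration (my example, not from the slides) of the bound above. Single precision is used here so the effect shows up already at n = 10⁷ rather than a billion; the same growth applies to double precision at much larger n.

    import numpy as np

    n = 10**7
    x = np.full(n, 0.1, dtype=np.float32)

    naive = np.cumsum(x)[-1]              # sequential accumulation: s = s + x[i]
    print(naive)                          # drifts noticeably from 10^7 * 0.1 = 1,000,000
    print(x.astype(np.float64).sum())     # accumulating in higher precision stays accurate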
Papers

Constantine & Gleich, MapReduce 2011
Benson, Gleich & Demmel, arXiv 2013
Constantine & Gleich, ICASSP 2012
Constantine, Gleich, Hou & Templeton, arXiv 2013

Code

https://github.com/arbenson/mrtsqr
https://github.com/dgleich/simform

Questions?
