This document discusses using singular value decomposition (SVD) on data from simulations to build an interpolant that can provide fast approximations. It describes storing simulation runs on a MapReduce cluster, performing SVD to distinguish signal from noise, and using the left singular vectors to form a linear combination that serves as the interpolant. This allows interpolating between simulation parameters and running many more simulations than would be possible directly.
Distinguishing the signal from noise in an SVD of simulation data
1. Distinguishing signal from noise in an SVD of simulation data
DAVID F. GLEICH, PURDUE UNIVERSITY, COMPUTER SCIENCE DEPARTMENT
PAUL G. CONSTANTINE, STANFORD UNIVERSITY
1
David Gleich · Purdue
ICASSP
2. Large-scale non-linear, time-dependent heat transfer problem
10^5 nodes, 10^3 time steps
30 minutes on 16 cores
~1 GB per run
Questions
What is the probability of failure?
Which input values cause failure?
3. Insight and confidence require multiple runs
and hit the curse of dimensionality.
The problem
A simulation run is time-consuming!
Our solution
Use “big-data” techniques and platforms.
4. We store a few runs …
Supercomputer: run 100-1000 simulations.
Data computing cluster: store them on the MapReduce cluster.
Engineer: run 10,000-100,000 interpolated simulations for approximate statistics.
… and build an interpolant from the
data for computational steering.
5. The Database
Input parameters s of the simulation (5-10 of them) map to time histories f (“a few gigabytes” each):
s1 -> f1
s2 -> f2
...
sk -> fk

A single simulation as a vector: stack the state q at every node for each time step,

f(s) = [ q(x_1, t_1, s); ...; q(x_n, t_1, s); q(x_1, t_2, s); ...; q(x_n, t_2, s); ...; q(x_n, t_k, s) ]

The database as a matrix (100 GB - 100 TB):

X = [ f(s_1)  f(s_2)  ...  f(s_p) ]
6. One-dimensional test problem
X_{i,j} = f(x_i, s_j), where f(x, s) = log[1 + 4s(x^2 - x)] for all s.
X = [ f_1  f_2  ...  f_5 ], one column per parameter value.
[Figure: the columns f_1, ..., f_5 plotted against x (“plot(X)”) and the matrix shown as an image (“imagesc(X)”).]
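As a concrete illustration of the snapshot matrix X_{i,j} = f(x_i, s_j), here is a small NumPy sketch that builds X for this test function and takes its SVD; the grid sizes and the parameter range [0.1, 0.9] are illustrative assumptions, not values from the slides.

```python
import numpy as np

def f(x, s):
    # One-dimensional test function from the slides.
    return np.log(1.0 + 4.0 * s * (x**2 - x))

x = np.linspace(0.0, 1.0, 200)   # "space-time" grid (illustrative size)
s = np.linspace(0.1, 0.9, 5)     # parameter samples s_1, ..., s_5 (assumed range)
X = f(x[:, None], s[None, :])    # snapshot matrix, one column per run

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
print(sigma)  # singular values decay quickly for smooth data
```

For smooth data like this, most of the matrix is captured by the first couple of singular triplets, which is exactly what the interpolant exploits.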
7. The interpolant
Let the data give you the basis:
X = [ f(s_1)  f(s_2)  ...  f(s_p) ]
Then find the right combination
f(s) ≈ Σ_{j=1}^{r} u_j α_j(s)
where the u_j are the left singular vectors of X.
Motivation: this idea was inspired by the success of other reduced-order models like POD, and Paul’s residual-minimizing idea.
8. Why the SVD? It splits “space-time” from “parameters”
Here x is the “space-time” index and s a general parameter:
f(x_i, s_j) = Σ_{ℓ=1}^{r} U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s_j)
Treat each right singular vector as samples of an unknown basis function, splitting x and s:
f(x_i, s) = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s),   where   v_ℓ(s) ≈ Σ_{j=1}^{p} v_ℓ(s_j) φ_j^{(ℓ)}(s)
Interpolate v any way you wish … and it has a “smoothness” property.
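The space/parameter split above can be sketched in a few lines: take the SVD of the snapshot matrix, interpolate each right singular vector v_ℓ(s) over the parameter (plain linear interpolation here, as one instance of “interpolate v any way you wish”), and rebuild f at a new s. The training grid, rank r = 3, and test point are illustrative assumptions.

```python
import numpy as np

def f(x, s):
    # Same one-dimensional test function as before.
    return np.log(1.0 + 4.0 * s * (x**2 - x))

x = np.linspace(0.0, 1.0, 200)
s_train = np.linspace(0.1, 0.9, 9)       # assumed training parameters
X = f(x[:, None], s_train[None, :])      # snapshot matrix

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

def surrogate(s_new, r=3):
    # Interpolate the first r right singular vectors at s_new,
    # then combine with the left singular vectors and singular values.
    v_new = np.array([np.interp(s_new, s_train, Vt[l]) for l in range(r)])
    return U[:, :r] @ (sigma[:r] * v_new)

s_new = 0.45                             # a parameter value not in s_train
err = np.max(np.abs(surrogate(s_new) - f(x, s_new)))
print(err)
```

Because the data is smooth, a rank-3 surrogate with linear interpolation of the v_ℓ already reproduces f(·, s_new) to small pointwise error.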
9. MapReduce and Interpolation
The Database (s_1 -> f_1, s_2 -> f_2, …, s_k -> f_k): use SVD on the MapReduce cluster to get the singular vector basis.
Interpolation: on just one machine, interpolate the singular vectors.
The Surrogate (s_a -> f_a, s_b -> f_b, s_c -> f_c, …): form a linear combination of singular vectors to evaluate new samples, again on the MapReduce cluster.
10. A quiz!
Which section would you rather try to interpolate, A or B?
[Figure: a singular-vector plot with two highlighted sections, labeled A and B.]
11. How predictable is a singular vector?
Folk Theorem (O’Leary 2011): the singular vectors of a matrix of “smooth” data become more oscillatory as the index increases.
Implication: the gradient of the singular vectors increases as the index increases.
v_1(s), ..., v_t(s): predictable signal.
v_{t+1}(s), ..., v_r(s): unpredictable noise.
Once we have determined the predictable bases, we interpolate them using the procedures discussed above to create the α_ℓ(s).
[Fig. 1: An example of when the functions v_ℓ become difficult to interpolate. Each plot shows a singular vector from the example in Section 3, which we interpret as a function v_ℓ(s). While we might have some confidence in an interpolation of v_1(s) and v_2(s), interpolating v_3(s) for s nearby is problematic, and interpolating v_7(s) anywhere is dubious.]
[Fig. 2: For reference, a finer discretization of the functions above, which shows that interpolating v_7(s) … is difficult.]
12. A refined method with an error model
Don’t even try to interpolate the unpredictable modes; model them as noise instead:
f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s)  +  Σ_{j=t(s)+1}^{r} u_j σ_j η_j,   η_j ~ N(0, 1)
The first sum is the predictable part, the second the unpredictable part, and
Variance[f] = diag( Σ_{j=t(s)+1}^{r} σ_j^2 u_j u_j^T )
But now, how to choose t(s)?
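A minimal sketch of this error model on synthetic stand-ins: the matrix U of left singular vectors, the decaying σ_j, the coefficients α_j(s), and the split t are all illustrative assumptions. It draws one realization of the noise term and computes the pointwise variance diag(Σ σ_j² u_j u_j^T).

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, r = 50, 3, 8
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # stand-in orthonormal u_j
sigma = np.logspace(0, -3, r)                      # assumed decaying singular values
alpha = rng.standard_normal(t)                     # stand-in interpolated alpha_j(s)

# Predictable part plus one random draw of the unpredictable part.
eta = rng.standard_normal(r - t)                   # eta_j ~ N(0, 1)
f_hat = U[:, :t] @ alpha + U[:, t:r] @ (sigma[t:r] * eta)

# Pointwise variance: the diagonal of sum_j sigma_j^2 u_j u_j^T.
variance = ((U[:, t:r] ** 2) * sigma[t:r] ** 2).sum(axis=1)
print(variance.max())
```

Since each u_j has unit norm, the variances sum to Σ_{j>t} σ_j², so a fast-decaying spectrum means the noise term adds very little uncertainty.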
13. Our current approach to choosing the predictability
t(s) is the largest τ such that
Σ_{i=1}^{τ} (σ_i / σ_1) ‖∂v_i/∂s‖_1 < threshold
We can use more black gradients than red gradients, so error will be higher for red.
Better ideas? Come talk to me!
[Figure: the singular vectors v_1, v_2, v_3, v_7 plotted over s ∈ [−1, 1], with gradients marked in black and red.]
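One plausible reading of this criterion as code: keep adding singular vectors while a weighted sum of their gradient magnitudes stays below a threshold. The finite-difference gradient, the σ_i/σ_1 weighting, and the threshold value are assumptions; the slide only fixes the general shape of the rule.

```python
import numpy as np

def choose_t(s_train, Vt, sigma, threshold=5.0):
    # Accumulate a scaled gradient measure over the right singular
    # vectors; stop as soon as it crosses the threshold.
    total, t = 0.0, 0
    for i in range(len(sigma)):
        grad = np.gradient(Vt[i], s_train)      # finite-difference d v_i / d s
        total += (sigma[i] / sigma[0]) * np.abs(grad).max()
        if total >= threshold:
            break
        t = i + 1
    return t

# Reuse the one-dimensional test problem as illustrative data.
s_train = np.linspace(0.1, 0.9, 9)
x = np.linspace(0.0, 1.0, 200)
X = np.log(1.0 + 4.0 * s_train[None, :] * (x[:, None] ** 2 - x[:, None]))
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
t = choose_t(s_train, Vt, sigma)
print(t)  # number of "predictable" modes kept
```

Because the terms are nonnegative, the cumulative sum is monotone, so this greedy scan does find the largest τ satisfying the inequality.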
14. An experimental test case
A heat equation
problem
Two parameters
that control the
material properties
15. Where the error is the worst
[Figure: histogram of errors (10^-3 to 10^-2) between our reduced-order model and the truth, shown over the parameter domain.]
16. A Large Scale Example
Nonlinear heat transfer model
80k nodes, 300 time-steps
104 basis runs
SVD of a 24M x 104 data matrix
500x reduction in wall clock time (100x including the SVD)
17. SVD from QR: R-SVD
Old algorithm: let A = QR and R = U_R Σ_R V_R^T; then A = (Q U_R) Σ_R V_R^T.
This helps when A is tall and skinny.
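The R-SVD trick is easy to sketch in NumPy: factor the tall matrix once with QR, then take the SVD of the small triangular factor. The matrix sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 50))   # tall and skinny

Q, R = np.linalg.qr(A)                 # R is only 50 x 50
UR, S, Vt = np.linalg.svd(R)           # cheap SVD of the small factor
U = Q @ UR                             # left singular vectors of A

assert np.allclose(U * S @ Vt, A)      # A = U Sigma V^T
```

All the expensive work is the single QR of A; the SVD touches only a 50 x 50 matrix, which is why the approach pairs so well with a distributed tall-and-skinny QR.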
18. Intro to MapReduce
Originated at Google for indexing web pages and computing PageRank.
The idea: bring the computations to the data. Express algorithms in data-local operations, and implement one type of communication: the shuffle, which moves all data with the same key to the same reducer.
Data scalable: map tasks run on local blocks of the input, and the shuffle routes their output to the reduce tasks.
Fault-tolerance by design: input stored in triplicate; map output persisted to disk before the shuffle; reduce input/output on disk.
[Figure: map tasks M_1, ..., M_5 feeding reducers R through a shuffle stage.]
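A toy, single-machine sketch of the map/shuffle/reduce pattern described above (no Hadoop, just plain Python), with word count as the usual example:

```python
from collections import defaultdict

def map_phase(records, mapper):
    # Map: each record independently emits (key, value) pairs.
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    # Shuffle: group all values carrying the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    # Reduce: combine each key's group of values.
    return {key: reducer(key, values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog"]
mapper = lambda doc: ((word, 1) for word in doc.split())
reducer = lambda key, values: sum(values)
counts = reduce_phase(shuffle(map_phase(docs, mapper)), reducer)
print(counts["the"])  # 2
```

The point of the pattern is that the map step is embarrassingly parallel over records and the shuffle is the only cross-machine communication, which is exactly what makes the TSQR formulation on the next slide a good fit.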
19. MapReduce TSQR summary
MapReduce is great for TSQR!
Data: a tall and skinny (TS) matrix, stored by rows.
Map: QR factorization of local rows.
Reduce: QR factorization of local rows.
Demmel et al. showed that this construction computes a QR factorization with minimal communication.
Input: 500,000,000-by-100 matrix; each record a 1-by-100 row; HDFS size 423.3 GB.
Time to compute the norm of each column: 161 sec. Time to compute R in qr(): 387 sec.
On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12GB RAM per node.
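The map/reduce TSQR construction can be sketched in NumPy: each “map” task computes a local QR of its block of rows, and the “reduce” stacks the small R factors and runs one more QR. Block counts and sizes are illustrative; a real run distributes the blocks across the cluster.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4000, 20))
blocks = np.array_split(A, 8)                  # rows spread across 8 "map" tasks

# Map: local QR of each block, keeping only the small R factors.
local_Rs = [np.linalg.qr(block)[1] for block in blocks]

# Reduce: one QR of the stacked R factors gives R for all of A.
R = np.linalg.qr(np.vstack(local_Rs))[1]

# Check against a direct QR; rows of R may differ by sign.
R_direct = np.linalg.qr(A)[1]
signs = np.sign(np.diag(R)) * np.sign(np.diag(R_direct))
assert np.allclose(signs[:, None] * R, R_direct)
```

Only the 20 x 20 R factors cross the network, never the tall blocks themselves, which is what keeps the communication minimal.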
20. Key Limitations
Computes only R and not Q.
Can get Q via Q = A R^+ with another MapReduce iteration (we currently use this for computing the SVD), but it is not numerically orthogonal; iterative refinement helps.
We are working on better ways to compute Q (with Austin Benson, Jim Demmel).
21. Our vision!
To enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.
[Photo: Paul G. Constantine]