This document discusses using singular value decomposition (SVD) on data from simulations to build an interpolant that can provide fast approximations. It describes storing simulation runs on a MapReduce cluster, performing SVD to distinguish signal from noise, and using the left singular vectors to form a linear combination that serves as the interpolant. This allows interpolating between simulation parameters and running many more simulations than would be possible directly.
Distinguishing the signal from noise in an SVD of simulation data
1. Distinguishing signal from noise in an SVD of simulation data
David F. Gleich, Purdue University, Computer Science Department
Paul G. Constantine, Stanford University
David Gleich · Purdue
ICASSP
2. Large-scale nonlinear, time-dependent heat transfer problem
10^5 nodes, 10^3 time steps
30 minutes on 16 cores, ~1 GB per run
Questions
What is the probability of failure?
Which input values cause failure?
3. Insight and confidence require multiple runs
and hit the curse of dimensionality.
The problem
A simulation run is time-consuming!
Our solution
Use “big-data” techniques and platforms.
4. We store a few runs …
Supercomputer: run 100-1000 simulations.
Data computing cluster: store them on the MapReduce cluster.
Engineer: run 10,000-100,000 interpolated simulations for approximate statistics.
… and build an interpolant from the data for computational steering.
5. The Database
Input parameters s (5-10 of them) map to the time history of the simulation f (“a few gigabytes” per run):
s1 -> f1
s2 -> f2
...
sk -> fk
The simulation as a vector, stacking the state q(x_i, t_j, s) at every node and time step:
$$ f(s) = \begin{bmatrix} q(x_1, t_1, s) \\ \vdots \\ q(x_n, t_1, s) \\ q(x_1, t_2, s) \\ \vdots \\ q(x_n, t_2, s) \\ \vdots \\ q(x_n, t_k, s) \end{bmatrix} $$
The database as a matrix (100 GB - 100 TB):
$$ X = \begin{bmatrix} f(s_1) & f(s_2) & \cdots & f(s_p) \end{bmatrix} $$
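The stacking above can be sketched in a few lines of NumPy. The state function `q`, the grid sizes, and the parameter samples here are toy stand-ins for illustration:

```python
import numpy as np

# Hypothetical toy state q(x, t, s) on n spatial nodes and k time steps.
n, k = 50, 20
x = np.linspace(0.0, 1.0, n)
t = np.linspace(0.0, 1.0, k)

def q(x, t, s):
    return np.sin(np.pi * x) * np.exp(-s * t)

def f(s):
    # Stack the solution at every time step into one long vector:
    # [q(:, t1, s); q(:, t2, s); ...; q(:, tk, s)]
    return np.concatenate([q(x, tj, s) for tj in t])

# One column per simulation run: X = [f(s1) f(s2) ... f(sp)].
params = np.linspace(0.1, 0.9, 8)
X = np.column_stack([f(s) for s in params])
print(X.shape)  # (n*k, p) = (1000, 8)
```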
6. One-dimensional test problem
$$ X_{i,j} = f(x_i, s_j), \qquad f(x, s) = \log\bigl[1 + 4s(x^2 - x)\bigr] \quad \forall s $$
$$ X = \begin{bmatrix} f_1 & f_2 & \cdots & f_5 \end{bmatrix} $$
[Figure: the columns f_1, ..., f_5 plotted against x (“plot(X)”), and the matrix shown as an image (“imagesc(X)”)]
7. The interpolant
Let the data give you the basis:
$$ X = \begin{bmatrix} f(s_1) & f(s_2) & \cdots & f(s_p) \end{bmatrix} $$
Then find the right combination:
$$ f(s) \approx \sum_{j=1}^{r} u_j \, \alpha_j(s) $$
These are the left singular vectors from X!
Motivation: this idea was inspired by the success of other reduced order models like POD, and Paul’s residual minimizing idea.
8. Why the SVD? It splits “space-time” from “parameters”
Here x is the “space-time” index and s a general parameter:
$$ f(x_i, s_j) = \sum_{\ell=1}^{r} U_{i,\ell} \, \sigma_\ell \, V_{j,\ell} = \sum_{\ell=1}^{r} u_\ell(x_i) \, \sigma_\ell \, v_\ell(s_j) $$
Treat each right singular vector as samples of an unknown basis function, which splits x and s:
$$ f(x_i, s) = \sum_{\ell=1}^{r} u_\ell(x_i) \, \sigma_\ell \, v_\ell(s), \qquad v_\ell(s) \approx \sum_{j=1}^{p} v_\ell(s_j) \, \phi_j^{(\ell)}(s) $$
Interpolate v any way you wish … and it has a “smoothness” property.
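The interpolant of slides 7-8 can be sketched end-to-end in NumPy. The test function, the grids, the truncation rank r, and the choice of piecewise-linear interpolation for v_ℓ(s) are all illustrative assumptions, not the authors' exact setup:

```python
import numpy as np

# Toy 1-D test problem in the spirit of the deck (assumed form of f).
def f(x, s):
    return np.log(1.0 + 4.0 * s * (x**2 - x))

x = np.linspace(0.0, 1.0, 101)          # "space-time" grid
s_train = np.linspace(0.1, 0.9, 41)     # simulation parameters
X = np.column_stack([f(x, s) for s in s_train])   # database as a matrix

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

def surrogate(s, r=8):
    # alpha_j(s) = sigma_j * v_j(s), with v_j(s) interpolated from its samples.
    alpha = np.array([sigma[j] * np.interp(s, s_train, Vt[j]) for j in range(r)])
    return U[:, :r] @ alpha

s_new = 0.4321                          # not in the training set
err = np.linalg.norm(surrogate(s_new) - f(x, s_new)) / np.linalg.norm(f(x, s_new))
print(err)  # small relative error
```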
9. MapReduce and Interpolation
The Database (on the MapReduce cluster): s1 -> f1, s2 -> f2, ..., sk -> fk. Use SVD on the MapReduce cluster to get the singular vector basis.
Interpolation (on just one machine): form a linear combination of singular vectors.
New Samples / The Surrogate (on the MapReduce cluster): sa -> fa, sb -> fb, sc -> fc.
10. A quiz!
Which section would you rather try to interpolate, A or B?
[Figure: two sections of a singular-vector plot, labeled A and B]
11. How predictable is a singular vector?
Folk Theorem (O’Leary 2011): the singular vectors of a matrix of “smooth” data become more oscillatory as the index increases.
Implication: the gradient of the singular vectors increases as the index increases.
[Fig. 1: an example of when the functions v_ℓ become difficult to interpolate. Each plot shows a singular vector from the example in Section 3, which we interpret as a function v_ℓ(s). While we might have some confidence in an interpolation of v_1(s) and v_2(s), interpolating v_3(s) for s nearby … is problematic, and interpolating v_7(s) anywhere is dubious.]
[Fig. 2: for reference, a finer discretization of the functions above, which shows that interpolating v_7(s) … is difficult.]
v_1(s), ..., v_t(s): predictable signal. v_{t+1}(s), ..., v_r(s): unpredictable noise.
Once we have determined the predictable bases, we interpolate them using the procedures discussed above to create the α_ℓ(s).
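The folk theorem is easy to observe numerically. A minimal sketch, assuming the one-dimensional test function from earlier and using total variation as a rough proxy for oscillation:

```python
import numpy as np

def f(x, s):
    return np.log(1.0 + 4.0 * s * (x**2 - x))   # assumed smooth test function

x = np.linspace(0.0, 1.0, 201)
s = np.linspace(0.1, 0.9, 101)
X = np.column_stack([f(x, sj) for sj in s])
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Total variation of each sampled right singular vector v_l(s):
# later vectors oscillate more, so their total variation grows.
tv = np.abs(np.diff(Vt, axis=1)).sum(axis=1)
print(tv[0], tv[2], tv[6])  # grows with the index
```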
12. A refined method with an error model
Don’t even try to interpolate the unpredictable modes:
$$ f(s) \approx \underbrace{\sum_{j=1}^{t(s)} u_j \, \alpha_j(s)}_{\text{predictable}} + \underbrace{\sum_{j=t(s)+1}^{r} u_j \, \sigma_j \, \eta_j}_{\text{unpredictable}}, \qquad \eta_j \sim N(0, 1) $$
$$ \mathrm{Variance}[f] = \mathrm{diag}\Bigl( \sum_{j=t(s)+1}^{r} \sigma_j^2 \, u_j u_j^T \Bigr) $$
But now, how to choose t(s)?
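The diagonal of the variance never requires forming the n-by-n outer products: since diag(Σ σ_j² u_j u_jᵀ)_i = Σ_j σ_j² U_{ij}², it is a single matrix-vector product. A sketch with stand-in U, σ, and cutoff t:

```python
import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((200, 10)))   # stand-in left singular vectors
sigma = np.logspace(0, -4, 10)                         # stand-in singular values
t = 4                                                  # hypothetical cutoff t(s)

# diag( sum_{j>t} sigma_j^2 u_j u_j^T ) without forming any outer product:
var = (U[:, t:] ** 2) @ (sigma[t:] ** 2)

# Check against the explicit sum of outer products.
M = sum(sigma[j] ** 2 * np.outer(U[:, j], U[:, j]) for j in range(t, 10))
print(np.allclose(var, np.diag(M)))  # True
```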
13. Our current approach to choosing the predictability
t(s) is the largest τ such that
$$ \sum_{i=1}^{\tau} \frac{\sigma_i}{\sigma_1} \left| \frac{\partial v_i}{\partial s} \right| < \text{threshold} $$
Better ideas? Come talk to me!
We can use more black gradients than red gradients, so the error will be higher for red.
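A sketch of the criterion, assuming the σ_i/σ_1-weighted gradient-magnitude form reconstructed above, finite-difference gradients on the sample grid, and an arbitrary threshold value:

```python
import numpy as np

def f(x, s):
    return np.log(1.0 + 4.0 * s * (x**2 - x))   # assumed test function

x = np.linspace(0.0, 1.0, 201)
s = np.linspace(0.1, 0.9, 101)
X = np.column_stack([f(x, sj) for sj in s])
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# |dv_l/ds| estimated by finite differences on the sample grid.
grads = np.abs(np.gradient(Vt, s, axis=1))

def t_of_s(s0, threshold=1.0):
    """Largest tau with sum_{i<=tau} (sigma_i/sigma_1)|dv_i/ds|(s0) < threshold."""
    j = np.searchsorted(s, s0)
    running = np.cumsum((sigma / sigma[0]) * grads[:, j])
    # cumsum is nondecreasing, so searchsorted finds the largest valid tau.
    return int(np.searchsorted(running, threshold))

print(t_of_s(0.5), t_of_s(0.89))
```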
14. An experimental test case
A heat equation
problem
Two parameters
that control the
material properties
15. Where the error is the worst
[Figure: our reduced order model vs. the truth, with histograms of the error, roughly 10^-3 to 10^-2]
16. A Large Scale Example
Nonlinear heat transfer model
80k nodes, 300 time steps
104 basis runs
SVD of a 24M-by-104 data matrix
500x reduction in wall clock time (100x including the SVD)
17. SVD from QR: R-SVD
Old algorithm … let
$$ A = QR, \qquad R = U_R \Sigma_R V_R^T, \qquad \text{then} \quad A = (Q U_R) \, \Sigma_R V_R^T $$
… helps when A is tall and skinny.
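A NumPy sketch of the R-SVD (the matrix sizes here are illustrative): the SVD of the small triangular factor R gives the SVD of A once its left factor is rotated by Q.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10_000, 50))     # tall and skinny

Q, R = np.linalg.qr(A)                    # QR of the big matrix
Ur, S, Vt = np.linalg.svd(R)              # cheap SVD of the small 50x50 factor
U = Q @ Ur                                # left singular vectors of A

print(np.allclose(U * S @ Vt, A))         # True: A = (Q U_R) Sigma_R V_R^T
```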
18. Intro to MapReduce
Originated at Google for indexing web pages and computing PageRank. Data scalable.
The idea: bring the computations to the data. Express algorithms in data-local operations. Implement one type of communication: shuffle.
[Diagram: map tasks (M) feed a shuffle stage that routes records to reduce tasks (R)]
Fault-tolerance by design: input stored in triplicate; reduce input/output on disk; map output persisted to disk before shuffle.
Shuffle moves all data with the same key to the same reducer.
19. MapReduce TSQR summary
MapReduce is great for TSQR!
Data: a tall and skinny (TS) matrix, stored by rows.
Map: QR factorization of local rows.
Reduce: QR factorization of local rows.
(Demmel et al. showed that this construction works to compute a QR factorization with minimal communication.)
Input: a 500,000,000-by-100 matrix; each record is a 1-by-100 row; HDFS size 423.3 GB.
Time to compute the norm of each column: 161 sec. Time to compute R in qr: 387 sec.
(On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12 GB RAM per node.)
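The map/reduce structure of TSQR can be mimicked on one machine by splitting the rows into blocks; the block count here stands in for the number of map tasks:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8_000, 20))      # tall-and-skinny, stored by rows

# Map: independent QR on each local block of rows; keep only the R factors.
blocks = np.array_split(A, 8)
local_Rs = [np.linalg.qr(b)[1] for b in blocks]

# Reduce: QR of the stacked local R factors.
R = np.linalg.qr(np.vstack(local_Rs))[1]

# R agrees with a direct QR of A up to the signs of its rows.
R_direct = np.linalg.qr(A)[1]
print(np.allclose(np.abs(R), np.abs(R_direct)))  # True
```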
20. Key Limitations
Computes only R and not Q.
Can get Q via Q = A R^+ with another MapReduce iteration (we currently use this for computing the SVD).
Not numerically orthogonal; iterative refinement helps.
We are working on better ways to compute Q (with Austin Benson, Jim Demmel).
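A sketch of the Q = A R^+ recovery and one refinement step, on a deliberately ill-conditioned stand-in matrix (the sizes and conditioning are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Ill-conditioned tall matrix: singular values from 1 down to 1e-8.
U0, _ = np.linalg.qr(rng.standard_normal((5_000, 30)))
V0, _ = np.linalg.qr(rng.standard_normal((30, 30)))
A = (U0 * np.logspace(0, -8, 30)) @ V0.T

R = np.linalg.qr(A)[1]                   # only R, as TSQR provides
Q = A @ np.linalg.pinv(R)                # Q = A R^+ in one extra pass over A
err = np.linalg.norm(Q.T @ Q - np.eye(30))   # not numerically orthogonal

# One refinement step: re-orthogonalize by a QR of the computed Q.
Q2 = Q @ np.linalg.inv(np.linalg.qr(Q)[1])
err2 = np.linalg.norm(Q2.T @ Q2 - np.eye(30))
print(err2 < err)  # refinement improves orthogonality
```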
21. Our vision!
To enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.
Paul G. Constantine