Simulation Informatics!
Analyzing Large Datasets
from Scientific Simulations

DAVID F. GLEICH, Purdue University, Computer Science Department
PAUL G. CONSTANTINE, Stanford University
JOE RUTHRUFF & JEREMY TEMPLETON, Sandia National Labs

David Gleich · Purdue CS&E Seminar
This talk is a story …




How I learned to stop
worrying and love the
simulation!




I asked …!
Can we do UQ on
PageRank?




Google's PageRank

[Figure: a small six-node web graph]

The Model
1. Follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely.

The places we find the surfer most often are important pages.
Random alpha PageRank
RAPr, or PageRank meets UQ

 (I − A P) x = (1 − A) v

Model PageRank with the jump parameter A as a random variable, x(A),
and look at E[x(A)] and Std[x(A)].

The sensitivity to the links: examined and understood.
The sensitivity to the jump: examined and understood.

Explored in Constantine and Gleich, WAW2007; and
Constantine and Gleich, J. Internet Mathematics 2011.
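The E[x(A)] and Std[x(A)] computation can be sketched with plain Monte Carlo over PageRank solves. The Beta(2, 16) parameters, the 3-node graph, and the sample count below are made-up stand-ins for illustration, not the setup from the papers.

```python
import numpy as np

def pagerank(P, v, alpha, tol=1e-10, maxit=1000):
    """Solve (I - alpha*P) x = (1 - alpha) v by fixed-point iteration.
    P is column-stochastic; v is a probability vector."""
    x = v.copy()
    for _ in range(maxit):
        xnew = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(xnew - x).sum() < tol:
            return xnew
        x = xnew
    return x

def rapr_monte_carlo(P, v, n_samples=200, a=2.0, b=16.0, seed=0):
    """Estimate E[x(A)] and Std[x(A)] for A ~ Beta(a, b) (hypothetical
    parameters) by plain Monte Carlo over PageRank solves."""
    rng = np.random.default_rng(seed)
    xs = np.array([pagerank(P, v, rng.beta(a, b)) for _ in range(n_samples)])
    return xs.mean(axis=0), xs.std(axis=0)

# Tiny 3-node example with a uniform teleportation vector.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.ones(3) / 3
mean_x, std_x = rapr_monte_carlo(P, v)
```

Since each PageRank solve returns a probability vector, the Monte Carlo mean still sums to one; the per-page standard deviations are the quantity the slide's "Std[x(A)]" refers to.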
Random alpha PageRank has
a rigorous convergence theory.

Method               Conv.            Work Required                What is N?
Monte Carlo          1/√N             N PageRank systems           number of samples from A
Path Damping         r^(N+2)/(N+1)    N+1 matrix-vector products   terms of Neumann series
(without Std[x(A)])
Gaussian Quadrature  r^(2N)           N PageRank systems           number of quadrature points

where r comes from the Beta distribution Bet( , b, , r) used for A.
Working with
PageRank showed us
how to treat UQ more
generally …




We studied
parameterized
matrices.

 A(s) x(s) = b(s)

Parameterized solution:
 A(J_1) x(J_1) = b(J_1), …, A(J_N) x(J_N) = b(J_N), or
 A_N(J_1) x_N(J_1) = b_N(J_1), …

Discretized PDE with explicit parameters.

Constantine, Gleich, and Iaccarino. Spectral Methods for
Parameterized Matrix Equations, SIMAX, 2010.

Constantine, Gleich, and Iaccarino. A factorization of the
spectral Galerkin system for parameterized matrix equations:
derivation and applications, SISC 2011.
How to compute the Galerkin solution in a weakly intrusive manner.
Simulation!
The Third Pillar of Science
21st Century Science in a nutshell!
    Experiments are not practical or feasible.
    Simulate things instead.
But do we trust the simulations?!

We’re trying!
    Model Fidelity
    Verification & Validation (V&V)
    Uncertainty Quantification (UQ)




The message
Insight and confidence
require multiple runs.




The problem
A simulation run ain’t cheap!




Another problem
It’s very hard to “modify”
current codes.




Large scale nonlinear, time-
dependent heat transfer problem

10^5 nodes
10^3 time steps
30 minutes on 16 cores

Questions
What is the probability of failure?
Which input values cause failure?
It's time to ask
"What can science
learn from Google?"

- Wired Magazine (2008)
"We can throw the numbers
into the biggest computing
clusters the world has ever
seen and let statistical
algorithms find patterns
where science cannot."
- Wired (again)

21st Century Science in a nutshell?
    Simulations are too expensive.
    Let data provide a surrogate.
Our approach!
Construct an interpolating
reduced order model from a
budget-constrained ensemble of
runs for uncertainty and
optimization studies.




That is, we store the runs

Supercomputer → Data computing cluster → Engineer

Each multi-day HPC simulation generates gigabytes of data.
A data cluster can hold hundreds or thousands of old simulations …
… enabling engineers to query and analyze months of simulation
data for statistical studies and uncertainty quantification …

… and build the interpolant from the pre-computed data.
The Database

Input parameters (s) → Time history of simulation (f):
 s1 -> f1, s2 -> f2, …, sk -> fk

The simulation as a vector: each entry q(x_i, t_j, s) is the state
at one mesh point at one time step,

         [ q(x_1, t_1, s) ]
         [      ...       ]
         [ q(x_n, t_1, s) ]
  f(s) = [ q(x_1, t_2, s) ]
         [      ...       ]
         [ q(x_n, t_2, s) ]
         [      ...       ]
         [ q(x_n, t_k, s) ]

The database as a matrix:

  X = [ f(s_1)  f(s_2)  ...  f(s_p) ]
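The stacking convention above can be sketched in a few lines of numpy; the run data here is random filler, and the function names are mine, not the talk's.

```python
import numpy as np

def run_to_vector(q):
    """Flatten a (n_mesh, n_time) snapshot array q[i, j] = q(x_i, t_j, s)
    into the long vector f(s): all mesh points at t_1, then t_2, ..."""
    return q.reshape(-1, order="F")  # column-major flatten stacks time slices

def build_database(runs):
    """Stack the run vectors f(s_1), ..., f(s_p) as the columns of X."""
    return np.column_stack([run_to_vector(q) for q in runs])

# Three hypothetical runs on a 4-point mesh over 5 time steps.
runs = [np.random.rand(4, 5) for _ in range(3)]
X = build_database(runs)
# X has shape (4*5, 3): one column per simulation.
```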
The interpolant

Motivation
Let the data give you the basis.

  X = [ f(s_1)  f(s_2)  ...  f(s_p) ]

Then find the right combination

  f(s) ≈ Σ_{j=1}^{r} u_j α_j(s)

where the u_j are the left singular vectors from X!

This idea was inspired by the success of other reduced order
models like POD, and Paul's residual minimizing idea.
Why the SVD?!
Let's study a simple case.

      [ g(x_1, s_1)  g(x_1, s_2)  ...   g(x_1, s_p)   ]
  X = [ g(x_2, s_1)      ...      ...       ...       ]
      [     ...          ...      ...  g(x_{m-1}, s_p)]
      [ g(x_m, s_1)      ...      ...   g(x_m, s_p)   ]

    = U Σ V^T,

  g(x_i, s_j) = Σ_{ℓ=1}^{r} U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s_j)

Split x and s: treat each right singular vector as samples of the
unknown basis functions. For a general parameter s,

  g(x_i, s) = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s),   v_ℓ(s) ≈ Σ_{j=1}^{p} v_ℓ(s_j) φ_j^{(ℓ)}(s)

Interpolate v any way you wish.
Method summary

1. Compute the SVD of X.
2. Compute the interpolant of the right singular vectors.
3. Approximate a new value of f(s).
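A minimal sketch of those three steps, assuming a one-dimensional parameter and simple piecewise-linear interpolation of the v_ℓ (the talk leaves the interpolation scheme open, so that choice is mine):

```python
import numpy as np

def fit_svd_interpolant(X, s_train, rank):
    """Fit the SVD-based reduced order model: keep `rank` modes of X and
    interpolate the right singular vectors over the training parameters."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    U, sig, V = U[:, :rank], sig[:rank], Vt[:rank].T  # V[j, l] = v_l(s_j)
    def predict(s):
        # evaluate each interpolated v_l at the new parameter value s
        v = np.array([np.interp(s, s_train, V[:, l]) for l in range(rank)])
        return U @ (sig * v)
    return predict

# Hypothetical database: f(s) sampled on a grid of 9 parameter values.
s_train = np.linspace(0, 1, 9)
X = np.array([[np.sin(np.pi * s), np.cos(np.pi * s), s**2]
              for s in s_train]).T
model = fit_svd_interpolant(X, s_train, rank=3)
approx = model(0.5)
```

At a training point the rank-3 model of this 3-row matrix reproduces the data exactly, which gives a cheap sanity check on the construction.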
A quiz!
Which section would you rather
try and interpolate, A or B?

[Figure: a curve with two marked sections, A and B]
How predictable is a
singular vector?

Folk Theorem (O'Leary 2011)
The singular vectors of a matrix of "smooth" data
become more oscillatory as the index increases.

Implication
The gradient of the singular vectors increases as
the index increases.

  v_1(s), v_2(s), ..., v_t(s)    Predictable
  v_{t+1}(s), ..., v_r(s)        Unpredictable
A refined method with
an error model

Don't even try to interpolate the unpredictable modes.

  f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s)  +  Σ_{j=t(s)+1}^{r} σ_j u_j η_j,   η_j ~ N(0, 1)
         (predictable)                (unpredictable)

  Variance[f] = diag( Σ_{j=t(s)+1}^{r} σ_j^2 u_j u_j^T )

But now, how to choose t(s)?
Our current approach to
choosing the predictability

t(s) is the largest τ such that

  (1/σ_1) Σ_{i=1}^{τ} σ_i ‖∂v_i/∂s‖ < threshold
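A sketch of that cutoff rule, assuming the gradients ∂v_i/∂s are available (say, from finite differences on the parameter grid) and taking ‖·‖ to be the max-abs norm; both are my assumptions, since the slide doesn't pin them down.

```python
import numpy as np

def predictability_cutoff(sig, dV_ds, threshold):
    """Pick t as the largest tau with
    (1/sig[0]) * sum_{i<=tau} sig[i] * ||dv_i/ds|| < threshold.
    dV_ds[i] holds samples of the gradient of the i-th right singular
    vector; the max-abs norm here is an assumed stand-in."""
    norms = np.array([np.abs(g).max() for g in dV_ds])
    score = np.cumsum(sig * norms) / sig[0]
    below = np.nonzero(score < threshold)[0]
    return below[-1] + 1 if below.size else 0

# Hypothetical modes: gradients grow with the index, as the folk theorem says.
sig = np.array([10.0, 5.0, 1.0, 0.5])
dV_ds = [np.full(9, 0.1 * 2**i) for i in range(4)]
t = predictability_cutoff(sig, dV_ds, threshold=0.22)
```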
An experimental test case

A heat equation problem

Two parameters that control
the material properties
Experiments

20 point, Latin hypercube sample
Our Reduced Order Model

[Figure: our reduced order model next to the truth,
marking where the error is the worst]
A Large Scale Example




Nonlinear heat transfer model
80k nodes, 300 time-steps
104 basis runs
SVD of 24M x 104 data matrix
500x reduction in wall clock time
(100x including the SVD)
PART 2!





Tall-and-skinny
QR (and SVD)!
on MapReduce


Quick review of QR

Let A = QR with A real and m-by-n:
Q is m-by-n and orthogonal (Q^T Q = I),
R is n-by-n and upper triangular.

Using QR for regression
The solution of min ‖Ax − b‖ is given by
the solution of Rx = Q^T b.

QR is block normalization
"normalize" a vector usually generalizes to
computing Q in the QR factorization.
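The regression-by-QR fact can be checked in a few lines of numpy (a consistent system, so the true coefficients are recovered exactly up to roundoff):

```python
import numpy as np

# Least squares via QR: solve min ||Ax - b|| through Rx = Q^T b.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 3))   # tall-and-skinny
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                      # consistent right-hand side
Q, R = np.linalg.qr(A)              # thin QR: Q is 100x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)     # triangular solve recovers x_true
```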
Intro to MapReduce

Originated at Google for indexing web
pages and computing PageRank.

The idea Bring the
computations to the data.

Express algorithms in
data-local operations.

Implement one type of
communication: shuffle.
Shuffle moves all data with
the same key to the same
reducer.

Data scalable
[Figure: mappers feed a shuffle, which feeds reducers]

Fault-tolerance by design
    Input stored in triplicate
    Map output persisted to disk before shuffle
    Reduce input/output on disk
Mesh point variance in MapReduce

[Figure: three runs, each with snapshots at T=1, T=2, T=3]
Mesh point variance in MapReduce

1. Each mapper outputs the mesh points
   with the same key.
2. Shuffle moves all values from the same
   mesh point to the same reducer.
3. Reducers just compute a numerical
   variance.

Bring the computations to the data!
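The three steps can be mimicked in plain Python; the mapper, shuffle, and reducer functions below are stand-ins for the real MapReduce machinery, just to make the data flow concrete.

```python
import numpy as np
from collections import defaultdict

def mapper(run_id, snapshot):
    """Step 1: key each value by its mesh point id."""
    for mesh_point, value in enumerate(snapshot):
        yield mesh_point, value

def shuffle(mapped):
    """Step 2: group all values with the same key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Step 3: a numerical variance per mesh point."""
    return key, np.var(values)

# Two mesh points observed across three hypothetical runs.
runs = {1: [0.0, 1.0], 2: [2.0, 1.0], 3: [4.0, 1.0]}
mapped = [kv for rid, snap in runs.items() for kv in mapper(rid, snap)]
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
# result[0] is the variance of [0, 2, 4]; result[1] of [1, 1, 1].
```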
Communication avoiding TSQR
(Demmel et al. 2008)

First, do QR factorizations of each local matrix A_i.
Second, compute a QR factorization of the new "R"
(the stacked R_i factors).

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.
Serial QR factorizations
Fully serial TSQR
(Demmel et al. 2008)

Compute QR of the first block,
read the next block, update the QR, …
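The two-stage construction is easy to sketch with numpy. This returns only R, matching the MapReduce algorithm that follows; the block count is arbitrary.

```python
import numpy as np

def tsqr_r(A, n_blocks=4):
    """Communication-avoiding TSQR sketch (Demmel et al. 2008):
    QR each local row block, stack the R factors, then QR the stack.
    Returns only the final R factor."""
    Rs = [np.linalg.qr(block)[1] for block in np.array_split(A, n_blocks)]
    return np.linalg.qr(np.vstack(Rs))[1]

A = np.random.default_rng(0).standard_normal((1000, 5))
R = tsqr_r(A)
# R^T R equals A^T A up to sign conventions and roundoff.
```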
Tall-and-skinny matrix
storage in MapReduce

Key is an arbitrary row-id.
Value is the array for a row.
Each submatrix A_i is an input split.

[Figure: A partitioned into row blocks A1, A2, A3, A4]
Algorithm
Data Rows of a matrix
Map QR factorization of rows
Reduce QR factorization of rows

Mapper 1 (Serial TSQR):
  [A1; A2] -> qr -> Q2 R2; [R2; A3] -> qr -> Q3 R3;
  [R3; A4] -> qr -> Q4 R4; emit R4
Mapper 2 (Serial TSQR):
  [A5; A6] -> qr -> Q6 R6; [R6; A7] -> qr -> Q7 R7;
  [R7; A8] -> qr -> Q8 R8; emit R8
Reducer 1 (Serial TSQR):
  [R4; R8] -> qr -> Q R; emit R
Key Limitations
Computes only R and not Q

Can get Q via Q = AR+ with another MR iteration. "
  (we currently use this for computing the SVD) 
Dubious numerical stability; iterative refinement helps.

Working on better ways to compute Q "
(with Austin Benson, Jim Demmel)




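The Q = AR+ recovery can be illustrated in numpy; here a plain inverse stands in for the pseudo-inverse on a well-conditioned random A, and the stability caveat above is exactly why this is dubious in general.

```python
import numpy as np

# Recover Q from R via Q = A R^{-1}: one more row-wise pass over A,
# which is what the extra MapReduce iteration computes.
rng = np.random.default_rng(2)
A = rng.standard_normal((500, 4))
R = np.linalg.qr(A)[1]           # pretend this R came from TSQR
Q = A @ np.linalg.inv(R)         # each row of A maps independently
# Q should be (numerically) orthogonal: Q^T Q = I.
```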
Full code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)
Lots of data? Too many maps? Add an iteration!

[Figure: Iteration 1 runs four TSQR mappers (emitting R1 ... R4)
into three TSQR reducers (emitting R2,1 ... R2,3); an identity map
and a shuffle feed Iteration 2, where a final TSQR reducer emits R]
mrtsqr – summary of parameters

Blocksize How many rows to read before computing a QR
  factorization, expressed as a multiple of the number of
  columns (see paper).
Splitsize The size of each local matrix A_i.
Reduction tree The number of reducers and iterations to use.
Varying splitsize and the tree

Data: synthetic

Cols.  Iters.  Split (MB)  Maps  Secs.
50     1       64          8000  388
–      –       256         2000  184
–      –       512         1000  149
–      2       64          8000  425
–      –       256         2000  220
–      –       512         1000  191
1000   1       512         1000  666
–      2       64          6000  590
–      –       256         2000  432
–      –       512         1000  337

Increasing split size improves performance
(accounts for Hadoop data movement).
Increasing iterations helps for problems with many columns.
(1000 columns with 64-MB split size overloaded the single reducer.)
MapReduce TSQR summary

MapReduce is great for TSQR!
Data: a tall-and-skinny (TS) matrix, stored by rows
Map: QR factorization of local rows
Reduce: QR factorization of local rows
Demmel et al. showed that this construction computes a QR factorization with minimal communication.

Input: 500,000,000-by-100 matrix
Each record: one 1-by-100 row
HDFS size: 423.3 GB
Time to compute the norm of each column: 161 sec.
Time to compute R in qr(A): 387 sec.

On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12 GB RAM per node.
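As an aside on the two timings above: the column norms and the R factor are reported as separate jobs, but once R is in hand the column norms come for free, since A = QR with orthonormal Q implies ||A[:, j]|| = ||R[:, j]||. A small check of this identity (illustrative sizes, not the experiment's):

```python
import numpy as np

# With A = QR and Q having orthonormal columns,
# ||A[:, j]||_2 = ||Q R[:, j]||_2 = ||R[:, j]||_2.
A = np.random.randn(5000, 8)
R = np.linalg.qr(A, mode="r")
norms_from_A = np.linalg.norm(A, axis=0)
norms_from_R = np.linalg.norm(R, axis=0)
assert np.allclose(norms_from_A, norms_from_R)
```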
Our vision!
To enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.

Paul G. Constantine
Sandia: Jeremy Templeton, Joe Ruthruff
… and you? …

Simulation Informatics; Analyzing Large Scientific Datasets

  • 1. Simulation Informatics! Analyzing Large Datasets from Scientific Simulations DAVID F. GLEICH ! PAUL G. CONSTANTINE! PURDUE UNIVERSITY STANFORD UNIVERSITY COMPUTER SCIENCE ! JOE RUTHRUFF! DEPARTMENT & JEREMY TEMPLETON ! SANDIA NATIONAL LABS 1 David Gleich · Purdue CS&E Seminar
  • 2. This talk is a story … 2 David Gleich · Purdue CS&E Seminar
  • 3. How I learned to stop worrying and love the simulation! 3 David Gleich · Purdue CS&E Seminar
  • 4. I asked …! Can we do UQ on PageRank? 4 David Gleich · Purdue CS&E Seminar
  • 5. PageRank by Google Google’s PageRank PageRank by Google 3 3 The Model 2 5 1.The Model uniformly with follow edges 2 4 5 1. follow edges uniformly with probability , and 4 2. randomly jump, with probability probability and 1 6 2. randomlyassume everywhere is 1 , we’ll jump with probability 1 6 equally, likely assume everywhere is 1 we’ll equally likely The places we find the The places we find the surfer most often are im- portant pages. often are im- surfer most portant pages. 5 David F. Gleich (Sandia) PageRank intro David Gleich · Purdue CS&E Seminar / 36 Purdue 5
  • 6. h sensitivity? alpha alpha PageRank PageRa PageRank RandomPageRank dom alpha Random alpha RAPr or PageRank meets UQ ( P)x = (1 )v s the random variables as the random variables Model PageRank ageRank as the random variables y to the links : examined and understoo x(A) x(A) x(A) and look at k E [x(A)] and Std [x(A)] . at E [x(A)] and Std [x(A)] . y to the E [x(A)]: and Std [x(A)] .understood, jump examined, Explored in Constantine and Gleich, WAW2007; and " Constantine and Gleich, J. Internet Mathematics 2011. 6 David Gleich · Purdue CS&E Seminar
  • 7. Random alpha PageRank has Convergence theory a rigorous convergence theory. Method Conv. Work Required What is N? 1 number of Monte Carlo p N PageRank systems N samples from A Path Damping r N+2 N + 1 matrix vector terms of (without N1+ products Neumann series Std [x(A)]) number of Gaussian r 2N N PageRank systems quadrature Quadrature points and r are parameters from Bet ( , b, , r) 7 David F. Gleich (Sandia) David Random sensitivity Gleich · Purdue CS&E Seminar / 36 Purdue 27
  • 8. Working with PageRank showed us how to treat UQ more generally … 8 David Gleich · Purdue CS&E Seminar
  • 9. Constantine, Gleich, and Iaccarino. We studied Spectral Methods for Parameterized Matrix Equations, SIMAX, 2010. parameterized A(s)x(s) = b(s) matrices. , A(J 1 )x(J 1 ) = b(J 1 ) ) A(J N )x(J N ) = b(J N ) or Parameterized Solution ) AN (J 1 )xN (J 1 ) = bN (J 1 ) Constantine, Gleich, and Iaccarino. A A(s)x(s) = b(s) factorization of the spectral Galerkin system for parameterized matrix equations: derivation and applications, SISC 2011. How to compute the Galerkin solution Discretized PDE in a weakly intrusive manner.! with explicit parameters 9 David Gleich · Purdue CS&E Seminar
  • 10. Simulation! The Third Pillar of Science 21st Century Science in a nutshell! Experiments are not practical or feasible. Simulate things instead. But do we trust the simulations?! We’re trying! Model Fidelity Verification & Validation (V&V) Uncertainty Quantification (UQ) 10 David Gleich · Purdue CS&E Seminar
  • 11. The message Insight and confidence requires multiple runs. 11 David Gleich · Purdue CS&E Seminar
  • 12. The problem A simulation run ain’t cheap! 12 David Gleich · Purdue CS&E Seminar
  • 13. Another problem It’s very hard to “modify” current codes. 13 David Gleich · Purdue CS&E Seminar
  • 14. Large scale nonlinear, time dependent heat transfer problem 105 nodes 103 time steps 30 minutes on 16 cores Questions What is the probability of failure? Which input values cause failure? 14 David Gleich · Purdue CS&E Seminar
  • 15. It’s time to ask " What can science learn from Google?" " - Wired Magazine (2008) 15 David Gleich · Purdue CS&E Seminar
• 16. "We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot." - Wired (again). 21st century science in a nutshell? Simulations are too expensive. Let data provide a surrogate.
• 17. Our approach! Construct an interpolating reduced order model from a budget-constrained ensemble of runs for uncertainty and optimization studies.
• 18. That is, we store the runs. Each multi-day HPC simulation generates gigabytes of data. A data cluster (supercomputer to data computing cluster to engineer) can hold hundreds or thousands of old simulations, enabling engineers to query and analyze months of simulation data for statistical studies and uncertainty quantification, and to build the interpolant from the pre-computed data.
• 19. The database. Input parameters s map to the time history f of a simulation: s_1 -> f_1, s_2 -> f_2, ..., s_k -> f_k.
A single simulation is a vector that stacks the state q at every mesh node and time step:
f(s) = [q(x_1, t_1, s); ...; q(x_n, t_1, s); q(x_1, t_2, s); ...; q(x_n, t_2, s); ...; q(x_n, t_k, s)].
The database is the matrix X = [f(s_1) f(s_2) ... f(s_p)].
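The stacking convention above can be sketched in a few lines of NumPy; the function and variable names here are illustrative, not from the talk's code:

```python
import numpy as np

def snapshot_matrix(runs):
    """Stack each run's space-time history into one column of X.

    `runs` is a list of arrays, one per input sample s_j, each of shape
    (n_nodes, n_steps) holding q(x_i, t_k, s_j).
    """
    # Column-major flatten: all nodes at t_1, then all nodes at t_2, ...
    cols = [q.flatten(order='F') for q in runs]
    return np.column_stack(cols)   # X is (n*k)-by-p

# Tiny example: 3 mesh nodes, 2 time steps, 2 parameter samples
q1 = np.arange(6.0).reshape(3, 2)
q2 = 10 + np.arange(6.0).reshape(3, 2)
X = snapshot_matrix([q1, q2])
```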
• 20. The interpolant. Motivation: this idea was inspired by the success of other reduced order models like POD, and by Paul's residual minimizing idea. Let the data give you the basis: X = [f(s_1) f(s_2) ... f(s_p)]. Then find the right combination f(s) ≈ sum_{j=1}^r u_j alpha_j(s), where the u_j are the left singular vectors of X!
• 21. Why the SVD? Let's study a simple case.
X = [g(x_i, s_j)], an m-by-p matrix of samples, with X = U Sigma V^T, so that
g(x_i, s_j) = sum_{l=1}^r U_{i,l} sigma_l V_{j,l} = sum_{l=1}^r u_l(x_i) sigma_l v_l(s_j).
This splits x and s. Treat each right singular vector as samples of an unknown basis function; for a general parameter s,
g(x_i, s) = sum_{l=1}^r u_l(x_i) sigma_l v_l(s), where v_l(s) ≈ sum_{j=1}^p v_l(s_j) phi_j^(l)(s).
Interpolate v any way you wish.
• 22. Method summary: compute the SVD of X; compute an interpolant of the right singular vectors; approximate a new value of f(s).
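For a single scalar parameter, these three steps can be sketched with NumPy alone; this is a toy reading of the method, with linear interpolation standing in for "any way you wish":

```python
import numpy as np

def build_rom(X, s_train):
    """SVD-based interpolating reduced order model for a 1-D parameter."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    def predict(s):
        # Interpolate each right singular vector v_l at the new parameter s,
        # then recombine with the left singular vectors and singular values.
        alphas = np.array([sig[l] * np.interp(s, s_train, Vt[l])
                           for l in range(len(sig))])
        return U @ alphas
    return predict

# Sanity check on data that is exactly rank-1 and linear in s: f(s) = s * u
s_train = np.linspace(0.0, 1.0, 5)
u = np.array([1.0, 2.0, 3.0])
X = np.outer(u, s_train)
f = build_rom(X, s_train)
```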
• 23. A quiz! Which section would you rather try to interpolate, A or B? (Figure: two sections of a singular vector, labeled A and B.)
• 24. How predictable is a singular vector? Folk theorem (O'Leary 2011): the singular vectors of a matrix of "smooth" data become more oscillatory as the index increases. Implication: the gradient of the singular vectors increases with the index. v_1(s), v_2(s), ..., v_t(s) are predictable; v_{t+1}(s), ..., v_r(s) are unpredictable.
• 25. A refined method with an error model. Don't even try to interpolate the unpredictable modes:
f(s) ≈ sum_{j=1}^{t(s)} u_j alpha_j(s) [predictable] + sum_{j=t(s)+1}^{r} u_j sigma_j eta_j [unpredictable], with eta_j ~ N(0, 1), so
Variance[f] = diag( sum_{j=t(s)+1}^{r} sigma_j^2 u_j u_j^T ).
But now, how to choose t(s)?
• 26. Our current approach to choosing the predictability: t(s) is the largest tau such that sum_{i=1}^{tau} sigma_i ||dv_i/ds|| < threshold.
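A hypothetical implementation of this criterion, assuming we already have the singular values and an estimate of each right singular vector's gradient magnitude at the query parameter s (e.g. from finite differences of the interpolant); the function name and inputs are invented for illustration:

```python
import numpy as np

def predictability_cutoff(sig, grad_v, threshold):
    """Largest t such that sum_{i<=t} sig[i] * grad_v[i] stays below threshold."""
    partial = np.cumsum(np.asarray(sig) * np.asarray(grad_v))
    below = np.nonzero(partial < threshold)[0]
    return int(below[-1]) + 1 if below.size else 0

sig = np.array([10.0, 5.0, 1.0, 0.5])
grad = np.array([0.1, 0.2, 3.0, 8.0])   # later modes oscillate more
t = predictability_cutoff(sig, grad, threshold=3.0)
```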
• 27. An experimental test case: a heat equation problem with two parameters that control the material properties.
• 28. Experiments: a 20-point Latin hypercube sample.
• 29. Our reduced order model versus the truth. (Figure highlights where the error is the worst.)
• 30. A large scale example: a nonlinear heat transfer model with 80k nodes and 300 time steps; 104 basis runs; SVD of a 24M-by-104 data matrix; 500x reduction in wall clock time (100x including the SVD).
• 31. PART 2! Tall-and-skinny QR (and SVD) on MapReduce
• 32. Quick review of QR. QR factorization: let A be a real m-by-n matrix with m >= n; then A = QR, where Q is m-by-n orthogonal (Q^T Q = I) and R is n-by-n upper triangular. Using QR for regression: the solution of min ||Ax - b|| is given by x = R^{-1} Q^T b. QR is block normalization: "normalize" a vector usually generalizes to computing R in the QR factorization.
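The regression use of QR looks like this in NumPy (a minimal sketch with a made-up data matrix, where b is built so the exact solution is known):

```python
import numpy as np

# Solve min ||Ax - b|| via QR: x = R^{-1} Q^T b
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = A @ np.array([0.5, 2.0])        # exact solution known by construction

Q, R = np.linalg.qr(A)              # Q is 4x2 with orthonormal columns, R is 2x2
x = np.linalg.solve(R, Q.T @ b)     # back-substitution instead of forming R^{-1}
```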
• 33. Intro to MapReduce. Originated at Google for indexing web pages and computing PageRank. Data scalable. The idea: bring the computations to the data; express algorithms in data-local operations; implement one type of communication, the shuffle, which moves all data with the same key to the same reducer. Fault-tolerance by design: input is stored in triplicate, reduce input/output is on disk, and map output is persisted to disk before the shuffle.
• 34. Mesh point variance in MapReduce. (Figure: Run 1, Run 2, Run 3, each with time steps T=1, T=2, T=3.)
• 35. Mesh point variance in MapReduce: 1. Each mapper outputs the mesh points with the same key. 2. The shuffle moves all values from the same mesh point to the same reducer. 3. Reducers just compute a numerical variance. Bring the computations to the data!
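The three steps can be mimicked in plain Python, with a list and a dict standing in for the shuffle; the record layout (run/time keys and mesh-point names) is invented for illustration:

```python
from collections import defaultdict
import statistics

# Simulated records: one per (run, time step), mapping mesh point -> value
records = [
    (('run1', 1), {'p0': 1.0, 'p1': 4.0}),
    (('run2', 1), {'p0': 3.0, 'p1': 8.0}),
    (('run3', 1), {'p0': 5.0, 'p1': 0.0}),
]

# Map: re-key each value by its mesh point
mapped = []
for _run_time, values in records:
    for point, v in values.items():
        mapped.append((point, v))

# Shuffle: group all values with the same key
groups = defaultdict(list)
for k, v in mapped:
    groups[k].append(v)

# Reduce: per-mesh-point (population) variance
variance = {k: statistics.pvariance(vs) for k, vs in groups.items()}
```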
• 36. Communication avoiding TSQR (Demmel et al. 2008). First, do QR factorizations of each local matrix; second, compute a QR factorization of the stacked "R" matrices. Demmel et al. Communication-avoiding parallel and sequential QR. 2008.
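An in-memory sketch of the two-stage construction; the real algorithm distributes the first stage across machines, while here np.array_split stands in for the local matrices:

```python
import numpy as np

def tsqr_r(A, nblocks):
    """Communication-avoiding TSQR, R factor only (toy, in-memory version)."""
    # Stage 1: local QR of each block of rows
    local_Rs = [np.linalg.qr(block, mode='r')
                for block in np.array_split(A, nblocks)]
    # Stage 2: QR of the stacked local R factors
    return np.linalg.qr(np.vstack(local_Rs), mode='r')

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
R = tsqr_r(A, nblocks=4)
```

The two R factors agree with a direct QR up to row signs, so we check the invariant R^T R = A^T A instead of comparing entries.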
• 37. Fully serial TSQR (Demmel et al. 2008): compute the QR of the first block, read the next block, update the QR, …
• 38. Tall-and-skinny matrix storage in MapReduce: the key is an arbitrary row-id and the value is the array for a row. Each submatrix A_i is an input split.
• 39. Algorithm. Data: rows of a matrix. Map: QR factorization of rows. Reduce: QR factorization of rows. Each mapper runs serial TSQR on its blocks (A_1, ..., A_4 and A_5, ..., A_8) and emits a local R factor (R_4 and R_8); a reducer runs serial TSQR on those R factors and emits the final Q and R.
• 40. Key limitations. Computes only R and not Q. We can get Q via Q = AR^+ with another MapReduce iteration (we currently use this for computing the SVD), but it has dubious numerical stability; iterative refinement helps. We are working on better ways to compute Q (with Austin Benson, Jim Demmel).
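A minimal in-memory sketch of the Q = AR^+ recovery and one refinement step; for full-rank A, R is invertible, so a triangular solve replaces the pseudoinverse:

```python
import numpy as np

def q_from_r(A, R):
    """Recover Q = A R^{-1} without forming R^{-1} explicitly (solves Q R = A)."""
    return np.linalg.solve(R.T, A.T).T

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 10))
R = np.linalg.qr(A, mode='r')
Q = q_from_r(A, R)

# One step of iterative refinement: re-factor the computed Q itself
Q2 = q_from_r(Q, np.linalg.qr(Q, mode='r'))
```

For ill-conditioned A the first Q can lose orthogonality, which is the instability the slide mentions; the re-factorization step restores it.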
• 41. Full code in hadoopy:

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer:
            self.__call__ = self.reducer
        else:
            self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)
• 42. Too many maps? Lots of data? Add an iteration! Iteration 1: mappers compute local R factors (R_1, ..., R_4) via serial TSQR, and a first round of reducers combines groups of them into intermediate factors (R_{2,1}, R_{2,2}, R_{2,3}). Iteration 2: an identity map and a final reducer combine the intermediate factors into the final R.
• 43. mrtsqr: summary of parameters. Blocksize: how many rows to read before computing a QR factorization, expressed as a multiple of the number of columns. Splitsize: the size of each local matrix. Reduction tree: the number of reducers and iterations to use. (See paper.)
• 44. Varying splitsize and the tree. Data: synthetic.
Cols. | Iters. | Split (MB) | Maps | Secs.
   50 |      1 |         64 | 8000 |   388
    – |      – |        256 | 2000 |   184
    – |      – |        512 | 1000 |   149
    – |      2 |         64 | 8000 |   425
    – |      – |        256 | 2000 |   220
    – |      – |        512 | 1000 |   191
 1000 |      1 |        512 | 1000 |   666
    – |      2 |         64 | 6000 |   590
    – |      – |        256 | 2000 |   432
    – |      – |        512 | 1000 |   337
Increasing split size improves performance (it accounts for Hadoop data movement). Increasing iterations helps for problems with many columns. (1000 columns with a 64-MB split size overloaded the single reducer.)
• 45. MapReduce TSQR summary. MapReduce is great for TSQR! Data: a tall-and-skinny (TS) matrix by rows. Map: QR factorization of local rows. Reduce: QR factorization of local rows. Demmel et al. showed that this construction computes a QR factorization with minimal communication.
Input: a 500,000,000-by-100 matrix; each record is a 1-by-100 row; HDFS size 423.3 GB.
Time to compute the norm of each column: 161 sec. Time to compute R in qr(A): 387 sec.
On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12GB RAM per node.
• 46. Our vision! To enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations. Paul G. Constantine (Stanford). Sandia: Jeremy Templeton, Joe Ruthruff. … and you?