Massive MapReduce
Matrix Computations &
Multicore Graph
Algorithms
DAVID F. GLEICH
COMPUTER SCIENCE
PURDUE UNIVERSITY




                     David Gleich · Purdue
It’s a pleasure …

Intel intern, 2005, in the Applications Research Lab in Santa Clara,
resulting in one of my favorite papers:

David Gleich and Marzia Polito, “Approximating Personalized PageRank
with Minimal Use of Web Graph Data,” Internet Mathematics Vol. 3,
No. 3: 257–294.

Could you run your own search engine and crawl the web to compute
your own PageRank vector if you are highly concerned with privacy?

Yes! Theory, experiments, implementation!

[From the paper’s abstract: fast approximations to the personalized
PageRank score of a webpage that limit the amount of web graph data
accessed, reducing the computational cost as well as the memory and
memory bandwidth requirements; experiments on web graphs of up to
118 million pages, a theoretical approximation bound, and a proposed
local, personalized web-search system.]
Massive MapReduce Matrix Computations

Yangyang Hou (Purdue, CS)
Paul G. Constantine, Austin Benson, Joe Nichols (Stanford University)
James Demmel (UC Berkeley)
Joe Ruthruff, Jeremy Templeton (Sandia CA)

Funded by the Sandia National Labs CSAR project.
By 2013(?) all Fortune 500 companies will have a data computer.
Data computers I’ve worked with …

Magellan Cluster @ NERSC: 128GB/core storage, 80 nodes, 640 cores,
InfiniBand.

Student Cluster @ Stanford: 3TB/core storage, 11 nodes, 44 cores,
GB ethernet. Cost: $30k.

Nebula Cluster @ Sandia CA: 2TB/core storage, 64 nodes, 256 cores,
GB ethernet. Cost: $150k.

These systems are good for working with enormous matrix data!
How do you program them?




MapReduce and Hadoop overview




MapReduce in a picture

[Diagram: map tasks run in parallel; a shuffle (like an MPI
all-to-all) moves the data; then reduce tasks run in parallel.]
Computing a histogram
A simple MapReduce example

[Diagram: input images keyed by ImageId with pixel values; each map
emits one count per pixel color; the shuffle groups the counts by
color; each reduce sums the counts for one color.]

Map(ImageId, Pixels)
  for each pixel
    emit Key = (r,g,b), Value = 1

Reduce(Color, Values)
  emit Key = Color, Value = sum(Values)
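The histogram example above can be sketched in plain Python. This is a toy in-memory stand-in for a real Hadoop job: `run_mapreduce` plays the role of the shuffle, and color names stand in for (r,g,b) tuples.

```python
from collections import defaultdict

def mapper(image_id, pixels):
    # emit (color, 1) for every pixel in the image
    for color in pixels:
        yield color, 1

def reducer(color, values):
    # sum the per-pixel counts for one color
    yield color, sum(values)

def run_mapreduce(inputs, mapper, reducer):
    # a tiny in-memory stand-in for the shuffle phase:
    # group all mapped values by key, then reduce each group
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return {k: next(reducer(k, vs))[1] for k, vs in groups.items()}

images = [("img1", ["red", "red", "blue"]), ("img2", ["red", "blue"])]
counts = run_mapreduce(images, mapper, reducer)
# counts == {"red": 3, "blue": 2}
```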
Why a limited computational model?
Data scalability, fault tolerance.

The idea: bring the computations to the data. MapReduce can schedule
map functions without moving data.

[Diagram: map tasks feeding a shuffle into reduce tasks, alongside
the last page of a 136-page error dump: after waiting in the queue
for a month, and after 24 hours of finding eigenvalues, one node
randomly hiccups.]
Tall-and-Skinny matrices (m ≫ n)

A has many rows (like a billion) and a few columns (under 10,000).

Used in
  regression and general linear models with many samples
  block iterative methods
  panel factorizations
  simulation data analysis
  big-data SVD/PCA

[Image: sample rows from the tinyimages collection.]
Scientific simulations as Tall-and-Skinny matrices

Input parameters s map to the time history of a simulation, f
(~100GB per run).

The simulation as a vector: stack the state q over space
(x_1, …, x_n) and time (t_1, …, t_k),

  f(s) = [ q(x_1, t_1, s), …, q(x_n, t_1, s),
           q(x_1, t_2, s), …, q(x_n, t_2, s),
           …,
           q(x_n, t_k, s) ]^T

The simulation as a matrix: each run s_1 -> f_1, s_2 -> f_2, …,
s_k -> f_k is one (space-by-time) column A_i, so the database of
simulations is a very tall-and-skinny matrix A.
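The stacking of q into f(s) and the columns into A can be illustrated with a toy numpy sketch. The sizes and the `run_simulation` stub are made up for illustration; a real run produces ~100GB, not a 4-by-3 array.

```python
import numpy as np

n_space, n_time, n_params = 4, 3, 5  # toy sizes
rng = np.random.default_rng(0)

def run_simulation(s):
    # stand-in for a simulation: q[j, i] holds q(x_i, t_j, s)
    return rng.random((n_time, n_space))

# flatten each run so all of space at t_1 comes first, then t_2, ...,
# matching the ordering of f(s) above
columns = [run_simulation(s).reshape(-1) for s in range(n_params)]

# each run is one column: the database is a very tall-and-skinny matrix
A = np.column_stack(columns)
assert A.shape == (n_space * n_time, n_params)
```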
Model reduction: a large-scale example
Constantine & Gleich, ICASSP 2012

Nonlinear heat transfer model: 80k nodes, 300 time-steps, 104 basis
runs. SVD of a 24M x 104 data matrix. 500x reduction in wall clock
time (100x including the SVD).
PCA of 80,000,000 images
Constantine & Gleich, MapReduce 2010

[Pipeline: A is 80,000,000 images by 1000 pixels. In MapReduce,
zero-mean the rows and compute R with TSQR; in post-processing, an
SVD of R gives V, the principal components. Shown: the first 16
columns of V as images, and the top 100 singular values.]
All that these applications need is a Tall-and-Skinny QR factorization.
Quick review of QR

QR factorization: let A be m x n, real. Then A = QR, where Q (m x n)
is orthogonal (Q^T Q = I) and R (n x n) is upper triangular.

Using QR for regression: the solution of min ||Ax - b|| is given by
the solution of Rx = Q^T b.

QR is block normalization: “normalize” a vector usually generalizes
to computing Q in the QR factorization.

Current MapReduce algorithms use the normal equations,

  A^T A = R^T R   (via Cholesky),   Q = A R^{-1},

which can limit numerical accuracy.
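A small numpy experiment (not from the talk) illustrates how the normal-equations route loses accuracy: the loss of orthogonality of the Cholesky-based Q grows with the square of the condition number, while Householder QR stays near machine precision. The test matrix construction is a standard trick, not the talk's data.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 1000, 10

# build a tall matrix with condition number ~1e6
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, 6, n)) @ V.T

# normal-equations route: A^T A = R^T R via Cholesky, then Q = A R^{-1}
R_chol = np.linalg.cholesky(A.T @ A).T   # upper-triangular factor
Q_chol = A @ np.linalg.inv(R_chol)

# Householder QR for comparison
Q_hh, _ = np.linalg.qr(A)

err_chol = np.linalg.norm(Q_chol.T @ Q_chol - np.eye(n))
err_hh = np.linalg.norm(Q_hh.T @ Q_hh - np.eye(n))
# err_chol is roughly cond(A)^2 * machine-eps; err_hh stays near eps
```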
There are good MPI implementations. Why MapReduce?
Full TSQR code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)
Tall-and-skinny matrix storage in MapReduce

A: m x n, m ≫ n.

The key is an arbitrary row-id; the value is the 1 x n array for a
row. Each submatrix A_i is the input to a map task.
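The row-per-record layout can be sketched in a few lines. The `chunks` helper is an illustration of how rows land in map tasks, not the actual Hadoop input-split logic.

```python
import numpy as np

A = np.arange(20.0).reshape(10, 2)   # a toy m-by-n matrix, m >> n

# store the matrix as (row-id, row) key-value pairs, the layout a
# MapReduce input reader would present to the mappers
records = [(i, row.tolist()) for i, row in enumerate(A)]

def chunks(records, size):
    # each map task receives one contiguous chunk: a submatrix A_i
    for start in range(0, len(records), size):
        yield records[start:start + size]

submatrices = [np.array([row for _, row in chunk])
               for chunk in chunks(records, 4)]
# every row of A lands in exactly one submatrix
assert sum(len(s) for s in submatrices) == A.shape[0]
```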
Numerical stability was a problem for prior approaches

Previous methods couldn’t ensure that the matrix Q was orthogonal.

[Plot: norm(Q^T Q - I) against condition number (10^5 to 10^20),
comparing AR^{-1} (prior work; Constantine & Gleich, MapReduce 2010),
AR^{-1} plus iterative refinement (Benson, Gleich, Demmel, submitted),
and Direct TSQR (Benson, Gleich, Demmel, submitted).]
Communication avoiding QR (Demmel et al. 2008)
on MapReduce (Constantine and Gleich, 2010)

Algorithm
  Data: rows of a matrix
  Map: QR factorization of rows
  Reduce: QR factorization of rows

[Diagram: each mapper runs serial TSQR over its blocks (A_1 … A_4 and
A_5 … A_8), repeatedly stacking the current R on the next block and
re-factoring, and emits its final R; the reducer runs serial TSQR on
the emitted R factors to produce the final R.]

A “manual reduce” can make it faster by adding a second iteration.

Computes only R and not Q.

Can get Q via Q = AR^{-1} with another MR iteration.

Use the standard Householder method?
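The R-only computation on this slide can be sketched in a few lines of numpy. This in-memory version stands in for the map (per-block QR) and reduce (QR of the stacked R factors) stages; it uses one reduce level rather than the full tree.

```python
import numpy as np

def tsqr_r(A, block):
    # map stage: QR-factor each block of rows, keep only the R factors
    Rs = [np.linalg.qr(A[i:i + block], mode='r')
          for i in range(0, A.shape[0], block)]
    # reduce stage: stack the small R factors and QR-factor again
    return np.linalg.qr(np.vstack(Rs), mode='r')

rng = np.random.default_rng(2)
A = rng.standard_normal((1000, 8))
R = tsqr_r(A, 100)

# R matches the direct factorization in the sense that R^T R = A^T A
assert np.allclose(R.T @ R, A.T @ A)
```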
Taking care of business by keeping track of Q

[Diagram: mappers factor A_1 … A_4 into local Q_i and R_i; task 2
collects R_1 … R_4, producing R and the pieces Q_11 … Q_41; mapper 3
forms the true Q from Q_i and Q_i1.]

1. Output the local Q_i and R_i in separate files.
2. Collect the R_i on one node; compute R and the Q pieces for each R_i.
3. Distribute the pieces Q_i1 and form the true Q.
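The three steps can be sketched in numpy as an in-memory stand-in for the MapReduce version (one process playing all the roles, with the file shipping elided):

```python
import numpy as np

def direct_tsqr(A, block):
    # step 1: each "map task" factors its block, keeping local Q_i, R_i
    blocks = [A[i:i + block] for i in range(0, A.shape[0], block)]
    local = [np.linalg.qr(B) for B in blocks]        # (Q_i, R_i) pairs
    # step 2: collect the small R_i on one node and factor their stack
    Q2, R = np.linalg.qr(np.vstack([R_i for _, R_i in local]))
    n = A.shape[1]
    # step 3: distribute the row-blocks of Q2; the true Q is Q_i @ Q2_i
    Q = np.vstack([Q_i @ Q2[k * n:(k + 1) * n]
                   for k, (Q_i, _) in enumerate(local)])
    return Q, R

rng = np.random.default_rng(3)
A = rng.standard_normal((1200, 6))
Q, R = direct_tsqr(A, 300)
assert np.allclose(Q @ R, A)                     # a valid factorization
assert np.allclose(Q.T @ Q, np.eye(6))           # and Q is orthogonal
```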
The price is right! Based on a performance model and tests

[Bar chart, runtime in seconds (500 to 2500), on matrices 800M-by-10,
7.5B-by-4, 150M-by-100, and 500M-by-50: Direct TSQR is faster than
refinement for a few columns, and not any slower for many columns.]

Experiment on the NERSC Magellan computer: 80 nodes, 640 processors,
80TB disk.
Ongoing work

Make AR^{-1} stable with targeted quad-precision arithmetic to get a
numerically orthogonal Q. The performance model says it’s feasible!

How to handle more than ~10,000 columns? Some randomized methods?

Do we need quad-precision for big-data? Standard error analysis gives
an n𝜀 error bound for computing a sum. I’ve seen this with PageRank
computations!
Multicore Graph Algorithms

Assefaw Gebremedhin, Arif Khan, Alex Pothen, Ryan Rossi (Purdue, CS)
Mahantesh Halappanavar (PNNL)
Chen Greif, David Kurokawa (Univ. British Columbia)
Mohsen Bayati, Amin Saberi, Ying Wang (now Google) (Stanford)

Funded by the DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF
CAREER grant 1149756-CCF, and the Center for Adaptive Supercomputing
Software Multithreaded Architectures (CASS-MT) at PNNL.
Network alignment
What is the best way of matching graph A to B?

[Figure: graph A with nodes r, t, v; graph B with nodes s, u, w.]
Network alignment

[Figure 2: the NetworkBLAST local network alignment algorithm. Given
two input networks, a network alignment graph is constructed. Nodes
in this graph correspond to pairs of sequence-similar proteins, one
from each species, and edges correspond to conserved interactions. A
search algorithm identifies highly similar subnetworks that follow a
prespecified interaction pattern. Adapted from Sharan and Ideker.]

From Sharan and Ideker, Modeling cellular machinery through
biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006),
427–433.
Network alignment
What is the best way of matching graph A to B using only edges in L?

[Figure: graphs A (r, t, v) and B (s, u, w) joined by the candidate
edges L; the candidate edge from t to u carries weight w_tu.]
Network alignment
Matching? A 1-1 relationship.
Best? Highest weight and overlap.

[Figure: the same graphs A, L, and B, with the weight w_tu and an
overlap (a pair of matched L edges whose endpoints are adjacent in
both A and B) highlighted.]
Our contributions

A new belief propagation method (Bayati et al. 2009, 2013) that
outperformed state-of-the-art PageRank- and optimization-based
heuristic methods.

High performance C++ implementations (Khan et al. 2012):
40 times faster (C++ ~3x, complexity ~2x, threading ~8x);
5 million edge alignments in ~10 sec.

www.cs.purdue.edu/~dgleich/codes/netalignmc
Each iteration involves

Matrix-vector-ish computations with a sparse matrix, e.g. sparse
matrix-vector products in a semiring, dot-products, axpy, etc.

Bipartite max-weight matching using a different weight vector at
each iteration.

No “convergence”: 100-1000 iterations.

Let x[i] be the score for each pair-wise match in L:

  for i = 1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    update y[i] to x[i] (using the match in MR)
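The matching step in the loop above can be illustrated with a greedy half-approximation: scan candidate edges by decreasing weight and keep any edge whose endpoints are both still free. This is a generic stand-in, not the talk's matching code; the node names echo the toy graphs from the earlier slides.

```python
def greedy_match(edges):
    # half-approximate max-weight bipartite matching:
    # take edges in decreasing weight order if both endpoints are free
    matched_a, matched_b, match = set(), set(), []
    for w, a, b in sorted(edges, reverse=True):
        if a not in matched_a and b not in matched_b:
            matched_a.add(a)
            matched_b.add(b)
            match.append((a, b, w))
    return match

# candidate edges of L as (weight, node in A, node in B)
edges = [(3.0, 'r', 's'), (2.0, 'r', 'u'), (2.5, 't', 'u'), (1.0, 'v', 'w')]
match = greedy_match(edges)
# (2.0, 'r', 'u') is skipped: both r and u are already matched
```

Each iteration of the alignment loop would call a routine like this with that iteration's updated weight vector y.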
The methods

Each iteration involves matrix-vector-ish computations with a sparse
matrix (sparse matrix-vector products in a semiring, dot-products,
axpy, etc.) and bipartite max-weight matching using a different
weight vector at each iteration.

Belief Propagation

Listing 2. A belief-propagation message passing procedure for network
alignment. See the text for a description of othermax and the round
heuristic.

 1  y(0) = 0, z(0) = 0, d(0) = 0, S(0) = 0
 2  for k = 1 to niter
 3    F = bound_{0,β}[S + S(k-1)]^T            Step 1: compute F
 4    d = αw + Fe                              Step 2: compute d
 5    y(k) = d - othermaxcol(z(k-1))           Step 3: othermax
 6    z(k) = d - othermaxrow(y(k-1))
 7    S(k) = diag(y(k) + z(k) - d)S - F        Step 4: update S
 8    (y(k), z(k), S(k)) <- γ_k (y(k), z(k), S(k)) +
 9        (1 - γ_k)(y(k-1), z(k-1), S(k-1))    Step 5: damping
10    round heuristic(y(k))                    Step 6: matching
11    round heuristic(z(k))
12  end
13  return y(k) or z(k) with the largest objective value

In the belief propagation interpretation, the weight vectors are
usually called messages as they communicate the “beliefs” of each
“agent.” In this
                                                 particular problem, the neighborhood of an agent represents




                                                                                                                        33
                                                 all of the other edges in graph L incident on the same vertex                 s
                                                 in graph A (1st vector), all edges in L incident on the same
                                                 David in graph BPurdue
                                                 vertex Gleich · (2nd vector), or the edges in L that are
                                                                                                                               fi
                                                                                                                               “
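The othermax operation used in the listing (Step 3) can be sketched concretely. This is my reading of the operation, not the authors' code: within each group (a column of the sparse structure for othermaxcol, a row for othermaxrow), each entry is replaced by the maximum over the *other* entries of its group. The flat values/groups layout and the 0.0 returned for singleton groups are assumptions.

```python
from collections import defaultdict

def othermax(values, groups):
    """For each entry i, return the max over entries j != i with
    groups[j] == groups[i]; 0.0 if i is alone in its group."""
    NEG = float("-inf")
    best = defaultdict(lambda: (NEG, NEG))  # (max, runner-up) per group
    for v, g in zip(values, groups):
        m1, m2 = best[g]
        if v > m1:
            best[g] = (v, m1)
        elif v > m2:
            best[g] = (m1, v)
    out = []
    for v, g in zip(values, groups):
        m1, m2 = best[g]
        o = m2 if v == m1 else m1  # the max entry itself sees the runner-up
        out.append(0.0 if o == NEG else o)
    return out
```

With this helper, Step 3 would read entrywise as y(k)[i] = d[i] minus the othermax over the entries of z(k−1) sharing i's column.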
The NEW methods
Each iteration involves:
- Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix-vector products in a semiring, dot-products, axpy, etc.
- Approximate bipartite max-weight matching is used here instead!

Belief Propagation
Listing 2 again, unchanged except that Step 6 now uses the approximate matching:

10     round_heuristic(y(k))                                   Step 6: approx matching
11     round_heuristic(z(k))

34
Approximation doesn't hurt the
belief propagation algorithm

The question: how do Klau's method (MR) and the BP method change when we use the approximate matching procedure for the matching step in each algorithm? Note that the matching in the first step of Klau's approach is much more integral to his procedure.

Setup: randomly perturb one power-law graph to get A; generate L by the true match + random edges; use the matching problem to evaluate the quality of the solutions.

[Figure 2. Alignment with a power-law graph: fraction of correct matches vs. the expected degree of noise in L (p · n), for MR, ApproxMR, BP, and ApproxBP. BP and ApproxBP are indistinguishable, whereas approximate rounding has a large effect on solutions from Klau's method (MR): with exact rounding that method yields the identity matching for all problems, whereas using the approximation results in over a …]

[Paper excerpt, partially visible: the larger test problems match the Library of Congress subject headings against Wikipedia categories (lcsh-wiki) and against the French National Library's Rameau headings; edge weights in L are computed via text matching of the subject-heading strings (and via translated Rameau).]

35
A local dominating edge
method for bipartite matching

[Diagram: a bipartite alignment graph with vertex sets A and B joined by the edge set L; the edge (t, u) with weight w_tu is highlighted.]

A locally dominating edge is an edge heavier than all neighboring edges.

The method guarantees
- a ½-approximation
- a maximal matching
based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al. (2012).

For bipartite graphs: work on the smaller side only.

36
A local dominating edge
method for bipartite matching

Queue all vertices.
Until the queue is empty, in parallel over vertices:
- match each vertex to its heaviest edge, and if there's a conflict, check the winner and find an alternative for the loser;
- add the endpoints of non-dominating edges to the queue.

A locally dominating edge is an edge heavier than all neighboring edges.

For bipartite graphs: work on the smaller side only.

37
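As a concrete reference, here is a sequential sketch of the queue-based procedure the slide describes: each queued vertex proposes to its heaviest available edge, a mutual proposal (a locally dominating edge) becomes a match, and losers are re-queued. The data layout, the FIFO discipline, and the distinct-weight assumption for tie handling are mine; the talk's version runs the vertex loop in parallel with locks.

```python
from collections import deque

def dominating_edge_matching(n_left, n_right, edges):
    """Greedy matching via locally dominating edges in a bipartite graph.
    edges: list of (u, v, weight); returns {left vertex: right vertex}.
    Assumes distinct weights so ties need no special handling."""
    adj = {("L", u): [] for u in range(n_left)}
    adj.update({("R", v): [] for v in range(n_right)})
    for u, v, w in edges:
        adj[("L", u)].append((w, ("R", v)))
        adj[("R", v)].append((w, ("L", u)))

    mate = {}              # vertex -> matched partner
    queue = deque(adj)     # start with every vertex queued
    while queue:
        x = queue.popleft()
        if x in mate:
            continue
        cand = [(w, y) for w, y in adj[x] if y not in mate]
        if not cand:
            continue
        w, y = max(cand)   # x's heaviest available edge
        # (x, y) is locally dominating iff it is also y's heaviest edge
        if max((w2, x2) for w2, x2 in adj[y] if x2 not in mate)[1] == x:
            mate[x], mate[y] = y, x
            for _, z in adj[x] + adj[y]:   # neighbors lost a candidate
                if z not in mate:
                    queue.append(z)
        else:
            queue.append(x)  # x lost the conflict; try again later
    return {k[1]: p[1] for k, p in mate.items() if k[0] == "L"}
```

Matching heaviest edges first in this way is what yields the ½-approximation and maximal-matching guarantees cited on the slide.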
A local dominating edge
method for bipartite matching

Customized first iteration (with all vertices).

Use OpenMP locks to update choices.

Use __sync_fetch_and_add for queue updates.

A locally dominating edge is an edge heavier than all neighboring edges.

For bipartite graphs: work on the smaller side only.

38
Remaining multi-threading
procedures are straightforward
Standard OpenMP for matrix computations:
use schedule(dynamic) to handle skew.
We can batch the matching procedures in the
BP method for additional parallelism:

        for i = 1 to ...
           update x[i] to y[i]
           save y[i] in a buffer
           when the buffer is full,
             compute the max-weight match
             for everything in the buffer
             and save the best

39
Performance evaluation
(2x4)-10 core Intel E7-8870, 2.4 GHz (80 cores)
16 GB memory/proc (128 GB total)

Scaling study
1. Thread binding: scattered vs. compact
2. Memory binding: interleaved vs. bind

[Diagram: a NUMA layout with each CPU attached to its own memory.]

40
Scaling
BP with no batching: lcsh-rameau, 400 iterations, scatter-and-interleave binding.

[Figure: speedup vs. number of threads, up to 80. 1450 seconds for 1 thread; 115 seconds for 40 threads, about a 12.6x speedup.]

41
Ongoing work

Better memory handling!
numactl, affinity insufficient for full scaling

Better models!
These get to be much bigger computations.

Distributed memory.
Trying to get an MPI version, looking into GraphLab

42
PageRank details
PageRank was created by Google to rank web-pages.

[Diagram: a six-page web graph and its 6x6 column-stochastic matrix P; the dangling page's column is 1/6 in every entry, and P_ij ≥ 0, e^T P = e^T.]

The Model
1. Follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely: the "jump" vector is v = [1/n … 1/n]^T, with e^T v = 1.

Markov chain: [αP + (1 − α)ve^T] x = x has a unique x with x_j ≥ 0, e^T x = 1.
Linear system: (I − αP) x = (1 − α) v.
The places we find the surfer most often are important pages.

Ignored (algorithms later): dangling nodes, patched back to v.

43/40
UTRC Seminar · David Gleich, Purdue
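The model on this slide can be run directly as a power iteration, x ← αPx + (1 − α)v. This tiny pure-Python sketch and its 4-page example graph are illustrative, not the talk's code or data.

```python
def pagerank(P, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for x = alpha*P*x + (1-alpha)*v.
    P is column-stochastic, given as a list of rows:
    P[i][j] = probability of moving from page j to page i."""
    n = len(P)
    v = [1.0 / n] * n              # uniform jump vector, e^T v = 1
    x = v[:]                       # start from the jump distribution
    for _ in range(max_iter):
        x_new = [alpha * sum(P[i][j] * x[j] for j in range(n))
                 + (1 - alpha) * v[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Illustrative 4-page graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}
P = [[0.0, 0.0, 1.0, 0.5],
     [0.5, 0.0, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0]]
x = pagerank(P)
```

Page 3 has no in-links, so its score converges to exactly (1 − α)/n; because P is column-stochastic and e^T v = 1, the iterates always sum to 1.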
Other uses for PageRank
Sensitivity? What else people use PageRank to do:

GeneRank, ProteinRank, ObjectRank, EventRank, IsoRank, clustering (graph partitioning), sports ranking, food webs, centrality, teaching.

[Figure: a GeneRank gene-expression heatmap; the long column of gene accession labels is omitted.]

The links ((I − αGD⁻¹)x = w): … examined and understood … "nearby" important genes.
The jump: examined, understood, and u…

Conjectured new papers: TweetRank (Done, WSDM 2010), WaveRank, …Rank, PaperRank, UniversityRank, LabRank. I think the last one involves a …

44/40
Multicore PageRank
… similar story …

Serialized preprocessing.
Parallelize the linear algebra via an
asynchronous Gauss-Seidel iterative method.

~10x scaling on the same (80-core) machine
(1M nodes, 15M edges, synthetic)

45
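Gauss-Seidel updates x[i] in place from row i of (I − αP)x = (1 − α)v, so later updates within a sweep already use the new values. The sketch below shows only that sequential update rule, on an illustrative 4-page graph; the talk's implementation runs the sweeps asynchronously across threads, which this does not attempt.

```python
def pagerank_gauss_seidel(P, alpha=0.85, tol=1e-12, max_sweeps=1000):
    """Gauss-Seidel sweeps for (I - alpha*P) x = (1 - alpha) v.
    P is column-stochastic, given as a list of rows."""
    n = len(P)
    v = 1.0 / n                    # uniform jump value
    x = [v] * n
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(n):
            # solve row i for x[i], using the freshest other entries
            s = sum(P[i][j] * x[j] for j in range(n) if j != i)
            new = (alpha * s + (1 - alpha) * v) / (1 - alpha * P[i][i])
            change += abs(new - x[i])
            x[i] = new
        if change < tol:
            break
    return x

# Illustrative 4-page graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {0, 2}
P = [[0.0, 0.0, 1.0, 0.5],
     [0.5, 0.0, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0]]
x = pagerank_gauss_seidel(P)
```

Since each column of αP sums to α < 1, the system is strictly column diagonally dominant and the sweeps converge; in an asynchronous multicore version the sweeps simply tolerate stale reads of x.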
Questions?
             Papers on my webpage
        www.cs.purdue.edu/homes/dgleich
                     Codes
           github.com/arbenson/mrtsqr
www.cs.purdue.edu/homes/dgleich/codes/netalignmc
            github.com/dgleich/prpack




                                                   46

Weitere ähnliche Inhalte

Andere mochten auch

ScanService Kronberg - Ihr preiswerter Scandienst
ScanService Kronberg - Ihr preiswerter Scandienst ScanService Kronberg - Ihr preiswerter Scandienst
ScanService Kronberg - Ihr preiswerter Scandienst fastNOTE SchreibService
 
Unidad v tema 10 crm y el comercio electrónico (e-crm) - cad
Unidad v   tema 10  crm y el comercio electrónico (e-crm) - cadUnidad v   tema 10  crm y el comercio electrónico (e-crm) - cad
Unidad v tema 10 crm y el comercio electrónico (e-crm) - cadUDO Monagas
 
Profiles of 50 major appliance manufacturers worldwide
Profiles of 50 major appliance manufacturers worldwideProfiles of 50 major appliance manufacturers worldwide
Profiles of 50 major appliance manufacturers worldwideCSIL Kitchen
 
DIA, ¿una inversion de valor? (II) - DIA, se consolida en el mercado
DIA, ¿una inversion de valor? (II) - DIA, se consolida en el mercadoDIA, ¿una inversion de valor? (II) - DIA, se consolida en el mercado
DIA, ¿una inversion de valor? (II) - DIA, se consolida en el mercadoFrancisco Fernández Reguero
 
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal ...David Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignmentDavid Gleich
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveDavid Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
The power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsThe power and Arnoldi methods in an algebra of circulants
The power and Arnoldi methods in an algebra of circulantsDavid Gleich
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 

Andere mochten auch (19)

Jordania
JordaniaJordania
Jordania
 
ScanService Kronberg - Ihr preiswerter Scandienst
ScanService Kronberg - Ihr preiswerter Scandienst ScanService Kronberg - Ihr preiswerter Scandienst
ScanService Kronberg - Ihr preiswerter Scandienst
 
Unidad v tema 10 crm y el comercio electrónico (e-crm) - cad
Unidad v   tema 10  crm y el comercio electrónico (e-crm) - cadUnidad v   tema 10  crm y el comercio electrónico (e-crm) - cad
Unidad v tema 10 crm y el comercio electrónico (e-crm) - cad
 
TpM2015: Digitales Zielgruppenmarketing am Beispiel der Tourismusmarke Tirol.
TpM2015: Digitales Zielgruppenmarketing am Beispiel der Tourismusmarke Tirol.TpM2015: Digitales Zielgruppenmarketing am Beispiel der Tourismusmarke Tirol.
TpM2015: Digitales Zielgruppenmarketing am Beispiel der Tourismusmarke Tirol.
 
Profiles of 50 major appliance manufacturers worldwide
Profiles of 50 major appliance manufacturers worldwideProfiles of 50 major appliance manufacturers worldwide
Profiles of 50 major appliance manufacturers worldwide
 
Propuestas estructurales

Massive MapReduce Matrix Computations & Multicore Graph Algorithms

  • 1. Massive MapReduce Matrix Computations & Multicore Graph Algorithms. David F. Gleich, Computer Science, Purdue University.
  • 2. It's a pleasure … Intel intern, 2005, in the Application Research Lab in Santa Clara, resulting in one of my favorite papers: "Approximating Personalized PageRank with Minimal Use of Web Graph Data," David Gleich and Marzia Polito, Internet Mathematics Vol. 3, No. 3: 257-294. Could you run your own search engine and crawl the web to compute your own PageRank vector if you are highly concerned with privacy? Yes! Theory, experiments, implementation! The paper computes fast approximations to the personalized PageRank score of a webpage, improving speed by limiting the amount of web graph data accessed, with experiments on web graphs of up to 118 million pages and a proven theoretical approximation guarantee.
  • 3. Massive MapReduce Matrix Computations. Collaborators: Yangyang Hou (Purdue, CS); Paul G. Constantine, Austin Benson, Joe Nichols (Stanford University); James Demmel (UC Berkeley); Joe Ruthruff, Jeremy Templeton (Sandia CA). Funded by the Sandia National Labs CSAR project.
  • 4. By 2013(?) all Fortune 500 companies will have a data computer.
  • 5. Data computers I've worked with: Magellan Cluster @ NERSC (128GB/core storage, 80 nodes, 640 cores, Infiniband); Student Cluster @ Stanford (3TB/core storage, 11 nodes, 44 cores, GB Ethernet, cost $30k); Nebula Cluster @ Sandia CA (2TB/core storage, 64 nodes, 256 cores, GB Ethernet, cost $150k). These systems are good for working with enormous matrix data!
  • 6. How do you program them?
  • 7. MapReduce and Hadoop overview.
  • 8. MapReduce in a picture: map tasks run in parallel, the shuffle moves data between them like an MPI all-to-all, and reduce tasks run in parallel.
  • 9. Computing a histogram: a simple MapReduce example. Map(ImageId, Pixels): for each pixel, emit Key = (r,g,b), Value = 1. The shuffle groups the emitted pairs by key. Reduce(Color, Values): emit Key = Color, Value = sum(Values), the number of pixels of that color.
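The histogram slide can be sketched as a small in-memory simulation of the map/shuffle/reduce pipeline. This is an illustration only: the function names and the toy image data are made up, and the real system would run map and reduce tasks on different machines via Hadoop.

```python
from collections import defaultdict

def map_pixels(image_id, pixels):
    # Map(ImageId, Pixels): emit ((r,g,b), 1) for every pixel
    for color in pixels:
        yield color, 1

def reduce_color(color, values):
    # Reduce(Color, Values): emit (color, number of pixels of that color)
    yield color, sum(values)

def run(images):
    shuffled = defaultdict(list)            # the "shuffle": group values by key
    for image_id, pixels in images.items():
        for key, value in map_pixels(image_id, pixels):
            shuffled[key].append(value)
    out = {}
    for color, values in shuffled.items():
        for key, value in reduce_color(color, values):
            out[key] = value
    return out

images = {"img1": [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
          "img2": [(255, 0, 0)]}
print(run(images))  # {(255, 0, 0): 3, (0, 0, 255): 1}
```

The point of the exercise: the programmer writes only the two pure functions; the framework owns the grouping, scheduling, and fault tolerance.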
  • 10. Why a limited computational model? Data scalability and fault tolerance. The idea: bring the computations to the data; MapReduce can schedule map functions without moving data. War stories: the last page of a 136-page error dump; after waiting in the queue for a month and after 24 hours of finding eigenvalues, one node randomly hiccups.
  • 11. Tall-and-Skinny matrices (m ≫ n): many rows (like a billion), a few columns (under 10,000). Used in: regression and general linear models with many samples; block iterative methods; panel factorizations; simulation data analysis; big-data SVD/PCA. (Example image from the tinyimages collection.)
  • 12. Scientific simulations as tall-and-skinny matrices. Input: parameters s; output: the time history of the simulation f, ~100GB. The simulation as a vector: f(s) = [q(x1,t1,s); ...; q(xn,t1,s); q(x1,t2,s); ...; q(xn,t2,s); ...; q(xn,tk,s)], i.e. a space-by-time tall-and-skinny matrix flattened into one long column. A database of simulations s1 -> f1, s2 -> f2, ..., sk -> fk is then itself a very tall-and-skinny matrix.
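The stacking described above is easy to mistranscribe, so here is a toy numpy sketch of it. The sizes and the random stand-in for a real solver are assumptions for illustration; only the flattening order (all spatial points at t1, then all at t2, and so on) is taken from the slide.

```python
import numpy as np

n_space, n_time, n_sims = 4, 3, 5
rng = np.random.default_rng(0)

def simulate(s):
    # stand-in for a real solver: q[i, j] = q(x_i, t_j, s)
    return rng.standard_normal((n_space, n_time)) + s

def f(s):
    # flatten the space-by-time snapshot column-major:
    # [q(x1,t1); ...; q(xn,t1); q(x1,t2); ...; q(xn,tk)]
    return simulate(s).reshape(-1, order='F')

# the database s1 -> f1, ..., sk -> fk: one column per simulation run
A = np.column_stack([f(s) for s in range(n_sims)])
print(A.shape)  # (12, 5)
```

With realistic sizes (n_space * n_time in the tens of millions, a few hundred runs), A is exactly the tall-and-skinny shape the rest of the talk targets.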
  • 13. Model reduction (Constantine & Gleich, ICASSP 2012). A large scale example: a nonlinear heat transfer model, 80k nodes, 300 time-steps, 104 basis runs, SVD of a 24m x 104 data matrix, 500x reduction in wall clock time (100x including the SVD).
  • 14. PCA of 80,000,000 images, each with 1000 pixels: an 80,000,000 x 1000 matrix A. Zero-mean the rows, compute R with TSQR in MapReduce, then post-process with an SVD to get the principal components: the first 16 columns of V as images and the top 100 singular values. Constantine & Gleich, MapReduce 2010.
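The PCA pipeline above rests on one identity: if A = QR, then the small n x n factor R has the same singular values and right singular vectors as A, so the expensive part is only the QR. A toy-scale sketch (a small random matrix stands in for the 80M x 1000 image data, which is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10000, 20))
A = A - A.mean(axis=0)                 # center the data before PCA

R = np.linalg.qr(A, mode='r')          # tall-and-skinny QR: only R is needed
_, S, Vt = np.linalg.svd(R)            # 20 x 20 SVD: cheap post-processing

# compare against the SVD of the full matrix
_, S_full, Vt_full = np.linalg.svd(A, full_matrices=False)
print(np.allclose(S, S_full))          # True: identical singular values
```

At scale, the MapReduce job produces R, and the tiny SVD of R runs on a single machine; V's columns are the principal components.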
  • 15. All these applications need is Tall-and-Skinny QR.
  • 16. Quick review of QR. The QR factorization: A = QR, where Q is orthogonal (Q^T Q = I) and R is upper triangular. Using QR for regression: the least-squares solution of Ax ≈ b is given by solving Rx = Q^T b. QR is block normalization: "normalizing" a vector usually generalizes to computing R in the QR factorization. Current MapReduce algorithms use the normal equations instead, A^T A = R^T R via Cholesky, then Q = A R^-1, which can limit numerical accuracy. (David Gleich (Sandia), MapReduce 2011.)
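The accuracy caveat in the slide above can be made concrete: forming A^T A squares the condition number, so Cholesky-based least squares loses roughly twice as many digits as QR. A small deliberately ill-conditioned test matrix (constructed here purely for illustration) shows the gap:

```python
import numpy as np

rng = np.random.default_rng(2)
# build A with known singular values so cond(A) = 1e6
U, _ = np.linalg.qr(rng.standard_normal((500, 6)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = U @ np.diag(np.logspace(0, -6, 6)) @ V.T
b = A @ np.ones(6)                    # exact least-squares solution: all ones

# QR approach: solve R x = Q^T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# normal equations: A^T A = R^T R via Cholesky, cond(A^T A) = cond(A)^2
L = np.linalg.cholesky(A.T @ A)
x_ne = np.linalg.solve(L.T, np.linalg.solve(L, A.T @ b))

print(np.linalg.norm(x_qr - 1), np.linalg.norm(x_ne - 1))
# the QR error is typically several orders of magnitude smaller
```

Push cond(A) toward 1e8 and the Cholesky factorization can fail outright, because A^T A stops being numerically positive definite.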
• 17. There are good MPI implementations. Why MapReduce?
• 18. Full TSQR code in hadoopy:

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer:
            self.__call__ = self.reducer
        else:
            self.__call__ = self.mapper

    def compress(self):
        # QR of the buffered rows; keep only the R factor
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)
• 19. Tall-and-skinny matrix storage in MapReduce. A : m x n, m ≫ n, split into row blocks A1, A2, A3, A4. The key is an arbitrary row-id; the value is the 1 x n array for that row. Each submatrix Ai is the input to a map task.
• 20. Numerical stability was a problem for prior approaches (Constantine & Gleich, MapReduce 2010). Previous methods (e.g., forming Q = AR^{-1}) couldn't ensure that the matrix Q was orthogonal. Direct TSQR and AR^{-1} with iterative refinement fix this (Benson, Gleich, Demmel, submitted). (Figure: norm(Q^T Q − I) against condition numbers from 10^5 to 10^20; the prior methods lose orthogonality while the new ones do not.)
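The stability issue is easy to reproduce (an illustrative experiment, not the paper's benchmark): build a matrix with condition number around 10^6 and compare the orthogonality of Q from CholeskyQR (Q = AR^{-1} with R from the normal equations) against Householder QR.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 500, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -6, n)) @ V.T   # cond(A) ~ 1e6

# CholeskyQR: R from A^T A, then Q = A R^{-1}
R = np.linalg.cholesky(A.T @ A).T
Q_chol = A @ np.linalg.inv(R)

# Householder QR for comparison
Q_house, _ = np.linalg.qr(A)

err_chol = np.linalg.norm(Q_chol.T @ Q_chol - np.eye(n))
err_house = np.linalg.norm(Q_house.T @ Q_house - np.eye(n))
# err_chol grows like cond(A)^2 * machine eps, far above err_house
```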
• 21. Communication-avoiding QR (Demmel et al. 2008) on MapReduce (Constantine and Gleich, 2010). Data: rows of the matrix, in blocks A1, ..., A8. Map: each mapper runs a serial TSQR over its blocks and emits only an R factor (mapper 1 reduces A1–A4 to R4; mapper 2 reduces A5–A8 to R8). Reduce: a serial TSQR of the stacked R4, R8 emits the final R. A "manual reduce" can make it faster by adding a second iteration. This computes only R and not Q; Q can be recovered via Q = AR^{-1} with another MapReduce iteration. (Use the standard Householder method?)
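The map/reduce structure on this slide reduces to a short NumPy sketch (illustrative sizes, serial stand-in for the cluster): factor each row block, stack the R factors, factor again, and optionally recover Q as AR^{-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4000, 8))

# "Map": QR of each row block, keeping only R
blocks = np.array_split(A, 4, axis=0)
Rs = [np.linalg.qr(Ai, mode='r') for Ai in blocks]

# "Reduce": QR of the stacked R factors gives the R of A
R = np.linalg.qr(np.vstack(Rs), mode='r')

# Q via Q = A R^{-1} (the extra-iteration trick from the slide)
Q = A @ np.linalg.inv(R)

# R agrees with a direct QR up to row signs; Q is orthogonal
# here because this A is well conditioned
R_direct = np.linalg.qr(A, mode='r')
```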
• 22. Taking care of business by keeping track of Q. 1. Each mapper outputs its local Q and R in separate files (mapper 1: Ai -> Qi, Ri). 2. Collect the Ri on one node; compute the final R and a small Qi1 factor for each piece (task 2: R1, ..., R4 -> R and Q11, ..., Q41). 3. Distribute the pieces of the small Q and form the true Q (mapper 3: Qi * Qi1 -> Q output; R output comes from task 2).
• 23. The price is right! Based on a performance model and tests on the 2500-processor NERSC Magellan computer (80 nodes, 640 processors, 80 TB disk): Direct TSQR is faster than refinement for few columns, and not any slower for many columns. Test matrices: 800M-by-10, 7.5B-by-4, 150M-by-100, 500M-by-50; run times on the order of hundreds of seconds.
• 24. Ongoing work. Make AR^{-1} stable with targeted quad-precision arithmetic to get a numerically orthogonal Q; the performance model says it's feasible! How to handle more than ~10,000 columns? Some randomized methods? Do we need quad-precision for big data? Standard error analysis gives an n·ε error bound for computing a sum; I've seen this matter with PageRank computations!
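As a cheaper alternative to quad precision, compensated summation shrinks the n·ε error growth of a long sum; a minimal sketch (illustrative data, not the PageRank runs):

```python
import math

def kahan_sum(xs):
    # Compensated (Kahan) summation: carry the rounding error of each
    # addition forward in c, recovering most of the lost low-order bits.
    s, c = 0.0, 0.0
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y
        s = t
    return s

vals = [0.1] * 100000
naive = sum(vals)
compensated = kahan_sum(vals)
exact = math.fsum(vals)   # correctly rounded reference
# compensated is (nearly) exact; naive drifts as n grows
```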
• 25. Multicore Graph Algorithms. With Assefaw Gebremedhin, Arif Khan, Alex Pothen, Ryan Rossi (Purdue CS); Mahantesh Halappanavar (PNNL); Chen Greif, David Kurokawa (Univ. British Columbia); Mohsen Bayati, Amin Saberi (Stanford); Ying Wang (now Google). Funded by DOE CSCAPES Institute grant DE-FC02-08ER25864, NSF CAREER grant 1149756-CCF, and the Center for Adaptive Supercomputing Software for Multithreaded Architectures (CASS-MT) at PNNL.
• 26. Network alignment: what is the best way of matching graph A to B? (Figure: graph A with vertices r, s, t, u; graph B with vertices v, w.)
• 27. Network alignment. Figure 2: the NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. From Sharan and Ideker, Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433.
• 28. Network alignment: what is the best way of matching graph A to B using only edges in L? (Figure: graph A, the bipartite link graph L, and graph B.)
• 29. Network alignment. Matching? A 1-1 relationship. Best? Highest weight and overlap. (Figure: a pair of matched edges across A, L, and B forming an overlap.)
• 30. Our contributions. A new belief propagation method (Bayati et al. 2009, 2013) that outperformed state-of-the-art PageRank- and optimization-based heuristic methods. High-performance C++ implementations (Khan et al. 2012): 40 times faster (C++ ~3x, complexity ~2x, threading ~8x); 5 million edge alignments in ~10 sec. www.cs.purdue.edu/~dgleich/codes/netalignmc
• 32. Each iteration involves: (1) matrix-vector-ish computations with a sparse matrix, e.g., sparse matrix-vector products in a semiring, dot products, axpy, etc.; and (2) bipartite max-weight matching, using a different weight vector at each iteration. Let x[i] be the score for each pair-wise match in L:

for i=1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    update y[i] to x[i] (using the match in MR)

There is no "convergence"; we run 100–1000 iterations.
• 33. The methods: belief propagation. Listing 2, a belief-propagation message-passing procedure for network alignment (see the text for a description of othermax and the rounding heuristic):

 1  y(0) = 0, z(0) = 0, d(0) = 0, S(0) = 0
 2  for k = 1 to niter
 3      F = bound_{0,β}[ S + S(k−1) ]^T            Step 1: compute F
 4      d = αw + Fe                                Step 2: compute d
 5      y(k) = d − othermaxcol(z(k−1))             Step 3: othermax
 6      z(k) = d − othermaxrow(y(k−1))
 7      S(k) = diag(y(k) + z(k) − d) S − F         Step 4: update S
 8      (y(k), z(k), S(k)) ← γk (y(k), z(k), S(k)) +
 9              (1 − γk)(y(k−1), z(k−1), S(k−1))   Step 5: damping
10      round heuristic(y(k))                      Step 6: matching
11      round heuristic(z(k))                      Step 6: matching
12  end
13  return y(k) or z(k) with the largest objective value

Each iteration involves matrix-vector-ish computations with a sparse matrix (sparse matrix-vector products in a semiring, dot products, axpy, etc.) and bipartite max-weight matching with a different weight vector at each iteration. In the message-passing interpretation, the weight vectors are usually called messages, as they communicate the "beliefs" of each "agent." In this particular problem, the neighborhood of an agent represents all of the other edges in graph L incident on the same vertex in graph A (1st vector), all edges in L incident on the same vertex in graph B (2nd vector), or the edges in L that are …
• 34. The NEW methods: parallel belief propagation. The same iteration as Listing 2, except that in Step 6 approximate bipartite max-weight matching is used in place of exact matching for the rounding heuristic.
• 35. Approximation doesn't hurt the belief propagation algorithm. Test problems: a randomly perturbed power-law graph (perturb to get A, then generate L by the true match plus random edges; we also parallelize over rows), and alignments of the Library of Congress subject headings with Wikipedia categories (lcsh-wiki) and with the French National Library's Rameau headings, where weights in L are computed via matched heading strings (and via translated headings for Rameau); these problems are larger than the synthetic ones. The question: how does the BP method change when we use the approximate matching procedure from Section V in each step? Note that we cannot swap the matching in the first step of Klau's method (MR) because matching is much more integral to that procedure. Fig. 2 (fraction of correct matches vs. expected degree of noise in L, p·n from 0 to 20): BP and ApproxBP are indistinguishable, showing the large effect approximate rounding can have on solutions from Klau's method (MR) — with Klau's method, exact rounding yields the identity matching for all problems, whereas using the approximation results in over a …
• 36. A local dominating edge method for bipartite matching. A locally dominating edge is an edge heavier than all neighboring edges. The method guarantees a ½-approximation and a maximal matching; it is based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al. (2012). For bipartite graphs, work on the smaller side only.
• 37. A local dominating edge method for bipartite matching. Queue all vertices. Until the queue is empty, in parallel over vertices: match each vertex to its heaviest edge; if there's a conflict, check the winner and find an alternative for the loser; add the endpoints of non-dominating edges back to the queue. (A locally dominating edge is an edge heavier than all neighboring edges; for bipartite graphs, work on the smaller side only.)
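A serial stand-in for the dominating-edge rule (the parallel queue-and-lock machinery from the slides is omitted): processing edges in decreasing weight order means each accepted edge is locally dominating among those remaining, giving the same ½-approximation and maximal-matching guarantees. The toy edge list is illustrative.

```python
def dominating_edge_matching(edges):
    """edges: (u, v, weight) triples of a bipartite graph.
    Returns a maximal matching whose weight is at least half optimal."""
    matched_u, matched_v = set(), set()
    matching = []
    # heaviest-first: each accepted edge dominates its neighborhood
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched_u and v not in matched_v:
            matched_u.add(u)
            matched_v.add(v)
            matching.append((u, v, w))
    return matching

# Tiny example: greedy picks (1,0) weight 7, then (0,1) weight 6
pairs = dominating_edge_matching([(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 1)])
```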
• 38. A local dominating edge method for bipartite matching: implementation. Customized first iteration (with all vertices). Use OpenMP locks to update choices. Use __sync_fetch_and_add for queue updates.
• 39. The remaining multi-threading procedures are straightforward. Standard OpenMP for the matrix computations; use schedule(dynamic) to handle skew. We can batch the matching procedures in the BP method for additional parallelism:

for i=1 to ...
    update x[i] to y[i]
    save y[i] in a buffer
    when the buffer is full
        compute a max-weight match for all in the buffer
        and save the best
• 40. Performance evaluation. Machine: (2x4) sockets of 10-core Intel E7-8870 at 2.4 GHz (80 cores total), 16 GB memory per processor (128 GB total). Scaling study: 1. thread binding, scattered vs. compact; 2. memory binding, interleaved vs. bind.
• 41. Scaling BP with no batching (lcsh-rameau, 400 iterations, scatter thread binding with interleaved memory): 1450 seconds with 1 thread down to 115 seconds with 40 threads, roughly a 12.6x speedup.
• 42. Ongoing work. Better memory handling: numactl and affinity are insufficient for full scaling. Better models: these get to be much bigger computations. Distributed memory: trying to get an MPI version, and looking into GraphLab.
• 43. PageRank was created by Google to rank web pages. The model: 1. follow edges uniformly with probability α, and 2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely, so the jump vector is v = [1/n, ..., 1/n]^T with e^T v = 1. (Figure: a six-node example graph and its column-stochastic matrix P, with e^T P = e^T and the dangling node's column patched back to v; dangling nodes are otherwise ignored until the algorithms later.) PageRank x is the stationary vector of the Markov chain, (αP + (1 − α)ve^T)x = x, equivalently the linear system (I − αP)x = (1 − α)v with x ≥ 0, e^T x = 1. The places where we find the surfer most often are the important pages. (Gleich, Purdue UTRC Seminar.)
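The model on this slide can be sketched as a power iteration; the tiny column-stochastic matrix below is made up for illustration (not the six-node example in the figure).

```python
import numpy as np

# Illustrative 3-node column-stochastic transition matrix
P = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
alpha = 0.85
n = P.shape[0]
v = np.ones(n) / n          # uniform jump vector, e^T v = 1

x = v.copy()
for _ in range(200):        # error shrinks like alpha^k
    x = alpha * (P @ x) + (1 - alpha) * v
# x now solves (I - alpha P) x = (1 - alpha) v and sums to 1
```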
• 44. Other uses for PageRank. Sensitivity? What else people use PageRank to do: ProteinRank, GeneRank (solving (I − αGD^{-1})x = w on a gene network; the figure shows the result over gene probes), ObjectRank, EventRank, IsoRank, clustering (graph partitioning), sports ranking, food webs, centrality, and teaching — the links: examined and understood genes; the jump: examined, understood, and "nearby" important genes. Conjectured new papers: TweetRank (Done, WSDM 2010), WaveRank, PaperRank, UniversityRank, LabRank. I think the last one involves a …
• 45. Multicore PageRank … a similar story. Serialized preprocessing; parallelize the linear algebra via an asynchronous Gauss-Seidel iterative method; ~10x scaling on the same (80-core) machine (1M nodes, 15M edges, synthetic).
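A sequential Gauss-Seidel sweep for the PageRank linear system (a sketch of the update rule only, on a dense toy matrix; the multicore version runs these updates asynchronously across threads and on sparse data):

```python
import numpy as np

def pagerank_gauss_seidel(P, alpha=0.85, sweeps=200):
    """Gauss-Seidel on (I - alpha P) x = (1 - alpha) v, column-stochastic P."""
    n = P.shape[0]
    v = np.ones(n) / n
    x = v.copy()
    for _ in range(sweeps):
        for i in range(n):
            # solve equation i using the freshest entries of x
            sigma = alpha * (P[i, :] @ x - P[i, i] * x[i])
            x[i] = ((1 - alpha) * v[i] + sigma) / (1 - alpha * P[i, i])
    return x

# Illustrative 3-node column-stochastic matrix
P = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
x = pagerank_gauss_seidel(P)
```

Because each update uses the freshest values of x, a sweep typically converges faster than the power method, and the slide's asynchronous variant keeps this property without a global synchronization.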
• 46. Questions? Papers on my webpage: www.cs.purdue.edu/homes/dgleich. Codes: github.com/arbenson/mrtsqr, www.cs.purdue.edu/homes/dgleich/codes/netalignmc, github.com/dgleich/prpack