Two matrix computations for
                                          numerical graph problems:
                                   PageRank and Network Alignment
                                                           David F. Gleich
                                                    Sandia National Labs
                                                           Livermore, CA

                                                   IBM Almaden Seminar
                                                            San Jose, CA
                                                     January 17th, 2011

                                                       In collaboration with
                                     Andrew Gray (UBC), Chen Greif (UBC)
                             Tracy Lau (UBC/IBM?), Mohsen Bayati (Stanford)
                           Ying Wang (Stanford), Margot Gerritsen (Stanford)
                                                     Amin Saberi (Stanford)

                                       Supported by the Library of Congress
                                          and Microsoft Live Labs Fellowship
David F. Gleich (Sandia)                                       IBM Almaden   1 / 47
Sketch of talk
   two algorithms: inner-outer and belief propagation
   two problems: PageRank and network alignment
   big graphs for both
   iterative matrix computations for both
   multi-core parallel results: inner-outer only

Standard flow:
   problem → algorithm → theory (hopefully) → empirical results
except "fun" results first.

Some open questions at end.



A PageRank algorithm
Instead of the power method,

    x^(k+1) = α P x^(k) + (1 − α) v,

use an outer iteration

    (I − βP) x^(k+1) = (α − β) P x^(k) + (1 − α) v ≡ f,

with the inner iteration

    y^(j+1) = β P y^(j) + f.

It's faster!

Web Data, α = 0.99: 105,896,555 nodes; 3,783,733,648 edges.
    Power Method: 964 its, 5.15 hrs.   Inner-Outer: 857 its, 4.45 hrs.
Network-Alignment Data, α = 0.95: 4,219,893,141 nodes; 91,886,357,440 edges.
    Power Method: 271 its, 54.6 hrs.   Inner-Outer: 188 its, 36.2 hrs.

Codes and data available.
Note: Web data is uk-2006 from UNIMI's (Univ. Milano) DSI group.
Network Alignment

[Figure: graphs A and B joined by the candidate-match graph L; a pair of matched edges that is adjacent in both A and B forms a "square."]

A is about 200,000 vertices.
B is about 300,000 vertices.
L has around 5,000,000 edges.
A 5-million-variable integer QP; ∼90% of optimality in minutes.

Codes and data available.

DEMO
PageRank
Slide 5 of 47
PageRank is a ...
 ... modified Markov chain,
 ... damped random walk on a graph,
 ... pinball game on the reverse web, or
 ... random surfer model.
Proposed by Brin and Page in 1998, but similar ideas from
earlier... (Sebastiano Vigna is working on tracing the history –
the current history dates to 1949)


 Langville and Meyer (2006) is a good
 general reference; Berkhin (2005) has
 lots of goodies; and Des Higham called
 it pinball.



The PageRank Random Surfer
Important pages ↔ highly probable to visit.

1. Follow out-edges uniformly with probability α, and
2. randomly jump according to v with probability 1 − α; we'll assume v = (1/n)e.

This induces a Markov chain model

    [αP + (1 − α)v eᵀ] x(α) = x(α)

or, equivalently, the linear system

    (I − αP) x(α) = (1 − α)v.

[Figure: a six-node example graph with its column-stochastic matrix P and v = (1/6)e.]

But it's just a model.

Note: I'm omitting important details about dangling nodes; I'll mention them a bit later.
What is α?

Author                       α
Brin and Page (1998)         0.85
Najork et al. (2007)         0.85
Litvak et al. (2006)         0.5
Katz (1953)                  0.5
Experiment (2009)            0.63 ≈ 0.85 · 0.5
Algorithms (...)             ≥ 0.85

Our regime:
   α ≥ 0.85, otherwise the power method is fast.
   P only available for mat-vec, otherwise custom techniques are possible.

[Figure: density of α estimated from browser data; fitted InfBeta(3.2, 2.0, 1.9e−05, 0.0019).]

Constantine, Flaxman, Gleich, Gunawardana, Tracking the Random Surfer, WWW2010.
Constantine and Gleich, Random Alpha PageRank, Internet Math.
PageRank Algorithms
Slide 9 of 47
PageRank formulations and theory

[Diagram: a graph or web graph gives a substochastic matrix (codes); strongly preferential, weakly preferential, and sink preferential PageRank connect to PseudoRank, PageRank, eigensystems, linear systems, and other transformations (theory).]

v                      teleportation vector
P̄                      substochastic matrix (for algorithms)
d                      dangling node vector (d = e − P̄ᵀe)
P̄ + vdᵀ → P            strongly preferential PageRank
P̄ + udᵀ → P            weakly preferential PageRank (dangling vector u need not equal v)
P                      PageRank stochastic matrix (for theory)
(I − αP)x = (1 − α)v   PageRank linear system
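The strongly preferential matrix P = P̄ + vdᵀ never needs to be formed explicitly: a matvec with the substochastic P̄ plus a rank-one dangling correction suffices. A minimal sketch in Python on a made-up three-node graph (the graph is my own illustration, not the talk's data):

```python
# Toy illustration: apply P = Pbar + v d^T without forming it, where Pbar is
# the column-stochastic-where-possible out-link matrix and d flags dangling
# nodes (d = e - Pbar^T e).
out_links = {0: [1, 2], 1: [2], 2: []}   # node 2 is dangling
n = 3
v = [1.0 / n] * n                        # uniform teleportation vector

def pbar_matvec(x):
    """y = Pbar x; column j of Pbar spreads x[j] over node j's out-links."""
    y = [0.0] * n
    for j, targets in out_links.items():
        if targets:
            share = x[j] / len(targets)
            for i in targets:
                y[i] += share
    return y

def p_matvec(x):
    """y = P x = Pbar x + v (d^T x): dangling mass is redistributed by v."""
    y = pbar_matvec(x)
    dangling_mass = sum(x[j] for j, t in out_links.items() if not t)
    return [yi + vi * dangling_mass for yi, vi in zip(y, v)]

x = [1.0 / n] * n
y = p_matvec(x)
print(sum(y))  # P is stochastic, so total probability is preserved (≈ 1.0)
```

The same pattern is what makes the "P only available for mat-vec" regime workable at web scale.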
Motivation
Why another PageRank algorithm?

An ideal algorithm is (fanciest to simplest):
  1. reliable
  2. fast over a range of α's → use Matlab's "\" (fancy)
  3. efficient for big problems → use a Gauss-Seidel or custom Richardson method
  4. uses only matvec products → use the inner-outer iteration
  5. uses only 2 vectors of memory → use the power method (simple)
Simple algorithms

The power method: for Ax = λx, the iteration

    x^(k+1) = A x^(k) / ‖A x^(k)‖

computes the largest eigenpair. The PageRank Markov chain eigenvector problem is

    [αP + (1 − α)v eᵀ] x = x.

If eᵀ x^(0) = 1 and x_j ≥ 0, then

    x^(k+1) = αP x^(k) + (1 − α)v (eᵀ x^(k)),   with eᵀ x^(k) = 1.

The Richardson method: for Ax = b, the iteration

    x^(k+1) = x^(k) + ω (b − A x^(k))      [b − Ax^(k) is the residual]

computes x. The PageRank linear system is

    (I − αP)x = (1 − α)v.

For ω = 1,

    x^(k+1) = αP x^(k) + (1 − α)v,

and the Richardson iteration is the power method.
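The equivalence is easy to check numerically. A small sketch on a toy three-node graph of my own (not from the talk), iterating the shared update x ← αPx + (1 − α)v:

```python
# For PageRank, Richardson with omega = 1 on (I - alpha*P)x = (1-alpha)*v
# reproduces the power method, because e^T x^(k) = 1 is preserved.
alpha = 0.85
n = 3
P = [[0.0, 0.0, 1.0],      # P[i][j] = probability of moving j -> i
     [0.5, 0.0, 0.0],      # columns sum to 1 (column-stochastic)
     [0.5, 1.0, 0.0]]
v = [1.0 / n] * n

def matvec(x):
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]

def step(x):
    """One power-method step = one Richardson(omega=1) step."""
    y = matvec(x)
    return [alpha * yi + (1 - alpha) * vi for yi, vi in zip(y, v)]

x = list(v)
for _ in range(100):
    x = step(x)
res = max(abs(a - b) for a, b in zip(x, step(x)))
print(x, res)  # fixed-point residual (tiny after 100 steps)
```

Because the iterate always sums to one, the eigenvector normalization term eᵀx^(k) drops out and the two methods produce identical iterates.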
Inner-Outer

Note:  PageRank is easier when α is smaller.
Thus:  Solve PageRank with itself using β < α!

Outer:  (I − βP) x^(k+1) = (α − β) P x^(k) + (1 − α)v ≡ f^(k)

Inner:  y^(0) = x^(k),   y^(j+1) = β P y^(j) + f^(k)

A new parameter? What is β? → 0.5
How many inner iterations? → Until a residual of 10⁻².

Gleich, Gray, Greif, Lau, SISC 2010.
Inner-Outer algorithm

Input: P, v, α, τ, (β = 0.5, η = 10⁻²)
Output: x
 1: x ← v
 2: y ← Px
 3: while ‖αy + (1 − α)v − x‖₁ ≥ τ
 4:     f ← (α − β)y + (1 − α)v
 5:     repeat
 6:         x ← f + βy
 7:         y ← Px
 8:     until ‖f + βy − x‖₁ < η
 9: end while
10: x ← αy + (1 − α)v

Uses only three vectors of memory.
Convergence? If 0 ≤ β ≤ α with the "exact" iteration, but also (small theorem) with any η!
Parameters? β = 0.5, η = 10⁻² is often faster than the power method (or just a titch slower).

Note: the inner loop checks its condition after doing one iteration. An inexact iteration is always at least as good as one step of the power method.
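A direct transcription of the pseudocode into Python, on an assumed toy graph, to make the control flow concrete (the talk's actual codes are linked separately):

```python
# Inner-outer PageRank, following the numbered pseudocode step by step.
alpha, beta, eta, tau = 0.99, 0.5, 1e-2, 1e-8
n = 3
P = [[0.0, 0.0, 1.0],      # toy column-stochastic matrix (my own example)
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
v = [1.0 / n] * n

def matvec(x):
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]

def norm1(x):
    return sum(abs(xi) for xi in x)

x = list(v)                                                    # step 1
y = matvec(x)                                                  # step 2
mults = 1
while norm1([alpha * yi + (1 - alpha) * vi - xi
             for yi, vi, xi in zip(y, v, x)]) >= tau:          # step 3
    f = [(alpha - beta) * yi + (1 - alpha) * vi
         for yi, vi in zip(y, v)]                              # step 4
    while True:                                                # steps 5-8
        x = [fi + beta * yi for fi, yi in zip(f, y)]           # step 6
        y = matvec(x)                                          # step 7
        mults += 1
        if norm1([fi + beta * yi - xi
                  for fi, yi, xi in zip(f, y, x)]) < eta:      # step 8
            break
x = [alpha * yi + (1 - alpha) * vi for yi, vi in zip(y, v)]    # step 10
print(mults, x)
```

Note that only x, y, and f are kept, matching the three-vector memory claim.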
Inner-Outer Parameters
Question: what parameters should we pick?

[Figure: multiplications to converge on in-2004, α = 0.99, against the power method. Left panel: versus β for η = 10⁻¹ through 10⁻⁵. Right panel: versus η for β = 0.10, 0.30, 0.50, 0.70.]

α = 0.99, in-2004 graph (1.3M nodes, 16.9M edges).
Just use β = 0.5 and η = 10⁻²!
Note: many similar plots appear in my thesis.
The Competition
Our Requirement: only Px is available!
   Quadratic Extrapolation (Kamvar, Haveliwala, et al.)
   Aggregation/Disaggregation
   (Langville and Meyer; Stewart)
   Permutations/Strong Components
   (Del Corso, Gulli, and Romani; Langville and Meyer)
   Krylov methods (Gleich, Zhukov, Berkhin;
   Del Corso, Gulli, and Romani)
   Padé-type extrapolation (Brezinski and Redivo-Zaglia)

   Arnoldi methods (Greif and Golub)

   Gauss-Seidel (Arasu, Novak, Tomkins, and Tomlin)


Inner-outer Performance
Slide 17 of 47
Datasets

name                                 size     nonzeros avg nz/row
ubc-cs-2006                       51,681       673,010      13.0
ubc-2006                         339,147     4,203,811      12.4
eu-2005                          862,664    19,235,140      22.3
in-2004                        1,382,908    16,917,053      12.2
wb-edu                         9,845,725    57,156,537        5.8
arabic-2005                   22,744,080   639,999,458      28.1
sk-2005                       50,636,154 1,949,412,601      38.5
uk-2007                      105,896,555 3,738,733,648      35.3




One example

[Figure: residual versus matrix-vector multiplications for the power method and inner-outer on wb-edu, at α = 0.85 (left) and α = 0.99 (right); insets zoom in on the early iterations.]

τ = 10⁻⁷, β = 0.5, η = 10⁻²; wb-edu graph (9.8M nodes, 57.2M edges).
Advantage Inner-Outer
α = 0.99, β = 0.5, η = 10⁻².

tol.   graph          work (mults.)               time (secs.)
                      power   in/out   gain       power     in/out    gain
10⁻³   ubc-cs-2006    226     141      37.6%      1.9       1.2       35.2%
       ubc            242     141      41.7%      13.6      8.3       38.4%
       in-2004        232     129      44.4%      51.1      30.4      40.5%
       eu-2005        149     150      -0.7%      26.9      28.3      -5.3%
       wb-edu         221     130      41.2%      291.2     184.6     36.6%
       arabic-2005    213     139      34.7%      779.2     502.5     35.5%
       sk-2005        156     144      7.7%       1718.2    1595.9    7.1%
       uk-2007        145     125      13.8%      2802.0    2359.3    15.8%

10⁻⁵   ubc-cs-2006    574     432      24.7%      4.7       3.6       22.9%
       ubc            676     484      28.4%      37.7      27.8      26.2%
       in-2004        657     428      34.9%      144.3     97.5      32.4%
       eu-2005        499     476      4.6%       89.3      87.4      2.1%
       wb-edu         647     417      35.5%      850.6     572.0     32.8%
       arabic-2005    638     466      27.0%      2333.5    1670.0    28.4%
       sk-2005        523     460      12.0%      5729.0    5077.1    11.4%
       uk-2007        531     463      12.8%      10225.8   8661.9    15.3%

10⁻⁷   ubc-cs-2006    986     815      17.3%      8.0       6.8       15.4%
       ubc            1121    856      23.6%      62.5      49.0      21.6%
       in-2004        1108    795      28.2%      243.1     179.8     26.0%
       eu-2005        896     814      9.2%       159.9     148.6     7.1%
       wb-edu         1096    777      29.1%      1442.9    1059.0    26.6%
       arabic-2005    1083    843      22.2%      3958.8    3012.9    23.9%
       sk-2005        951     828      12.9%      10393.3   9122.9    12.2%
       uk-2007        964     857      11.1%      18559.2   16016.7   13.7%
Parallelization
Parallel Px (each node i scatters its probability mass to its out-neighbors):

for (i in nodes, in parallel) { xi = x[i]/degree(i); for (j in out-edges of i) atomic(y[j] += xi); }

[Figure: speedup relative to the best 1-processor run versus number of processors (1-8), for power and inner-outer at tolerances 10⁻³, 10⁻⁵, 10⁻⁷, against linear speedup.]
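The scatter loop above needs atomic updates to y. A common alternative, sketched here on a toy graph with Python threads (which only illustrate the pattern, not real parallel speedup), is to give each worker a private accumulator and reduce at the end, avoiding atomics entirely:

```python
# Each worker scatters its slice of nodes into a thread-private vector;
# a final reduction sums the partial vectors. No atomic adds required.
from concurrent.futures import ThreadPoolExecutor

out_links = {0: [1, 2], 1: [2], 2: [0]}   # toy graph (my own example)
n = 3
x = [1.0 / n] * n

def partial_scatter(nodes):
    y = [0.0] * n                     # thread-private accumulator
    for i in nodes:
        if out_links[i]:
            xi = x[i] / len(out_links[i])
            for j in out_links[i]:
                y[j] += xi            # safe: y is private to this worker
    return y

with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(partial_scatter, [[0, 1], [2]]))

y = [sum(col) for col in zip(*parts)]  # reduction across workers
print(y)
```

The trade-off is one extra length-n vector per worker in exchange for contention-free inner loops.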
Network Alignment: Motivation
Slide 22 of 47
Alignment and overlap: The goal

[Figure: Wikipedia categories (Educational psychology, Psychiatric hospitals, Health organizations) matched to LCSH subject headings (Mental health, Health); a matching whose pairs also complete "squares" (matched pairs adjacent in both graphs) is better than one without squares.]

[Figure: graphs A and B joined by the candidate-match graph L; two matched edges adjacent in both A and B form a square.]

Maximize squares/overlap in a 1-1 matching.
Find a good mapping to investigate similarity!
Network alignment
Slide 26 of 47
Integrating Matching and Overlap: A QP
Squares produce overlap → a bonus when matched pairs i and j together complete a square.

Variables, Data
  x_i = edge indicator for candidate edge i ∈ L
  w_i = weight of candidate edge i
  S   = square-indicator matrix: S_ij = 1 when candidate edges i and j of L form a square

[Figure: graphs A and B with candidate-match graph L; two candidate edges adjacent in both A and B form a square.]

Problem

  maximize  Σ_{i∈L} w_i x_i + Σ_{(i,j)∈S} x_i x_j      ↔      maximize_x  wᵀx + ½ xᵀSx
  subject to  x is a matching                                 subject to  Ax ≤ e, x_i ∈ {0, 1}
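On a tiny made-up instance (my own, not from the talk), the QP can be checked by brute force: enumerate every 0/1 vector x that is a matching and score wᵀx + ½xᵀSx, where ½xᵀSx equals the number of completed squares when S marks each square pair symmetrically:

```python
# Brute-force network-alignment QP on a toy instance: A and B each have one
# edge, L has three weighted candidate pairs (a, b).
from itertools import combinations

A_edges = {(0, 1)}                  # edges of graph A
B_edges = {(0, 1)}                  # edges of graph B
L = [(0, 0), (0, 1), (1, 1)]        # candidate pairs (a, b)
w = [0.6, 0.9, 0.4]                 # their weights

def is_matching(chosen):
    a_used = [a for a, b in chosen]
    b_used = [b for a, b in chosen]
    return len(set(a_used)) == len(a_used) and len(set(b_used)) == len(b_used)

def squares(chosen):
    """Pairs (a,b), (a2,b2) form a square if (a,a2) in A and (b,b2) in B."""
    cnt = 0
    for (a, b), (a2, b2) in combinations(chosen, 2):
        if ((a, a2) in A_edges or (a2, a) in A_edges) and \
           ((b, b2) in B_edges or (b2, b) in B_edges):
            cnt += 1
    return cnt

best = max((sum(w[L.index(e)] for e in chosen) + squares(chosen), chosen)
           for k in range(len(L) + 1)
           for chosen in combinations(L, k) if is_matching(chosen))
print(best)  # best (score, matching)
```

Here the square bonus makes the matching {(0,0), (1,1)} beat the single heavier edge (0,1), which is exactly the "squares are better" effect the slides illustrate; at real sizes (5 million variables) this enumeration is hopeless, hence the belief-propagation approach.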
An example with overlap

[Figure: a 6-vertex graph and a 5-vertex graph joined by weighted candidate matches, together with the matrices they generate. The candidate matches, in edge order, with weights w:
(2,2) 0.6; (2,1) 0.9; (2,3) 0.3; (2,4) 0.1; (1,2) 0.9; (1,1) 0.6; (3,2) 0.3; (3,3) 0.5; (4,2) 0.1; (4,4) 0.4; (5,5) 0.5; (6,1) 1.0.
S is the 12-by-12 squares (overlap) matrix between candidate matches, and A is the matching-constraint matrix with one row per vertex, selecting the candidate matches incident to that vertex.]
    David F. Gleich (Sandia)                       Network alignment                                                IBM Almaden   28 / 47
Network alignment

NETWORK ALIGNMENT
    maximize    α wᵀx + (β/2) xᵀSx
    subject to  Ax ≤ e,  x ∈ {0, 1}

History
    QUADRATIC ASSIGNMENT
    MAXIMUM COMMON SUBGRAPH
    PATTERN RECOGNITION
    ONTOLOGY MATCHING
    BIOINFORMATICS

Sparse problems
    Sparse L is often ignored (with a few exceptions).
    Our paper tackles that case explicitly.
    We do large problems, too.

Conte et al., Thirty years of graph matching, 2004; Melnik et al., Similarity flooding, 2004; Blondel et al., SIREV 2004; Singh et al., RECOMB 2007; Klau, BMC Bioinformatics 10:S59, 2009.
       David F. Gleich (Sandia)                   Network alignment                              IBM Almaden   29 / 47
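As a sanity check on the formulation above, here is a minimal sketch that evaluates the objective α wᵀx + (β/2) xᵀSx for a 0/1 vector x and tests the matching constraint Ax ≤ e. The helper name `netalign_objective` and the toy data are illustrative assumptions, not part of the talk's code:

```python
import numpy as np

def netalign_objective(w, S, A, x, alpha=1.0, beta=1.0):
    """Objective alpha*w'x + (beta/2)*x'Sx and feasibility of A x <= e
    for a 0/1 indicator x over candidate matches."""
    x = np.asarray(x, dtype=float)
    obj = alpha * (w @ x) + (beta / 2.0) * (x @ (S @ x))
    feasible = bool(np.all(A @ x <= 1.0))
    return obj, feasible

# Toy instance: A's single row says at most one of the two candidate
# matches may be chosen; S rewards choosing both.
w = np.array([0.5, 0.9])
S = np.array([[0., 1.], [1., 0.]])
A = np.array([[1., 1.]])
obj1, ok1 = netalign_objective(w, S, A, [0, 1])  # pick one: feasible
obj2, ok2 = netalign_objective(w, S, A, [1, 1])  # pick both: infeasible
```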
Algorithms (Slide 30 of 47)
Algorithms

 1. LP        Convert to LP, relax, solve (skipped)
 2. TIGHTLP   Improve the LP (skipped)
 3. ISORANK   Use a PageRank heuristic (Singh et al. 2007)
 4. BP        Max-product belief propagation for the LP
 5. TIGHTBP   BP for the TIGHTLP (skipped)
 6. MR        Sub-gradient descent on TIGHTLP (Klau 2009; skipped)

Note Not discussed: an early heuristic, Flannick et al., Genome Research 16:1169–1181, 2006; an independent BP algorithm, Bradde et al., arXiv:0905.1893, 2009.

Singh et al. RECOMB 2007; Klau, 2009
       David F. Gleich (Sandia)             Network Alignment Algorithms                      IBM Almaden   31 / 47
IsoRank

    maximize    α wᵀx + (β/2) xᵀSx
    subject to  0 ≤ Ax ≤ e,  x ∈ {0, 1}

Solve PageRank on S and w!

    1. Normalize S to stochastic P
    2. Normalize w to stochastic v
    3. Compute power iterations and round at each step
    4. Output the best solution

Need to evaluate a range of PageRank α.
Designed for complete bipartite L.
                                               Singh et al. RECOMB2007; Ninove Ph.D. Thesis Louvain, 2008
   David F. Gleich (Sandia)            Network Alignment Algorithms                    IBM Almaden   32 / 47
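The four steps above can be sketched directly as a dense power iteration. This is a minimal illustration only (real instances need sparse matrices, and the rounding step would use a bipartite matching routine we omit here):

```python
import numpy as np

def isorank(S, w, alpha=0.85, iters=100):
    """PageRank-style IsoRank heuristic (sketch).
    S: nonnegative overlap matrix on candidate matches;
    w: nonnegative match weights."""
    # 1. Normalize S to a column-stochastic P (guard empty columns)
    colsums = S.sum(axis=0)
    P = S / np.where(colsums > 0, colsums, 1.0)
    # 2. Normalize w to a probability vector v
    v = w / w.sum()
    # 3. Power iterations; a full implementation would also round x to
    #    a matching at every step and keep the best rounded solution
    x = v.copy()
    for _ in range(iters):
        x = alpha * (P @ x) + (1 - alpha) * v
    return x

# tiny two-candidate example
x = isorank(np.array([[0., 1.], [1., 0.]]), np.array([1., 3.]))
```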
Inner-outer for this problem?
Only on the cores of the two graphs.

    Dataset          Size               Non-zeros
    LCSH-2           59,849             227,464
    WC-3             70,509             403,960
    Product graph    4,219,893,141      91,886,357,440

α = 0.95, w from text similarity
    Inner-Outer    188 mat-vecs    36.2 hours
    Power          271 mat-vecs    54.6 hours

Caveat: I'm ignoring all the details of actually using this technique.



  David F. Gleich (Sandia)            Network Alignment Algorithms                IBM Almaden   33 / 47
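For reference, the inner-outer iteration used in the timings above solves the PageRank system through an inexact outer step (I − βP)y = (α − β)Px + (1 − α)v, with cheap Richardson sweeps as the inner solver. A minimal dense sketch, under the assumption that P is column-stochastic (the paper's method, but a toy implementation):

```python
import numpy as np

def inner_outer_pagerank(P, v, alpha=0.99, beta=0.5,
                         eta=1e-2, tol=1e-8, max_outer=10000):
    """Inner-outer PageRank (sketch): the outer step solves
    (I - beta*P) y = (alpha - beta)*P*x + (1 - alpha)*v inexactly,
    via Richardson sweeps, to inner tolerance eta."""
    x = v.copy()
    for _ in range(max_outer):
        f = (alpha - beta) * (P @ x) + (1 - alpha) * v
        y = beta * (P @ x) + f            # first inner sweep
        while np.linalg.norm(f + beta * (P @ y) - y, 1) > eta:
            y = beta * (P @ y) + f        # Richardson inner sweep
        x = y
        # stop on the true PageRank residual
        if np.linalg.norm(alpha * (P @ x) + (1 - alpha) * v - x, 1) < tol:
            break
    return x

P = np.array([[0., 1.], [1., 0.]])        # column-stochastic toy graph
v = np.array([0.75, 0.25])
x = inner_outer_pagerank(P, v, alpha=0.9)
```

Note that when the inner residual is already small, one sweep reduces exactly to a power-iteration step, so the method degrades gracefully rather than stalling.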
Belief propagation: Our algorithm

Summary
    Construct a probability model where the most likely state is the solution!
    Locally update information
    Like a generalized dynamic program
    It works
    Most likely, it won't converge

History
    BP used for computing marginal probabilities and maximum a posteriori probability
    Wildly successful at solving satisfiability problems
    Convergent algorithm for max-weight matching

                                                                       Bayati et al. 2005;
   David F. Gleich (Sandia)   Network Alignment Algorithms             IBM Almaden   34 / 47
Max-product of function nodes
    Variables have state 0 or 1; function nodes compute a product; messages are the belief (local objective) about a node for a state.

Variable-to-function message:
    M_{i→j}{x_i = s} = ∏_{ĵ ∈ N(i)\{j}} M_{ĵ→i}{x_i = s}
Variable i tells function j what it thinks about being in state s. This is just the product of what all the other functions tell i about being in state s.

Function-to-variable message:
    M_{j→i}{x_i = s} = max_{y : all possible choices for variables i′ ∈ N(j)\{i}} [ f_j(y) ∏_{i′ ∈ N(j)\{i}} M_{i′→j}{x_{i′} = y_{i′}} ]
Function j tells variable i what it thinks about i being in state s. This means we have to locally maximize f_j among all possible choices. Note y_i = s always (too cumbersome to include in the notation).
   David F. Gleich (Sandia)      Network Alignment Algorithms                          IBM Almaden   35 / 47
NetAlign factor graph: Loopy BP

[Figure: graphs A (vertices 1, 2) and B (vertices 1, 2, 3) joined by candidate matches, and the factor graph they generate.
    Variables: 11, 12, 22, 23 (one per candidate match) and 11,22 (one per square).
    Functions: f1, f2, g1, g2, g3, h_{11,22}.]

Note It’s pretty hairy to put all the stuff I should put here on a single slide. Most of it is in the paper.
The rest is just “turning the crank” with standard tricks in BP algorithms.


         David F. Gleich (Sandia)               Network Alignment Algorithms                           IBM Almaden   36 / 47
Get tropical




                             In the max-plus sense.

  David F. Gleich (Sandia)       Network Alignment Algorithms   IBM Almaden   37 / 47
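"Getting tropical" here means reading the BP updates as matrix-vector products over a max-times (or max-plus) semiring, where the sum in an ordinary mat-vec is replaced by a max. A minimal sketch of both products:

```python
import numpy as np

def maxtimes_matvec(A, x):
    """(A (x) x)_i = max_j A[i, j] * x[j]  (max-times semiring)."""
    return (A * x).max(axis=1)

def maxplus_matvec(A, x):
    """(A (+) x)_i = max_j (A[i, j] + x[j])  (max-plus semiring)."""
    return (A + x).max(axis=1)

A = np.array([[1., 2.], [3., 0.]])
x = np.array([2., 1.])
mt = maxtimes_matvec(A, x)   # [max(2, 2), max(6, 0)] = [2, 6]
mp = maxplus_matvec(A, x)    # [max(3, 3), max(5, 1)] = [3, 5]
```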
Belief propagation: A matrix view

    bound_{a,b}(z) ≡ min(b, max(a, z)) =
        a,  z < a
        z,  a ≤ z ≤ b
        b,  z > b

    A : m×n,  A = [A_r; A_c],  x : n×1
    A ⊠ x ≡ [ max_j a_{1,j} x_j;  max_j a_{2,j} x_j;  …;  max_j a_{m,j} x_j ]

NETALIGNBP ALGORITHM
    y(0) = 0, z(0) = 0, S(0) = 0, β̃ = β/2
    for t = 1, 2, … do
        d = bound_{0,β̃}(S(t−1) + β̃ S)ᵀ · e
        y(t) = α w − bound_{0,∞}[(A_rᵀ A_r − I) ⊠ z(t−1)] + d
        z(t) = α w − bound_{0,∞}[(A_cᵀ A_c − I) ⊠ y(t−1)] + d
        S(t) = (Y(t) + Z(t) − α W − D) · S − bound_{0,β̃}(S(t−1) + β̃ S)ᵀ
    end for
Note α = 1, β = 2, γ = 0.99 damping, max-weight matching rounding gives 15,214 overlap, 56,361
weight in 10 mins.


       David F. Gleich (Sandia)              Network Alignment Algorithms                  IBM Almaden   38 / 47
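The clipping operator bound_{a,b} used throughout the iteration is just elementwise clamping; a minimal sketch, with the name `bound` mirroring the slide's notation:

```python
import numpy as np

def bound(lo, hi, z):
    """bound_{lo,hi}(z) = min(hi, max(lo, z)), elementwise:
    returns lo where z < lo, z where lo <= z <= hi, hi where z > hi."""
    return np.minimum(hi, np.maximum(lo, z))

clipped = bound(0.0, 1.0, np.array([-0.5, 0.3, 2.0]))  # [0.0, 0.3, 1.0]
nonneg = bound(0.0, np.inf, -3.0)                      # 0.0
```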
Results (Slide 39 of 47)
Synthetic experiments: BP does well!

[Figure: two panels, each plotted against the expected degree of noise in L (p·n), from 0 to 20. Left panel: rounded objective values; right panel: fraction correct. Methods shown: MR-upper, MR, BP, BPSC, IsoRank.]
  David F. Gleich (Sandia)                                                          Results                                                                    IBM Almaden   40 / 47
Biological data: A close tie

[Figure: overlap vs. weight for BP, SCBP, IsoRank, and MR on two problems. Left (dmela-scere): overlap upper bound 381 (best achieved 376), max weight 671.551. Right (Mus M.-Homo S.): overlap upper bound 1087 (best achieved 1076), max weight 2733.]

    Problem           |VA|    |EA|     |VB|    |EB|     |EL|
    dmela-scere       9459    25636    5696    31261    34582
    Mus M.-Homo S.    3247    2793     9695    32890    15810
                David F. Gleich (Sandia)                                          Results                                                        IBM Almaden      41 / 47
Real dataset

[Figure: overlap vs. weight for BP, SCBP, IsoRank, and MR on lcsh2wiki. Overlap upper bound 17,608 (best achieved 16,836), max weight 60,119.8.]

    Problem      |VA|       |EA|       |VB|       |EB|       |EL|
    lcsh2wiki    297,266    248,230    205,948    382,353    4,971,629
   David F. Gleich (Sandia)                                               Results                                   IBM Almaden   42 / 47
Matching results: A little too hot!

    LCSH                                  WC
    Science fiction television series     Science fiction television programs
    Turing test                           Turing test
    Machine learning                      Machine learning
    Hot tubs                              Hot dog
   David F. Gleich (Sandia)      Results                       IBM Almaden   43 / 47
Foreign subject headings
    The US uses LCSH for subject headings (342k verts, 258k edges).
    France uses Rameau for subject headings (155k verts, 156k edges).
    Generate L by automatic translation and text matching.
    Used Google's automatic translation service (translate.google.com).
    Produces 22,195,304 possible links based on text.

                  cardinality    overlap    correct
    Manual        54,259         39,749
    MWM           125,609        17,134     29,133 (50.54%)
    NetAlignBP    121,316        46,534     32,467 (56.32%)
    NetAlignMR    119,120        45,977     25,086 (43.52%)
    Upper                        50,753

Note NetAlignBP with α = 1, β = 2, γ = 0.99 for 100 iterations; NetAlignMR with α = 0, β = 1 for 1000
iterations.


       David F. Gleich (Sandia)                       Results                                   IBM Almaden   44 / 47
Conclusion (Slide 45 of 47)
Philosophy


Why matrix computations?
   Simple, iterative methods
   “Easy” to code
   “Easy” to parallelize
   “Often” apply to graph problems




   David F. Gleich (Sandia)   Conclusion   IBM Almaden   46 / 47
Summary and Future ideas

Inner-outer iterations for PageRank
    Robust analysis
    Good for general graphs
    Can combine with other techniques
    Works for Gauss-Seidel
    Works for non-stationary iterations
Future work
    Gauss-Seidel performance?
    OPEN Asymptotic performance of inner-outer?
    Dynamic β and η?

BP algorithms for network alignment
    Fast and scalable
    Good results on biology PPI networks
    Reasonable results with Rameau to LCSH
Future work
    No vertex label information for matches?
    Are "overlap" scores significant?
    Are LCSH and Wikipedia really similar?
    OPEN An approx. algorithm?
   David F. Gleich (Sandia)    Conclusion                       IBM Almaden   47 / 47
PAPER 1   stanford.edu/~dgleich/publications/2009/
          gleich-2009-inner-outer.html
          SIAM J. Scientific Computing
          Google “inner outer gleich”
CODE      stanford.edu/~dgleich/publications/2009/innout
          Google “innout gleich”

PAPER 2   arxiv.org/abs/0907.3338
          ICDM 2009
          Google “network alignment gleich”
CODE      stanford.edu/~dgleich/publications/2009/netalign
          Google “netalign gleich”

Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 

More from David Gleich (20)

Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Correlation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networksCorrelation clustering and community detection in graphs and networks
Correlation clustering and community detection in graphs and networks
 
Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Two numerical graph algorithms

  • 1. Two matrix computations for numerical graph problems: PageRank and Network Alignment
    David F. Gleich, Sandia National Labs, Livermore, CA
    IBM Almaden Seminar, San Jose, CA, January 17th, 2011
    In collaboration with Andrew Gray (UBC), Chen Greif (UBC), Tracy Lau (UBC/IBM?), Mohsen Bayati (Stanford), Ying Wang (Stanford), Margot Gerritsen (Stanford), Amin Saberi (Stanford)
    Supported by the Library of Congress and a Microsoft Live Labs Fellowship
    David F. Gleich (Sandia), IBM Almaden, 1 / 47
  • 2. Sketch of talk
    Two algorithms: inner-outer and belief propagation
    Two problems: PageRank and network alignment
    Big graphs for both; iterative matrix computations for both
    Multi-core parallel results: inner-outer only
    Standard flow: problem → algorithm → theory (hopefully) → empirical results, except "fun" results first
    Some open questions at the end
  • 3. A PageRank algorithm
    Instead of the power method,
      x(k+1) = αP x(k) + (1 − α)v,
    use an outer iteration
      (I − βP) x(k+1) = (α − β)P x(k) + (1 − α)v ≡ f,
    with the inner iteration
      y(j+1) = βP y(j) + f.
    It's faster!
    Web Data, α = 0.99: 105,896,555 nodes, 3,783,733,648 edges.
      Power Method: 964 its, 5.15 hrs. Inner-Outer: 857 its, 4.45 hrs.
    Network-Alignment Data, α = 0.95: 4,219,893,141 nodes, 91,886,357,440 edges.
      Power Method: 271 its, 54.6 hrs. Inner-Outer: 188 its, 36.2 hrs.
    Codes and data available.
    Note: the web data is uk-2006 from UNIMI's (Univ. Milano) DSI group.
  • 4. Network Alignment
    A is about 200,000 vertices; B is about 300,000 vertices; L has around 5,000,000 edges.
    A 5-million-variable integer QP; ∼90% of optimality in minutes.
    (Figure: graphs A and B joined by candidate-match edges L, with a "square" on vertices r, s, t, t′.)
    Codes and data available. DEMO
  • 5. Outline (slide 5 of 47): PageRank; PageRank Algorithms; Inner-outer Performance; Network Alignment Motivation; Network Alignment; Network Alignment Algorithms; Results; Conclusion. Current section: PageRank.
  • 6. PageRank is a ...
    ... modified Markov chain,
    ... damped random walk on a graph,
    ... pinball game on the reverse web, or
    ... random surfer model.
    Proposed by Brin and Page in 1998, but similar ideas appeared earlier (Sebastiano Vigna is working on tracing the history – the current history dates to 1949).
    Langville and Meyer (2006) is a good general reference; Berkhin (2005) has lots of goodies; and Des Higham called it pinball.
  • 7. The PageRank Random Surfer
    Important pages ↔ highly probable to visit:
    1. follow out-edges uniformly with probability α, and
    2. randomly jump according to v with probability 1 − α; we'll assume v = (1/n)e.
    This induces a Markov chain model
      [αP + (1 − α)v eT] x(α) = x(α),
    or the linear system
      (I − αP) x(α) = (1 − α)v.
    (Figure: a 6-node example graph with v = (1/6)e and its 6 × 6 column-stochastic matrix P.)
    But it's just a model.
    Note: I'm omitting important details about dangling nodes; I'll mention them a bit later.
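The linear-system form on this slide can be checked directly with a few lines of NumPy. This is a minimal sketch on a small made-up 4-node graph (the matrix P and vector v here are illustrative, not the slide's 6-node example):

```python
import numpy as np

# Sketch of (I - alpha*P) x = (1 - alpha) v on a made-up 4-node graph:
# column j of P holds node j's out-edge probabilities, v is uniform teleportation.
P = np.array([[0.0, 0.5, 0.0, 0.0],
              [1.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 1.0],
              [0.0, 0.0, 0.5, 0.0]])
alpha = 0.85
n = P.shape[0]
v = np.full(n, 1.0 / n)

x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
# x is the PageRank vector: nonnegative entries summing to 1
```

Because each column of P sums to 1, multiplying the system by eT shows the solution always sums to 1.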
  • 8. What is α?
    Author — α:
      Brin and Page (1998): 0.85
      Najork et al. (2007): 0.85
      Litvak et al. (2006): 0.5
      Katz (1953): 0.5
      Experiment (2009): 0.63 ≈ 0.85 · 0.5
      Algorithms (...): ≥ 0.85
    Our regime: α from browsers; α ≥ 0.85, since otherwise the power method is fast; P only available for mat-vec, since otherwise custom techniques are possible.
    (Figure: fitted Beta density of raw α values measured from browsing data.)
    Constantine, Flaxman, Gleich, Gunawardana, Tracking the Random Surfer, WWW2010.
    Constantine and Gleich, Random Alpha PageRank, Internet Math.
  • 9. Outline (slide 9 of 47). Current section: PageRank Algorithms.
  • 10. PageRank formulations and theory
    v: teleportation vector. P̄: substochastic matrix (for algorithms). d: dangling node vector (d = e − P̄T e). P: PageRank stochastic matrix (for theory).
    P̄ + v dT → P: strongly preferential PageRank.
    P̄ + u dT → P: weakly preferential PageRank (u ≠ v in general).
    (I − αP) x = (1 − α)v: the PageRank linear system.
    (Diagram: a graph or web graph gives a substochastic matrix; strongly preferential, weakly preferential, sink preferential, and other transformations connect it to PageRank eigensystems, linear systems, and PseudoRank. Codes on one side, theory on the other.)
  • 11. Motivation: Why another PageRank algorithm?
    An ideal algorithm is (from fancy to simple):
    1. reliable
    2. fast over a range of α's → use Matlab's "\"
    3. efficient for big problems → use a Gauss-Seidel or custom Richardson method
    4. uses only mat-vec products → use the inner-outer iteration
    5. uses only 2 vectors of memory → use the power method
  • 12. Simple algorithms
    The power method: for Ax = λx, the iteration
      x(k+1) = A x(k) / ‖A x(k)‖
    computes the largest eigenpair. The PageRank Markov chain eigenvector problem is
      [αP + (1 − α)v eT] x = x.
    If eT x(0) = 1, then
      x(k+1) = αP x(k) + (1 − α)v eT x(k) = αP x(k) + (1 − α)v.
    The Richardson method: for Ax = b, the iteration
      x(k+1) = x(k) + ω (b − A x(k))    (the parenthesized term is the residual)
    computes x. The PageRank linear system is
      (I − αP) x = (1 − α)v.
    For ω = 1,
      x(k+1) = αP x(k) + (1 − α)v,
    and the Richardson iteration is the power method.
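The ω = 1 equivalence on this slide is easy to verify numerically. A small sketch, using a made-up 3-node column-stochastic matrix: the Richardson iterates for (I − αP)x = (1 − α)v coincide, step by step, with the power-method iterates.

```python
import numpy as np

# Verify: with omega = 1, Richardson on (I - alpha*P) x = (1 - alpha) v
# produces exactly the power-method iterates x <- alpha*P x + (1 - alpha) v.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])   # made-up column-stochastic matrix
alpha = 0.85
v = np.full(3, 1.0 / 3)
A = np.eye(3) - alpha * P
b = (1 - alpha) * v

x_pow = v.copy()
x_rich = v.copy()
for _ in range(50):
    x_pow = alpha * (P @ x_pow) + (1 - alpha) * v   # power method
    x_rich = x_rich + 1.0 * (b - A @ x_rich)        # Richardson, omega = 1
# the two sequences are identical at every step
```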
  • 13. Inner-Outer
    Note: PageRank is easier when α is smaller. Thus: solve PageRank with itself, using β < α!
    Outer: (I − βP) x(k+1) = (α − β)P x(k) + (1 − α)v ≡ f(k)
    Inner: y(0) = x(k);  y(j+1) = βP y(j) + f(k)
    A new parameter? What is β? 0.5.
    How many inner iterations? Until a residual of 10−2.
    Gleich, Gray, Greif, Lau, SISC 2010.
  • 14. Inner-Outer algorithm
    Input: P, v, α, τ, (β = 0.5, η = 10−2). Output: x.
     1: x ← v
     2: y ← Px
     3: while ‖αy + (1 − α)v − x‖1 ≥ τ
     4:   f ← (α − β)y + (1 − α)v
     5:   repeat
     6:     x ← f + βy
     7:     y ← Px
     8:   until ‖f + βy − x‖1 < η
     9: end while
    10: x ← αy + (1 − α)v
    Uses only three vectors of memory.
    Convergence? Yes, if 0 ≤ β ≤ α with "exact" inner iteration, but also (small theorem) with any η!
    Parameters? β = 0.5, η = 10−2 is often faster than the power method (or just a titch slower).
    Note: the inner loop checks its condition after doing one iteration; an inexact iteration is always at least as good as one step of the power method.
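The pseudocode above translates almost line for line into NumPy. This is a minimal dense sketch, not the authors' code (the real implementations use only sparse mat-vecs); the 3-node demo matrix is made up:

```python
import numpy as np

def inner_outer_pagerank(P, v, alpha, tol=1e-7, beta=0.5, eta=1e-2):
    """Dense sketch of the slide's inner-outer iteration.

    Outer: (I - beta*P) x = (alpha - beta) P x_prev + (1 - alpha) v,
    solved inexactly by inner Richardson sweeps y <- beta*P y + f.
    """
    x = v.copy()
    y = P @ x
    while np.linalg.norm(alpha * y + (1 - alpha) * v - x, 1) >= tol:
        f = (alpha - beta) * y + (1 - alpha) * v
        while True:                      # inner loop checks after one sweep
            x = f + beta * y
            y = P @ x
            if np.linalg.norm(f + beta * y - x, 1) < eta:
                break
    return alpha * y + (1 - alpha) * v   # one final power-method step

# demo on a tiny made-up column-stochastic matrix
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1.0 / 3)
x = inner_outer_pagerank(P, v, alpha=0.85)
```

As the slide notes, only P-mat-vecs and three vectors (x, y, f) are needed.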
  • 15. Inner-Outer Parameters
    Question: what parameters should we pick?
    (Figures: multiplications vs. η for several β, and vs. β for several η, compared against the power method; α = 0.99, in-2004 graph, 1.3M nodes, 16.9M edges.)
    Just use β = 0.5 and η = 10−2!
    Note: many similar plots appear in my thesis.
  • 16. The Competition
    Our requirement: only the product Px is available!
    Quadratic extrapolation (Kamvar, Haveliwala, et al.)
    Aggregation/disaggregation (Langville and Meyer; Stewart)
    Permutations/strong components (Del Corso, Gulli, and Romani; Langville and Meyer)
    Krylov methods (Gleich, Zhukov, Berkhin; Del Corso, Gulli, and Romani)
    Padé-type extrapolation (Brezinski and Redivo-Zaglia)
    Arnoldi methods (Greif and Golub)
    Gauss-Seidel (Arasu, Novak, Tomkins, and Tomlin)
  • 17. Outline (slide 17 of 47). Current section: Inner-outer Performance.
  • 18. Datasets
    name          size         nonzeros        avg nz/row
    ubc-cs-2006   51,681       673,010         13.0
    ubc-2006      339,147      4,203,811       12.4
    eu-2005       862,664      19,235,140      22.3
    in-2004       1,382,908    16,917,053      12.2
    wb-edu        9,845,725    57,156,537      5.8
    arabic-2005   22,744,080   639,999,458     28.1
    sk-2005       50,636,154   1,949,412,601   38.5
    uk-2007       105,896,555  3,738,733,648   35.3
  • 19. One example
    (Figures: residual vs. multiplications for the wb-edu graph at α = 0.85 and α = 0.99; the inner-outer iteration reaches each residual level in fewer multiplications than the power method.)
    τ = 10−7, β = 0.5, η = 10−2; wb-edu graph (9.8M nodes, 57.2M edges).
  • 20. Advantage Inner-Outer (α = 0.99, β = 0.5, η = 10−2)
    Work in multiplications and time in seconds, power vs. inner-outer, with percentage gain:
    tol 10−3:
      ubc-cs-2006: 226 vs 141 mults (37.6%); 1.9 vs 1.2 s (35.2%)
      ubc:         242 vs 141 (41.7%); 13.6 vs 8.3 s (38.4%)
      in-2004:     232 vs 129 (44.4%); 51.1 vs 30.4 s (40.5%)
      eu-2005:     149 vs 150 (-0.7%); 26.9 vs 28.3 s (-5.3%)
      wb-edu:      221 vs 130 (41.2%); 291.2 vs 184.6 s (36.6%)
      arabic-2005: 213 vs 139 (34.7%); 779.2 vs 502.5 s (35.5%)
      sk-2005:     156 vs 144 (7.7%); 1718.2 vs 1595.9 s (7.1%)
      uk-2007:     145 vs 125 (13.8%); 2802.0 vs 2359.3 s (15.8%)
    tol 10−5:
      ubc-cs-2006: 574 vs 432 (24.7%); 4.7 vs 3.6 s (22.9%)
      ubc:         676 vs 484 (28.4%); 37.7 vs 27.8 s (26.2%)
      in-2004:     657 vs 428 (34.9%); 144.3 vs 97.5 s (32.4%)
      eu-2005:     499 vs 476 (4.6%); 89.3 vs 87.4 s (2.1%)
      wb-edu:      647 vs 417 (35.5%); 850.6 vs 572.0 s (32.8%)
      arabic-2005: 638 vs 466 (27.0%); 2333.5 vs 1670.0 s (28.4%)
      sk-2005:     523 vs 460 (12.0%); 5729.0 vs 5077.1 s (11.4%)
      uk-2007:     531 vs 463 (12.8%); 10225.8 vs 8661.9 s (15.3%)
    tol 10−7:
      ubc-cs-2006: 986 vs 815 (17.3%); 8.0 vs 6.8 s (15.4%)
      ubc:         1121 vs 856 (23.6%); 62.5 vs 49.0 s (21.6%)
      in-2004:     1108 vs 795 (28.2%); 243.1 vs 179.8 s (26.0%)
      eu-2005:     896 vs 814 (9.2%); 159.9 vs 148.6 s (7.1%)
      wb-edu:      1096 vs 777 (29.1%); 1442.9 vs 1059.0 s (26.6%)
      arabic-2005: 1083 vs 843 (22.2%); 3958.8 vs 3012.9 s (23.9%)
      sk-2005:     951 vs 828 (12.9%); 10393.3 vs 9122.9 s (12.2%)
      uk-2007:     964 vs 857 (11.1%); 18559.2 vs 16016.7 s (13.7%)
  • 21. Parallelization
    Parallel Px:
      xi = x[i] / degree(i);
      for (j in edges of i) { atomic(y[j] += xi); }
    (Figure: speedup relative to the best 1-processor run vs. number of processors, 1–8, for power and inner-outer at tolerances 10−3, 10−5, and 10−7; scaling is close to linear.)
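The per-edge push in the snippet above can be sketched serially in Python; the atomic add of the parallel version becomes an ordinary accumulation here, and the adjacency list is a made-up example:

```python
# Serial sketch of the slide's push-style mat-vec y = P x: node i scatters
# x[i] / outdegree(i) to each out-neighbor. In the parallel version the
# accumulation is the atomic add; here it is a plain +=.
edges = {0: [1, 2], 1: [2], 2: [0, 1], 3: [0]}   # node -> out-neighbors (made up)
n = 4
x = [0.25] * n
y = [0.0] * n
for i, nbrs in edges.items():
    xi = x[i] / len(nbrs)        # degree-normalized value to push
    for j in nbrs:
        y[j] += xi               # atomic(y[j] += xi) in the parallel code
# total probability mass is conserved: sum(y) == 1.0
```

The push formulation is what makes the parallel loop simple: each node only reads its own value and degree, so the only contention is on the accumulations into y.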
  • 22. Outline (slide 22 of 47). Current section: Network Alignment Motivation.
  • 23. (Image-only slide.)
  • 24. (Image-only slide.)
  • 25. Alignment and overlap: The goal
    (Figure: Wikipedia categories — Educational psychology, Psychiatric hospitals, Mental health — matched against LCSH headings — Health organizations, Health; one matching "is better than" the other because it completes more squares.)
    Maximize squares/overlap in a 1-1 matching.
    Find a good mapping to investigate similarity!
  • 26. Outline (slide 26 of 47). Current section: Network Alignment.
  • 27. Integrating Matching and Overlap: A QP
    Squares produce overlap → a bonus when both i ↔ i′ and j ↔ j′ are matched.
    Variables and data: xi = indicator for edge ei ∈ L; wi = weight of edge ei; S encodes the squares.
    Problem:
      maximize  wT x + (1/2) xT S x
      subject to  Ax ≤ e  (x is a matching),  xi ∈ {0, 1}.
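The objective can be evaluated directly for a candidate matching. A tiny sketch with made-up weights and a single square, showing that (1/2) xT S x counts the squares a matching completes:

```python
import numpy as np

# Evaluate w^T x + (1/2) x^T S x on a made-up 3-edge instance: x indicates
# which edges of L are matched, and S[i, j] = 1 when edges i and j form a
# square, so (1/2) x^T S x counts completed squares.
w = np.array([0.6, 0.9, 0.3])            # edge weights
S = np.array([[0, 1, 0],                 # edges 0 and 1 form one square
              [1, 0, 0],
              [0, 0, 0]])
x = np.array([1, 1, 0])                  # match edges 0 and 1
obj = w @ x + 0.5 * x @ S @ x            # 0.6 + 0.9 + 1 square = 2.5
```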
  • 28. An example with overlap
    (Figure: a small instance written out explicitly, with its edge ordering, the 12 × 12 squares matrix S, the weight vector w, and the matching-constraint matrix A.)
  • 29. Network alignment
    NETWORK ALIGNMENT:
      maximize  αwT x + (β/2) xT S x
      subject to  Ax ≤ e,  xi ∈ {0, 1}.
    History: quadratic assignment, maximum common subgraph, pattern recognition, ontology matching, bioinformatics.
    Sparse problems: a sparse L is often ignored (with a few exceptions). Our paper tackles that case explicitly. We do large problems, too.
    Conte et al., Thirty years of graph matching, 2004; Melnik et al., Similarity flooding, 2004; Blondel et al., SIREV 2004; Singh et al., RECOMB 2007; Klau, BMC Bioinformatics 10:S59, 2009.
  • 30. Outline (slide 30 of 47). Current section: Network Alignment Algorithms.
  • 31. Algorithms
    1. LP: convert to an LP, relax, solve (skipped)
    2. TIGHTLP: improve the LP (skipped)
    3. ISORANK: use a PageRank heuristic (Singh et al. 2007)
    4. BP: max-product belief propagation for the LP
    5. TIGHTBP: BP for the TIGHTLP (skipped)
    6. MR: sub-gradient descent on TIGHTLP (Klau 2009; skipped)
    Note, not discussed: an early heuristic (Flannick et al., Genome Research 16:1169–1181, 2006) and an independent BP algorithm (Bradde et al., arXiv:0905.1893, 2009).
    Singh et al. RECOMB 2007; Klau, 2009.
  • 32. IsoRank
      maximize  αwT x + (β/2) xT S x  subject to  0 ≤ Ax ≤ e,  xi ∈ {0, 1}.
    Solve PageRank on S and w!
    1. Normalize S to a stochastic P.
    2. Normalize w to a stochastic v.
    3. Compute power iterations, rounding at each.
    4. Output the best solution.
    Need to evaluate a range of PageRank α's; designed for a complete bipartite L.
    Singh et al. RECOMB 2007; Ninove, Ph.D. thesis, Louvain, 2008.
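Steps 1–3 can be sketched as plain PageRank power iterations on a normalized squares matrix. This toy S and w are made up, and the per-iteration rounding to a matching is omitted:

```python
import numpy as np

# Minimal sketch of the IsoRank idea: normalize the squares matrix S to a
# column-stochastic P, normalize w to a teleportation vector v, then run
# PageRank power iterations over product-graph scores.
S = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
w = np.array([0.5, 0.3, 0.2])
P = S / S.sum(axis=0)            # column-normalize (assumes no empty columns)
v = w / w.sum()
alpha = 0.95
x = v.copy()
for _ in range(200):
    x = alpha * (P @ x) + (1 - alpha) * v
# x stays a probability vector over candidate matches
```

A full IsoRank run would round each iterate to a 1-1 matching (e.g. greedily or by max-weight matching) and keep the best one, repeating over a range of α values.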
  • 33. Inner-outer for this problem?
    Only on the cores of the two graphs.
    Dataset         Size            Non-Zeros
    LCSH-2          59,849          227,464
    WC-3            70,509          403,960
    Product Graph   4,219,893,141   91,886,357,440
    α = 0.95, w from text similarity.
    Inner-Outer: 188 mat-vecs, 36.2 hours. Power: 271 mat-vecs, 54.6 hours.
    Caveat: I'm ignoring all the details of actually using this technique.
• 34. Belief propagation: Our algorithm
Summary: Construct a probability model where the most likely state is the solution! Locally update information, like a generalized dynamic program. Most likely, it won't converge.
History: BP is used for computing marginal probabilities and maximum a posteriori probabilities. Wildly successful at solving satisfiability problems. A convergent algorithm for max-weight matching (Bayati et al. 2005). It works.
• 35. Max-product messages
Variable to function: M_{i→j}(x_i = s) = ∏_{j′ ∈ N(i)\{j}} M_{j′→i}(x_i = s).
Variable i tells function j what it thinks about being in state s: just the product of what all the other functions tell i about being in state s.
Function to variable: M_{j→i}(x_i = s) = max_{y : y_i = s} f_j(y) ∏_{i′ ∈ N(j)\{i}} M_{i′→j}(y_{i′}), maximized over all possible choices y of the variables in N(j).
Function j tells variable i what it thinks about being in state s: locally maximize f_j among all possible choices. Note y_i = s always (too cumbersome to include in the notation).
Variables have state 0 or 1; function nodes compute a product; the messages are the belief (local objective) about a node being in a state.
• 36. NetAlign factor graph: loopy BP
[figure: the factor graph, with variable nodes for the potential matches (11′, 12′, 22′, 23′) and function nodes f1, f2 for vertices of graph A, g1, g2, g3 for vertices of graph B, and h_{11′,22′} for an overlap term]
Note. It's pretty hairy to put all the details on a single slide. Most of it is in the paper. The rest is just "turning the crank" with standard tricks in BP algorithms.
• 37. Get tropical: in the max-plus sense.
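As a concrete illustration (ours, not code from the talk): in the max-plus semiring, "addition" is max and "multiplication" is +, so max-product BP updates written in log space become semiring linear algebra. A minimal NumPy sketch of a tropical matrix-vector product:

```python
import numpy as np

def maxplus_matvec(A, x):
    """Tropical (max-plus) matrix-vector product: y[i] = max_j (A[i, j] + x[j])."""
    # Broadcast x across the columns of A, then take the row-wise maximum.
    return (A + x[None, :]).max(axis=1)

y = maxplus_matvec(np.array([[0.0, 1.0], [2.0, 3.0]]), np.array([10.0, 0.0]))
# y is [10., 12.]: max(0+10, 1+0) and max(2+10, 3+0)
```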
• 38. Belief propagation: A view
bound_{a,b}(z) ≡ min(b, max(a, z)) = a if z < a; z if a ≤ z ≤ b; b if z > b (componentwise).
Max-product matrix-vector product: for A : m×n and x : n×1, (A ⊠ x)_i = max_j a_{i,j} x_j.
A = [A_r; A_c] stacks the row and column matching constraints.
NETALIGNBP ALGORITHM
y^(0) = 0, z^(0) = 0, S^(0) = 0, β̃ = β/2
for t = 1, 2, ... do
  d = bound_{0,β̃}(S^(t−1) + β̃ S) · e
  y^(t) = αw − bound_{0,∞}[(A_rᵀ A_r − I) z^(t−1)] + d
  z^(t) = αw − bound_{0,∞}[(A_cᵀ A_c − I) y^(t−1)] + d
  S^(t) = (Y^(t) + Z^(t) − αW − D) · S − bound_{0,β̃}(S^(t−1) + β̃ S)ᵀ
end for
Note. With α = 1, β = 2, damping γ = 0.99, and max-weight-matching rounding, this gives 15,214 overlap and 56,361 weight in 10 minutes.
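The bound_{a,b} clamp that appears throughout the update is simple to state in code. A one-line NumPy sketch (our illustration):

```python
import numpy as np

def bound(a, b, z):
    """Componentwise clamp bound_{a,b}(z) = min(b, max(a, z)):
    returns a where z < a, z where a <= z <= b, and b where z > b."""
    return np.minimum(b, np.maximum(a, z))

clamped = bound(0.0, 2.0, np.array([-1.0, 1.0, 5.0]))
# clamped is [0., 1., 2.]
```

In the algorithm above, bound_{0,∞} is a nonnegativity projection and bound_{0,β̃} caps the overlap messages.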
• 39. Outline (slide 39 of 47): Network Alignment · Results
• 40. Synthetic experiments: BP does well!
[figure: two panels against the expected degree of noise in L (p · n), from 0 to 20. Left: fraction correct; right: rounded objective values. Curves for MR-upper, MR, BP, BPSC, and IsoRank.]
• 41. Biological data: A close tie
[figure: overlap vs. weight fronts for BP, SCBP, IsoRank, and MR on the two problems below; overlap upper bounds 381 and 1087 (best achieved 376 and 1076), max weights 671.551 and 2733.]

Problem          |V_A|  |E_A|   |V_B|  |E_B|   |E_L|
dmela-scere      9459   25636   5696   31261   34582
Mus M.-Homo S.   3247   2793    9695   32890   15810
• 42. Real dataset
[figure: overlap vs. weight for BP, SCBP, IsoRank, and MR; overlap upper bounds 16,836 and 17,608; max weight 60,119.8.]

Problem     |V_A|    |E_A|    |V_B|    |E_B|    |E_L|
lcsh2wiki   297,266  248,230  205,948  382,353  4,971,629
• 43. Matching results: A little too hot!

LCSH                               WC
Science fiction television series  Science fiction television programs
Turing test                        Turing test
Machine learning                   Machine learning
Hot tubs                           Hot dog
• 44. Foreign subject headings
The US uses LCSH for subject headings (342k vertices, 258k edges). France uses Rameau for subject headings (155k vertices, 156k edges). Generate L by automatic translation and text matching, using Google's automatic translation service (translate.google.com); this produces 22,195,304 possible links based on text.

            cardinality  overlap  correct
Manual      54,259       39,749
MWM         125,609      17,134   29,133 (50.54%)
NetAlignBP  121,316      46,534   32,467 (56.32%)
NetAlignMR  119,120      45,977   25,086 (43.52%)
Upper                    50,753

Note. NetAlignBP with α = 1, β = 2, γ = 0.99 for 100 iterations; NetAlignMR with α = 0, β = 1 for 1000 iterations.
• 45. Outline (slide 45 of 47): Conclusion
• 46. Philosophy
Why matrix computations? Simple, iterative methods. "Easy" to code. "Easy" to parallelize. "Often" apply to graph problems.
• 47. Summary and future ideas
Inner-outer iterations for PageRank: robust analysis; fast and scalable; can combine with other techniques; works for Gauss-Seidel; works for non-stationary iterations. Future work: Gauss-Seidel performance? Dynamic β and η? OPEN: asymptotic performance of inner-outer?
BP algorithms for network alignment: good for general graphs; good results on biology PPI networks; reasonable results with Rameau to LCSH. Future work: no vertex label information for matches? Are "overlap" scores significant? Are LCSH and Wikipedia really similar? OPEN: an approximation algorithm?
• 48. Papers and code
PAPER 1: stanford.edu/~dgleich/publications/2009/gleich-2009-inner-outer.html (SIAM J. Scientific Computing; Google "inner outer gleich")
CODE: stanford.edu/~dgleich/publications/2009/innout (Google "innout gleich")
PAPER 2: arxiv.org/abs/0907.3338 (ICDM 2009; Google "network alignment gleich")
CODE: stanford.edu/~dgleich/publications/2009/netalign (Google "netalign gleich")