Two numerical graph algorithms

Two matrix computations for
numerical graph problems:
PageRank and Network Alignment
David F. Gleich
Sandia National Labs
Livermore, CA

IBM Almaden Seminar
San Jose, CA
January 17th, 2011

In collaboration with
Andrew Gray (UBC), Chen Greif (UBC)
Tracy Lau (UBC/IBM?), Mohsen Bayati (Stanford)
Ying Wang (Stanford), Margot Gerritsen (Stanford)
Amin Saberi (Stanford)

Supported by the Library of Congress
and Microsoft Live Labs Fellowship
David F. Gleich (Sandia) IBM Almaden 1 / 47

Sketch of talk
two algorithms inner-outer and belief propagation
two problems PageRank and network alignment
big graphs for both
iterative matrix computations for both
multi-core parallel results inner-outer only

standard ﬂow
problem → algorithm → theory (hopefully) → empirical results
except “fun” results ﬁrst

some open questions at end


A PageRank algorithm
Instead of the power method,
Web Data, α = 0.99
x(k+1) = αPx(k) + (1 − α)v. Nodes 105,896,555
Edges 3,783,733,648
Use an outer iteration Power Method 964 its 5.15 hrs.
(k+1) Inner-Outer 857 its 4.45 hrs.
( − βP)x
= (α − β)Px(k) + (1 − α)v . Network-Alignment Data, α = 0.95
f Nodes 4,219,893,141
Edges 91,886,357,440
with the inner iteration Power Method 271 its 54.6 hrs.
Inner-Outer 188 its 36.2 hrs.
y(j+1) = βPy(j) + f.
Codes and data available.
It’s faster!
Note Web data is uk-2006 from UNIMI’s (Univ. Milano) DSI group.


Network Alignment

r Square s A is about 200,000 vertices
B is about 300,000 vertices
L has around 5,000,000 edges
5 million variable integer QP
t ∼ 90% of optimality in minutes.
t

A L B

Codes and data available.

DEMO


PageRank
PageRank Algorithms
Inner-outer Performance
Network Alignment
PageRank Motivation
Network alignment
Slide 5 of 47
Network Alignment
Algorithms
Results
Conclusion

PageRank is a ...
... modiﬁed Markov chain,
... damped random walk on a graph,
... pinball game on the reverse web, or
... random surfer model.
Proposed by Brin and Page in 1998, but similar ideas from
earlier... (Sebastiano Vigna is working on tracing the history –
the current history dates to 1949)

Langville and Meyer (2006) is a good
general reference; Berkhin (2005) has
lots of goodies; and Des Higham called
it pinball.

David F. Gleich (Sandia) PageRank IBM Almaden 6 / 47

The PageRank Random Surfer
important pages ↔ highly probable to visit
3
1. follow out-edges uniformly with
probability α, and
2 5
2. randomly jump according to v with
4 probability 1 − α, we’ll assume
= 1/ n.
1 6 Induces a Markov chain model

αP + (1 − α)veT x(α) = x(α)

 1/ 6 1/ 2 ↓ or the linear system
0 0 0 0

 1/ 6 1/ 2
0 0 1/ 3 0 0
 1/ 6 0 1/ 3 0 0 ( − αP)x(α) = (1 − α)v
 1/ 6 0 1/ 2 0 0 0
1/ 6 0 1/ 2 1/ 3 0 1
1/ 6 0 0 0 1 0 But it’s just a model.
P
Note I’m omitting important details about dangling nodes, I’ll mention them a bit later.


What is α?
Author α
Brin and Page (1998) 0.85
Najork et al. (2007) 0.85
Litvak et al. (2006) 0.5
Katz (1953) 0.5
Experiment (2009) 0.63 ≈ 0.85 · 0.5
Algorithms (...) ≥ 0.85
Our regime 3.0 InfBeta( 3.2 , 2.0 , 1.9e−05 , 0.0019 )

α from browsers
α ≥ .85 otherwise 2.5

power is fast. 2.0

density
1.5
P only available 1.0
for mat-vec 0.5
otherwise custom 0.0
techniques 0.0 0.2 0.4 0.6 0.8 1.0
Raw α
possible.
Constantine, Flaxman, Gleich, Gunawardana, Tracking the Random Surfer, WWW2010
Constantine and Gleich, Random Alpha PageRank, Internet Math.

PageRank
PageRank Algorithms
Network Alignment
PageRank Algorithms Motivation
Network alignment
Slide 9 of 47
Network Alignment
Algorithms
Results
Conclusion

PageRank formulations and theory
Codes Theory

Strongly prefer-
ential PageRank PseudoRank
Eigensystems
Graph or Substochastic Weakly prefer-
PageRank
Web graph matrix ential PageRank
Linear systems
Sink preferential
PageRank
Other transformations

v teleportation vector
¯
P substochastic matrix (for algorithms)
d dangling node vector (d = e − PT e)
P + vdT → P
¯ Strongly preferential PageRank
P + dT → P
¯ Weakly preferential PageRank ( = v)
P PageRank stochastic matrix (for theory)
( − αP)x = (1 − α)v PageRank linear system
David F. Gleich (Sandia) PageRank Algorithms IBM Almaden 10 / 47

Motivation
Why another PageRank algorithm?

An ideal algorithm is
1. reliable
2. fast over a range of α’s fancy
→ Use Matlab’s “”
3. efﬁcient for big problems
→ Use a Gauss-Seidel or
custom Richardson method
4. uses only matvec products
→ Use the inner-outer iteration
5. uses only 2 vectors of memory
→ Use the power method simple


Simple algorithms
The power method The Richardson method
For Ax = λx, the iteration For Ax = b, the iteration

x(k+1) = Ax(k) / Ax(k) x(k+1) = x(k) + ω (b − Ax(k) )
residual
computes the largest
eigenpair. computes x.
The PageRank Markov chain The PageRank linear system is
eigenvector problem is
( − αP)x = (1 − α)v.
[αP + (1 − α)veT ]x = x
For ω = 1
If eT x(0) = 1 and j ≥0 x(k+1) = αPx(k) + (1 − α)v

x(k+1) = αPx(k) +(1−α)v eT x(k) and the Richardson iteration is
=1 the power method.


Inner-Outer

Note PageRank is easier when α is smaller
Thus Solve PageRank with itself using β < α!

Outer ( − βP)x(k+1) = (α − β)Px(k) + (1 − α)v ≡ f(k)

Inner y(0) = x(k) y(j+1) = βPy(j) + f(k)

A new parameter? What is β? 0.5
How many inner iterations? Until a residual of 10−2

Gleich, Gray, Greif, Lau, SISC 2010.

Inner-Outer algorithm
uses only three vectors
Input: P, v, α, τ, (β = 0.5, η = 10−2 ) of memory
Output: x
1: x ← v Convergence?
2: y ← Px if 0 ≤ β ≤ α, with “ex-
3: while αy + (1 − α)v − x 1 ≥ τ act” iteration
4: f ← (α − β)y + (1 − α)v but also (small theo-
5: repeat rem) with any η!
6: x ← f + βy
7: y ← Px Parameters?
8: until f + βy − x 1 < η β = 0.5, η = 10−2 often
9: end while faster than the power
10: x ← αy + (1 − α)v method
(or just a titch slower)

Note Note that the inner-loop checks its condition after doing one iteration. An inexact iteration is
always at least as good as one-step of the power method.


Inner-Outer Parameters
Question: What parameters should we pick?
in−2004, α=0.99 in−2004, α=0.99
1500 1500
power power
η = 1e−01 β = 0.10
1400 η = 1e−02 1400 β = 0.30
η = 1e−03 β = 0.50
η = 1e−04 β = 0.70
1300 1300
η = 1e−05

1200 1200
Multiplications

Multiplications
1100 1100

1000 1000

900 900

800 800

700 700 −4 −3 −2 −1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 10 10 10 10
β η

α = 0.99, in-2004 graph (1.3M nodes, 16.9M edges)
Just use β = 0.5 and η = 10−2 !
Note Many similar plots appear in my thesis.


The Competition
Our Requirement: only Px is available!
Quadratic Extrapolation (Kamvar, Haveliwala, et al.)
Aggregation/Disaggregation
(Langville and Meyer; Stewart)
Permutations/Strong Components
(Del Corso, Gulli, and Romani; Langville and Meyer)
Krylov methods (Gleich, Zhukov, Berkhin;
Del Corso, Gulli, and Romani)
Padé-type extrapolation (Brezinski and Redivo-Zaglia)

Arnoldi methods (Greif and Golub)

Gauss-Seidel (Arasu, Novak, Tomkins, and Tomlin)


PageRank
PageRank Algorithms

Inner-outer Network Alignment
Motivation
Performance Network alignment
Slide 17 of 47 Network Alignment
Algorithms
Results
Conclusion

Datasets

name size nonzeros avg nz/row
ubc-cs-2006 51,681 673,010 13.0
ubc-2006 339,147 4,203,811 12.4
eu-2005 862,664 19,235,140 22.3
in-2004 1,382,908 16,917,053 12.2
wb-edu 9,845,725 57,156,537 5.8
arabic-2005 22,744,080 639,999,458 28.1
sk-2005 50,636,154 1,949,412,601 38.5
uk-2007 105,896,555 3,738,733,648 35.3

David F. Gleich (Sandia) Inner-outer Performance IBM Almaden 18 / 47

One example
wb−edu, α = 0.85 wb−edu, α = 0.99
0
10 0
10

−1 0
10 10 −1
10 10
0

−2
10 −2 −2
10 10 10
−2

5 10 15 20 20 40
−3 −3
10 10
Residual

Residual
−4 −4
10 10

−5 −5
10 10

−6 −6
10 10
power power
inout inout
−7 −7
10 10
10 20 30 40 50 60 70 80 200 400 600 800 1000 1200
Multiplication Multiplication

τ = 10−7 , β = 0.5, η = 10−2 ;
wb-edu graph (9.8M nodes, 57.M edges)


Advantage Inner-Outer
tol. graph work (mults.) time (secs.)
power in/out gain power in/out gain

10−3 ubc-cs-2006 226 141 37.6% 1.9 1.2 35.2%
ubc 242 141 41.7% 13.6 8.3 38.4%
α = 0.99, β = 0.5, η = 10−2

in-2004 232 129 44.4% 51.1 30.4 40.5%
eu-2005 149 150 -0.7% 26.9 28.3 -5.3%
wb-edu 221 130 41.2% 291.2 184.6 36.6%
arabic-2005 213 139 34.7% 779.2 502.5 35.5%
sk-2005 156 144 7.7% 1718.2 1595.9 7.1%
uk-2007 145 125 13.8% 2802.0 2359.3 15.8%

10−5 ubc-cs-2006 574 432 24.7% 4.7 3.6 22.9%
ubc 676 484 28.4% 37.7 27.8 26.2%
in-2004 657 428 34.9% 144.3 97.5 32.4%
eu-2005 499 476 4.6% 89.3 87.4 2.1%
wb-edu 647 417 35.5% 850.6 572.0 32.8%
arabic-2005 638 466 27.0% 2333.5 1670.0 28.4%
sk-2005 523 460 12.0% 5729.0 5077.1 11.4%
uk-2007 531 463 12.8% 10225.8 8661.9 15.3%

10−7 ubc-cs-2006 986 815 17.3% 8.0 6.8 15.4%
ubc 1121 856 23.6% 62.5 49.0 21.6%
in-2004 1108 795 28.2% 243.1 179.8 26.0%
eu-2005 896 814 9.2% 159.9 148.6 7.1%
wb-edu 1096 777 29.1% 1442.9 1059.0 26.6%
arabic-2005 1083 843 22.2% 3958.8 3012.9 23.9%
sk-2005 951 828 12.9% 10393.3 9122.9 12.2%
uk-2007 964 857 11.1% 18559.2 16016.7 13.7%


Parallelization
parallel Px
xi=x[i]/degree(i); for (j in edges of i) { atomic(y[j]+=xi); }.

8
linear
power relative 6
7 inout relative
1e−3 power
1e−3 inout
Speedup relative to best 1 processor

6 1e−5 power
1e−5 inout
1e−7 power
5 1e−7 inout

4
5

3

2

1

4
8
0
1 2 3 4 5 6 7 8
Number of processors

PageRank
PageRank Algorithms

Network Alignment Network Alignment
Motivation
Motivation Network alignment
Algorithms
Results
Conclusion

David F. Gleich (Sandia) Network Alignment Motivation IBM Almaden 23 / 47

Alignment and overlap: The goal
3
Educational psychology
2 b2
a
1 b1
Psychiatric hospitals b Mental health
is better than
3

2 b2
Health organizations Health
1 b1
Wikipedia LCSH

r Square s

t
t

A L B
Maximize squares/overlap in 1-1 matching
Find a good mapping to investigate similarity!

PageRank
PageRank Algorithms
Network Alignment
Network alignment Motivation
Network alignment
Slide 26 of 47
Network Alignment
Algorithms
Results
Conclusion

Integrating Matching and Overlap: A QP
Squares produce overlap → bonus for some and j → j

Variables, Data
r Square s

= edge indicator e ∈L
= weight of edges e = (t, )
Sj squares in S = t t
t

A L B

Problem
1
m ximize + j
m ximize wT x + 2 xT Sx
x
:e ∈L ,j∈S ↔ subject to Ax ≤ e
subject to is a matching ∈ {0, 1}

David F. Gleich (Sandia) Network alignment IBM Almaden 27 / 47

An example with overlap
(2,2 )
0 0 0 0 0 1 0 1 0 1 1 1 0.6
  
(2,1 ) 0 0 0 0 1 0 1 0 1 0 0 0   0.9 
(2,3 ) 0 0 0 0 1 0 1 0 1 0 0 0   0.3 
(2,4 )
0 0 0 0 1 0 1 0 1 0 0 0   0.1 
  
5 0.5 5 
(1,2 ) 0 1 1 1 0 0 0 0 0 0 0 1   0.9 
(1,1 )
1 0 0 0 0 0 0 0 0 0 0 0   0.6 
  
4 0.4 4
0.1 0.1 (3,2 )

0 1 1 1 0 0 0 0 0 0 0
,
0   0.3 
,
(3,3 )
1 0 0 0 0 0 0 0 0 0 0 0   0.5 
  

2 0.6 2 (4,2 ) 0 1 1 1 0 0 0 0 0 0 0 0   0.1 
0.3 0.3
(4,4 ) 1 0 0 0 0 0 0 0 0 0 0 0   0.4 
3 0.5 3
(5,5 )
   
1 0 0 0 0 0 0 0 0 0 0 0 0.5
(6,1 ) 1 0 0 0 1 0 0 0 0 0 0 0 1.0
0.9 0.9
edge order S w
6 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 1 0 0

1.0 0 0 0 0 0 0 0 0 0 0 1 0
A = 0 0 0 0 0 0 0 0 0 0 0 1
1 0.6 1 1 0 0 0 1 0 1 0 1 0 0 0
0 1 0 0 0 1 0 0 0 0 0 1
0 0 1 0 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0


Network alignment

NETWORK ALIGNMENT
β
m ximize αwT x + 2 xT Sx
subject to Ax ≤ e, ∈ {0, 1}

History Sparse problems
QUADRATIC ASSIGNMENT Sparse L often ignored (a
MAXIMUM COMMON SUBGRAPH few exceptions).
Our paper tackles that
PATTERN RECOGNITION
case explicitly.
ONTOLOGY MATCHING We do large problems,
BIOINFORMATICS too.

Conte el al. Thirty years of graph matching, 2004.; Melnik et al. Similarity ooding, 2004; Blondel et al. SIREV 2004;
Singh et al. RECOMB 2007; Klau, BMC Bioinformatics 10:S59, 2009.

PageRank
PageRank Algorithms

Network Alignment Network Alignment
Motivation
Algorithms Network alignment
Algorithms
Results
Conclusion

Algorithms

1. L P Convert to LP, relax, solve (Skipped)
2. T I G H T L P Improve the LP (Skipped)

3. I S O R A N K Use a PageRank heuristic (Singh et al. 2007)
4. B P Max-product belief propagation for the LP
5. T I G H T B P BP for the TIGHTLP (skipped)
6. M R Sub-gradient descent on TIGHTLP (Klau 2009;
skipped)

Note Not discussed: early heuristic: Flannick et al. Genome Research 16:1169–1181, 2006; an
independent BP algorithm: Bradde et al. arXiv:0905.1893, 2009
Singh et al. RECOMB2007; Klau, 2009
David F. Gleich (Sandia) Network Alignment Algorithms IBM Almaden 31 / 47

IsoRank
m ximize αwT x + (β/ 2)xT Sx
subject to 0 ≤ Ax ≤ e, ∈ 0, 1
Solve PageRank on S and w!

1. Normalize S to stochastic P
2. Normalize w to stochastic v
3. Compute power iterations and round at each
4. Output best solution

Need to evaluate a range of PageRank α
Designed for complete bipartite L

Singh et al. RECOMB2007; Ninove Ph.D. Thesis Louvain, 2008

Inner-outer for this problem?
Only on the cores of the two graphs.
Dataset Size Non-Zeros
LCSH-2 59,849 227,464
WC-3 70,509 403,960
Product Graph 4,219,893,141 91,886,357,440

α = 0.95, w from text similarity
Inner-Outer 188 mat-vec 36.2 hours
Power 271 mat-vec 54.6 hours

Caveat: I’m ignoring all the details of
actually using this technique.


Belief propagation: Our algorithm

Summary History
Construct a probability BP used for computing
model where the most marginal probabilities and
likely state is the solution! maximum aposterori
Locally update information probability
Like a generalized dynamic Wildly successful at solving
program satisﬁability problems
Convergent algorithm for
It works max-weight matching

Most likely, it won’t
converge

Bayati et al. 2005;

M →j { = s} =
i Mj → { = s}
j ∈{N( )j}

j
variable tells function j what it thinks
about being in state s. This is just the
product of what all the other functions tell
about being in state s.

i Mj→ { = s} = m xim m
variables functions y:all possible choices
for variables ∈N(j)

max-product of function nodes
 
j
ƒj (y) M →j { = y }
 
variables have state 0 or 1
∈{N(j) }
function nodes compute a
product function j tells variable what it thinks
about being in state s. This means that we
messages are the belief (local have to locally maxamize ƒj among all
objective) about a node for a possible choices. Note y = s always (too
state cumbersome to include in notation.)


NetAlign factor graph: Loopy BP

Variables Functions
A B ƒ1
11
ƒ2
1 1 12
g1
22
2 2 g2
23
g3
3
11 22 h11 22

Note It’s pretty hairy to put all the stuff I should put here on a single slide. Most of it is in the paper.
The rest is just “turning the crank” with standard tricks in BP algorithms.


Get tropical

In the max-plus sense.


Belief propagation: A view

m xj
 bo nd ,b z
A :m×n 1,j j
 m xj ≡
min(b, m x( , z))
Ar 2,j j

A = A x≡ z<

Ac .
.  
. = z

≤z≤b
  

x :n×1 m xj

m,j j b z>b


NETALIGNBP ALGORITHM
y(0) = 0, z(0) = 0, S(0) = 0, β = β/ 2
˜
while t = 1, . . . do
T
d = bo nd0,β (S(t−1) + βS) · e
˜ ˜

y(t) = αw − bo nd0,∞ [(AT Ar − )
r
z(t−1) ] + d
z(t) = αw − bo nd0,∞ [(AT Ac − )
c
y(t−1) ] + d
T
S(t) = (Y(t) + Z(t) − αW − D) · S − bo nd0,β (S(t−1) + βS)
˜ ˜
end while
Note α = 1, β = 2, γ = 0.99 damping, max-weight matching rounding gives 15,214 overlap, 56,361
weight in 10 mins.


PageRank
PageRank Algorithms
Network Alignment
Results Motivation
Network alignment
Slide 39 of 47
Network Alignment
Algorithms
Results
Conclusion

Synthetic experiments: BP does well!

1 1
rounded objective values

0.8 0.8

fraction correct
0.6 0.6

0.4 0.4
MR−upper
MR MR
0.2 BP 0.2 BP
BPSC BPSC
IsoRank IsoRank
0 0
0 5 10 15 20 0 5 10 15 20
expected degree of noise in L (p ⋅ n) expected degree of noise in L (p ⋅ n)

David F. Gleich (Sandia) Results IBM Almaden 40 / 47

Biological data: A close tie
400 1200
376 overlap upper bound
381 1076 overlap upper bound
1087
1000

300

800
Overlap

Overlap
200 600

400
max weight max weight
100 671.551 2733

BP BP
200
SCBP SCBP
IsoRank IsoRank
MR MR
0 0
0 100 200 300 400 500 600 700 0 500 1000 1500 2000 2500
Weight Weight

Problem |VA | |EA | |VB | |EB | |EL |
dmela-scere 9459 25636 5696 31261 34582
Mus M.-Homo S. 3247 2793 9695 32890 15810


Real dataset
20000

overlap upper bound
16836 17608

15000
Overlap

10000

max weight
5000 60119.8

BP
SCBP
IsoRank
MR
0
0 10000 20000 30000 40000 50000 60000 70000
Weight

Problem |VA | |EA | |VB | |EB | |EL |
lcsh2wiki 297,266 248,230 205,948 382,353 4,971,629


Matching results: A little too hot!
LCSH WC
Science ﬁction television series Science ﬁction television programs
Turing test Turing test
Machine learning Machine learning
Hot tubs Hot dog


Foreign subject headings
The US uses LCSH for subj. headings (342k verts, 258k edges).
France uses Rameau for subj. headings (155k verts, 156k edges).
Generate L by automatic translation and text matching.
Used Google’s automatic translation service
(translate.google.com).
Produces 22,195,304 possible links based on text.

cardinality overlap correct
Manual 54,259 39,749
MWM 125,609 17,134 29,133 50.54%
NetAlignBP 121,316 46,534 32,467 56.32%
NetAlignMR 119,120 45,977 25,086 43.52%
Upper 50,753

Note NetAlignBP with α = 1, β = 2, γ = 0.99 for 100 iterations; NetAlignMR with α = 0, β = 1 for 1000
iterations.


PageRank
PageRank Algorithms
Network Alignment
Conclusion Motivation
Network alignment
Slide 45 of 47
Network Alignment
Algorithms
Results
Conclusion

Philosophy

Why matrix computations?
Simple, iterative methods
“Easy” to code
“Easy” to parallelize
“Often” apply to graph problems

David F. Gleich (Sandia) Conclusion IBM Almaden 46 / 47

Summary and Future ideas
Inner-outer iterations for BP algorithms for network
PageRank alignment
Robust analysis Fast and scalable
Good for general graphs Good results on biology PPI
Can combine with other networks
techniques Reasonable results with
Works for Gauss-Seidel Rameau to LCSH
Works for non-stationary Future work
iterations
No vertex label information
for matches?
Future work Are “overlap” scores
Gauss-Seidel performance? signiﬁcant?
O P E N Asymptotic Are LCSH and Wikipedia
performance of inner-outer? really similar?
Dynamic β and η? O P E N An approx. algorithm?

David F. Gleich (Sandia) Conclusion IBM Almaden 47 / 47

PAPER 1 stanford.edu/~dgleich/publications/2009/
gleich-2009-inner-outer.html
SIAM J. Scientiﬁc Computing
Google “inner outer gleich”
CODE stanford.edu/~dgleich/publications/2009/innout
Google “innout gleich”

PAPER 2 arxiv.org/abs/0907.3338
ICDM 2009
Google “network alignment gleich”
CODE stanford.edu/~dgleich/publications/2009/netalign
Google “netalign gleich”

Two numerical graph algorithms

Recommended

Recommended

More Related Content

Similar to Two numerical graph algorithms

Similar to Two numerical graph algorithms (20)

More from David Gleich

More from David Gleich (20)

Recently uploaded

Recently uploaded (20)

Two numerical graph algorithms