I describe a few ongoing research projects on diffusions in large graphs and the efficient, localized matrix computations we use to evaluate them.
Localized methods for diffusions in large graphs
1. Localized methods for diffusions in large graphs
David F. Gleich, Purdue University
Joint work with Kyle Kloster (Purdue) and Michael Mahoney (Berkeley)
Supported by NSF CAREER CCF-1149756
Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
MMDS 2014
2. Image from rockysprings, deviantart, CC share-alike
Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes.
The talk ends, you believe whatever you want to.
4. Graph diffusions

f = \sum_{k=0}^{\infty} \alpha_k P^k s

[Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low diffusion values.]

A – adjacency matrix
D – degree matrix
P – column-stochastic operator, P = A D^{-1}, i.e. (Px)_i = \sum_{j \to i} x_j / d_j
s – the "seed" (a sparse vector)
f – the diffusion result
α_k – the path weights

Graph diffusions help with:
1. Attribute prediction
2. Community detection
3. "Ranking"
4. Finding small-conductance sets
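To make the notation concrete, here is a minimal sketch (mine, not from the talk) of evaluating a truncated diffusion f ≈ \sum_{k=0}^{K} \alpha_k P^k s with scipy.sparse; the toy graph, the PageRank-style weights, and the seed choice are illustrative assumptions.

import numpy as np
import scipy.sparse as sp

def truncated_diffusion(A, alphas, seed, K):
    # A: symmetric adjacency matrix (scipy.sparse); alphas: path weights;
    # seed: node index; K: number of terms kept in the sum.
    n = A.shape[0]
    d = np.asarray(A.sum(axis=0)).ravel()      # degrees
    P = A @ sp.diags(1.0 / d)                  # column-stochastic P = A D^{-1}
    s = np.zeros(n)
    s[seed] = 1.0                              # the "seed" vector
    f = np.zeros(n)
    v = s.copy()                               # v holds P^k s
    for k in range(K + 1):
        f += alphas[k] * v
        v = P @ v
    return f

# Example with PageRank-style weights alpha_k = (1 - beta) * beta^k on a toy graph.
A = sp.csr_matrix(np.array([[0, 1, 1, 0],
                            [1, 0, 1, 0],
                            [1, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
beta = 0.85
alphas = [(1 - beta) * beta ** k for k in range(51)]
print(truncated_diffusion(A, alphas, seed=0, K=50))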
5. Graph diffusions

[Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low diffusion values.]

PageRank:
x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, equivalently (I - \beta P) x = (1-\beta) s

Heat kernel:
h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

with P = A D^{-1}, i.e. (Px)_i = \sum_{j \to i} x_j / d_j.
6. Graph diffusions

PageRank:
x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, equivalently (I - \beta P) x = (1-\beta) s

Heat kernel:
h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

[Plot: weight vs. path length (0 to 100, log scale) for the heat-kernel coefficients at t = 1, 5, 15 and the PageRank coefficients at α = 0.85, 0.99.]
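A small sketch (again mine, not the slide's code) that reproduces the idea behind the weight-vs-length plot: it compares the heat-kernel coefficients e^{-t} t^k / k! with the PageRank coefficients (1-β) β^k at the parameters shown in the legend.

import numpy as np
from scipy.special import gammaln

ks = np.arange(0, 101)
for t in (1, 5, 15):
    hk = np.exp(-t + ks * np.log(float(t)) - gammaln(ks + 1))  # e^{-t} t^k / k!
    print("heat kernel t=%d: weight at k=50 is %.2e" % (t, hk[50]))
for beta in (0.85, 0.99):
    pr = (1 - beta) * beta ** ks                               # (1-beta) beta^k
    print("PageRank beta=%.2f: weight at k=50 is %.2e" % (beta, pr[50]))

The heat-kernel weights die off super-geometrically past k ≈ t, while the PageRank weights decay only geometrically, which is the contrast the plot makes.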
8. Our mission
Find the solution with work roughly proportional to the localization, not the matrix.
9. Two types of localization

Uniform (strong): \|x - x^*\|_1 \le \varepsilon
Entry-wise (weak): \|D^{-1}(x - x^*)\|_\infty \le \varepsilon

Localized vectors are not sparse, but they can be approximated by sparse vectors, x \approx x^*.

Uniform (strong):
- Good global approximation using only a local region.
- "Hard" to prove.
- "Need" a graph property.

Entry-wise (weak):
- Good approximation for cuts and communities.
- "Easy" to prove.
- "Fast" algorithms.
10. We have four results
1. A new interpretation of the PageRank diffusion in relation to a min-cut problem.
2. A new understanding of the scalable, localized PageRank "push" method.
3. A new algorithm for the heat kernel diffusion in a degree-weighted norm.
4. Algorithms for diffusions as functions of matrices (K. Kloster's poster on Thurs.).
Side annotations: undirected graphs only (results 1-3); entry-wise localization (results 2-3); directed, uniform localization (result 4).
12. PageRank, mincuts, and the push method via Algorithmic Anti-Differentiation
Gleich & Mahoney, ICML 2014
13. The PageRank problem & the Laplacian on undirected graphs
Combinatorial Laplacian: L = D - A

The PageRank random surfer:
1. With probability β, follow a random-walk step.
2. With probability (1-β), jump randomly according to the distribution s.
Goal: find the stationary distribution x.

x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, which satisfies either of:
1. (I - \beta A D^{-1}) x = (1-\beta) s;
2. [\alpha D + L] z = \alpha s, where \beta = 1/(1+\alpha) and x = D z.
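A quick check of the equivalence between the two formulations (a worked step I am adding; it is not spelled out on the slide): substitute x = Dz into the first system and use β = 1/(1+α).

\begin{align*}
(\mathbf{I} - \beta \mathbf{A}\mathbf{D}^{-1})\,\mathbf{D}\mathbf{z} &= (1-\beta)\,\mathbf{s}
  && \text{set } \mathbf{x} = \mathbf{D}\mathbf{z} \\
(\mathbf{D} - \beta \mathbf{A})\,\mathbf{z} &= (1-\beta)\,\mathbf{s}
  && \text{multiply by } 1/\beta = 1+\alpha \\
\big[(1+\alpha)\mathbf{D} - \mathbf{A}\big]\,\mathbf{z} &= \tfrac{1-\beta}{\beta}\,\mathbf{s} = \alpha\,\mathbf{s}
  && \text{since } \tfrac{1-\beta}{\beta} = \alpha \\
\big[\alpha\mathbf{D} + \mathbf{L}\big]\,\mathbf{z} &= \alpha\,\mathbf{s}
  && \text{using } \mathbf{L} = \mathbf{D} - \mathbf{A}.
\end{align*}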
14. The s-t min-cut problem

minimize  \|Bx\|_{C,1} = \sum_{(i,j) \in E} C_{i,j} |x_i - x_j|
subject to x_s = 1, x_t = 0, x \ge 0.

B – unweighted incidence matrix
C – diagonal capacity matrix

[Figure: a small graph with source s and sink t.]

In the unweighted case, solve via max-flow. In the weighted case, solve via network simplex or an industrial LP solver.
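As a tiny illustration (not part of the talk), the unweighted problem above is an ordinary s-t min-cut and can be solved by max-flow; here is a sketch using networkx on a made-up 5-node graph, with each undirected unit-capacity edge represented by two directed arcs.

import networkx as nx

edges = [("s", "a"), ("a", "b"), ("b", "t"), ("s", "c"), ("c", "t")]
G = nx.DiGraph()
for u, v in edges:
    G.add_edge(u, v, capacity=1)   # unit capacities C_ij = 1
    G.add_edge(v, u, capacity=1)   # undirected edge = two directed arcs

cut_value, (S_side, T_side) = nx.minimum_cut(G, "s", "t")
print(cut_value)           # value of the min cut = optimal objective above
print(S_side, T_side)      # the two sides of the cut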
15. The localized cut graph
Related to a construction used in "FlowImprove," Andersen & Lang (2007); and Orecchia & Zhu (2014).

\mathcal{A}_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}

Connect s to vertices in S with weight α · degree.
Connect t to vertices in S̄ with weight α · degree.
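A sketch of how one might assemble the localized cut graph A_S in scipy.sparse (my own illustration, not code from the talk; it assumes an undirected adjacency matrix A and a list of vertex indices S, with the block ordering [s; original vertices; t]).

import numpy as np
import scipy.sparse as sp

def localized_cut_graph(A, S, alpha):
    n = A.shape[0]
    d = np.asarray(A.sum(axis=0)).ravel()
    in_S = np.zeros(n, dtype=bool)
    in_S[S] = True
    d_S = np.where(in_S, d, 0.0)        # degrees inside S
    d_Sbar = np.where(in_S, 0.0, d)     # degrees outside S
    # Block layout: [ s ; original vertices ; t ]
    top = sp.hstack([sp.csr_matrix((1, 1)), alpha * sp.csr_matrix(d_S), sp.csr_matrix((1, 1))])
    mid = sp.hstack([alpha * sp.csr_matrix(d_S).T, A, alpha * sp.csr_matrix(d_Sbar).T])
    bot = sp.hstack([sp.csr_matrix((1, 1)), alpha * sp.csr_matrix(d_Sbar), sp.csr_matrix((1, 1))])
    return sp.vstack([top, mid, bot]).tocsr()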
16. The localized cut graph
Connect s to vertices in S with weight α · degree; connect t to vertices in S̄ with weight α · degree.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the s-t min-cut:
minimize  \|B_S x\|_{C(\alpha),1}
subject to x_s = 1, x_t = 0, x \ge 0.
17. The localized cut graph
Connect s to vertices in S with weight α · degree; connect t to vertices in S̄ with weight α · degree.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the "electrical flow" s-t min-cut:
minimize  \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.
18. s-t min-cut → PageRank

The PageRank vector z that solves (\alpha D + L) z = \alpha s with s = d_S / vol(S) is a renormalized solution of the electrical cut computation:
minimize  \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.
Specifically, if x is the solution, then x = \begin{bmatrix} 1 \\ vol(S)\, z \\ 0 \end{bmatrix}.

Proof: square and expand the objective into a Laplacian, then apply the constraints.
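A sketch of the "square and expand" step, in my own words rather than the paper's: B_S^T C(α) B_S is the weighted Laplacian of the localized cut graph, so the squared objective is a quadratic form, and eliminating the fixed coordinates leaves a PageRank-style linear system.

\|\mathbf{B}_S \mathbf{x}\|_{C(\alpha),2}^2
  = \mathbf{x}^T \mathbf{B}_S^T \mathbf{C}(\alpha)\, \mathbf{B}_S \mathbf{x}
  = \mathbf{x}^T \mathbf{L}(\mathcal{A}_S)\, \mathbf{x}.

Minimizing this over the free (non-s, non-t) coordinates with x_s = 1 and x_t = 0 gives (\alpha D + L)\, x_{\text{free}} = \alpha d_S, which is the PageRank system above once x_{\text{free}} is rescaled by vol(S).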
19. PageRank → s-t min-cut
That equivalence works if s is degree-weighted. What if s is the uniform vector?

\mathcal{A}(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha (d - s) \\ 0 & \alpha (d - s)^T & 0 \end{bmatrix}.
20. Insight 1
PageRank implicitly approximates the solution of these s-t min-cut problems.
21. The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung, and Lang (also by McSherry, and Jeh & Widom) for personalized PageRank.
Strongly related to Gauss-Seidel on Ax = b (see my talk at Simons).
Derived to show improved runtime for balanced solvers.

The push method, with parameters τ and ρ:
1. x^{(1)} = 0, r^{(1)} = (1-\beta) e_i, k = 1
2. while any r_j > \tau d_j (d_j is the degree of node j):
3.   x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho) e_j
4.   r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta (r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}
5.   k \leftarrow k + 1
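A compact Python sketch of the push loop above (my rendering of the pseudocode, not code from the talk); G is a dictionary-of-sets graph with a single seed vertex, and tau, rho follow the slide's notation.

import collections

def pagerank_push(G, seed, beta=0.85, tau=1e-4, rho=1.0):
    x = collections.defaultdict(float)            # solution
    r = collections.defaultdict(float)            # residual
    r[seed] = 1.0 - beta
    Q = collections.deque([seed])                 # vertices with residual over threshold
    while Q:
        j = Q.popleft()
        dj = len(G[j])
        if r[j] <= tau * dj:                      # defensive: nothing left to push
            continue
        push = r[j] - tau * dj * rho              # amount moved into the solution
        x[j] += push
        r[j] = tau * dj * rho
        for i in G[j]:                            # spread beta * push to the neighbors
            if r[i] <= tau * len(G[i]) and r[i] + beta * push / dj > tau * len(G[i]):
                Q.append(i)                       # i just crossed its threshold
            r[i] += beta * push / dj
    return dict(x)

# toy 4-cycle
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(pagerank_push(G, seed=0))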
22. Why do we care about push?
1. Used for empirical studies of "communities".
2. Local Cheeger inequality.
3. Used for "fast PageRank approximation".
4. It produces weakly localized approximations to PageRank:
\|D^{-1}(x - x^*)\|_\infty \le \varepsilon  using at most  \frac{1}{(1-\beta)\,\varepsilon}  edges.

[Figure: Newman's netscience graph, 379 vertices, 1828 non-zeros; s has a single one on one node, and the approximation is "zero" on most of the nodes.]
23. The push method revisited

Let x be the output from the push method with 0 < \beta < 1, v = d_S / vol(S), \rho = 1, and \tau > 0. Set \alpha = (1-\beta)/\beta and \kappa = \tau\, vol(S)/\beta, and let z_G solve:
minimize  \tfrac{1}{2} \|B_S z\|^2_{C(\alpha),2} + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,
where z = [1;\; z_G;\; 0]. Then x = D z_G / vol(S).

Proof: write out the KKT conditions and show that the push method solves them. Complementary slackness was the "tricky" part.

The 1-norm term is regularization for sparsity; the division by vol(S) is the needed normalization.
24. Insight 2
The PageRank push method implicitly solves a 1-norm regularized 2-norm cut approximation.
25. Insight 2'
We get 3 digits of accuracy on P and 16 digits of accuracy on P'.
26. [Figure 2 from Gleich & Mahoney (ICML 2014): a portion of the netscience graph with the set S highlighted and its vertices enlarged, alongside the solution vectors from the various cut problems: the min-cut solution, the PageRank solution, and the push solution, with panel nonzero counts of 16, 15, 284, and 24. Large values are dark; outlined white vertices are numerically non-zero but small. The true min-cut set is large in all vectors.]

Push's sparsity helps it identify the "right" graph feature with fewer non-zeros, many fewer than the vanilla PageRank problem.
27. The push method revisited

Let x be the output from the push method with 0 < \beta < 1, v = d_S / vol(S), \rho = 1, and \tau > 0. Set \alpha = (1-\beta)/\beta and \kappa = \tau\, vol(S)/\beta, and let z_G solve:
minimize  \tfrac{1}{2} \|B_S z\|^2_{C(\alpha),2} + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,
where z = [1;\; z_G;\; 0]. Then x = D z_G / vol(S).

The 1-norm term is regularization for sparsity in the solution and the residual.
The push method is scalable because it gives us sparse solutions AND sparse residuals r.
28. This is a case of Algorithmic Anti-differentiation!
29. Algorithmic Anti-differentiation
Given heuristic H, is there a problem P' such that H is an algorithm for P'?

The real world: given "find-communities", hack around, then write a paper presenting "three steps of the power method on P finds communities".

To understand why H works: show that heuristic H solves P'. Either guess and check until you find something H solves, or derive a characterization of heuristic H.
(e.g., Mahoney & Orecchia; Dhillon et al. (Graclus); Saunders)
30. Without these insights, we'd draw the wrong conclusion.
Gleich & Mahoney, Submitted
Our s-t min-cut framework extends to many diffusions used in semi-supervised learning.
31. Without these insights, we'd draw the wrong conclusion (continued).
[Plot: error rate vs. average training samples per class for an off-the-shelf SSL procedure; curves K2, RK2, K3, RK3.]
32. Without these insights, we'd draw the wrong conclusion (continued).
[Plots: error rate vs. average training samples per class for the off-the-shelf SSL procedure and for rank-rounded SSL; curves K2, RK2, K3, RK3.]
33. Recap so far
1. Used the relationship between PageRank and min-cut to get a new understanding of the implicit properties of the push method.
2. Showed that this insight helps improve semi-supervised learning.
(next) A new algorithm for the heat kernel diffusion in a degree-weighted norm.
34. Graph diffusions

PageRank:
x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, equivalently (I - \beta P) x = (1-\beta) s

Heat kernel:
h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

[Plot: weight vs. path length for the heat-kernel coefficients at t = 1, 5, 15 and the PageRank coefficients at α = 0.85, 0.99.]

Many "empirically useful" properties of PageRank also hold for the heat kernel diffusion; e.g., Chung (2007) showed a local Cheeger inequality. There was no "local" algorithm until a randomized method by Simpson & Chung (2013).
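One extra observation (mine, not on the slide) on why "heat kernel" is the right name: since I and P commute, h(t) is the solution of a heat equation on the graph with the seed as the initial condition.

h(t) = e^{-t} \exp(tP)\, s = \exp\!\big(-t\,(I - P)\big)\, s,
\qquad
\frac{dh}{dt} = -(I - P)\, h(t), \quad h(0) = s.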
35. We can turn the heat kernel into a linear system
Kloster & Gleich, WAW 2013

Direct expansion:
x = \exp(P)\, e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

\begin{bmatrix}
I      &        &        &        &   \\
-P/1   & I      &        &        &   \\
       & -P/2   & \ddots &        &   \\
       &        & \ddots & I      &   \\
       &        &        & -P/N   & I
\end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix}
=
\begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix},
\qquad
x_N = \sum_{i=0}^{N} v_i

Equivalently, (I_{N+1} \otimes I - S \otimes P)\, v = e_1 \otimes e_c, where S has 1/1, 1/2, \ldots, 1/N on its subdiagonal.

Lemma: we approximate x_N well if we approximate v well.
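A small numerical check (my own sketch, not the paper's code) of the identity above: solve the block bidiagonal system with scipy and confirm that summing the blocks of v reproduces the degree-N Taylor approximation of exp(P) e_c on a toy graph.

import math
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def taylor_via_block_system(P, c, N):
    # Build (I_{N+1} kron I - S kron P) v = e_1 kron e_c, where S has
    # 1/1, 1/2, ..., 1/N on its subdiagonal, then sum the blocks of v.
    n = P.shape[0]
    S = sp.diags(1.0 / np.arange(1, N + 1), -1, shape=(N + 1, N + 1))
    M = sp.eye((N + 1) * n) - sp.kron(S, P)
    rhs = np.zeros((N + 1) * n)
    rhs[c] = 1.0                                   # e_1 kron e_c
    v = spla.spsolve(M.tocsc(), rhs)
    return v.reshape(N + 1, n).sum(axis=0)         # x_N = sum_i v_i

# Toy check against the direct truncated Taylor series.
A = sp.csr_matrix(np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float))
P = A @ sp.diags(1.0 / np.asarray(A.sum(axis=0)).ravel())
N, c = 10, 0
direct = sum(np.linalg.matrix_power(P.toarray(), k)[:, c] / math.factorial(k)
             for k in range(N + 1))
print(np.allclose(taylor_via_block_system(P, c, N), direct))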
36. There is a fast deterministic adaptation of the push method
Kloster & Gleich, KDD 2014
import collections
import math

# G is the graph as a dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
x = {}  # store x, r as dictionaries
r = {}  # initialize residual
Q = collections.deque()  # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()  # v has r[(v,j)] over the threshold
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x:
        x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:  # for neighbors of v
        next = (u, j + 1)  # in the next block
        if j + 1 == N:  # last step, add to solution
            if u not in x:
                x[u] = 0.
            x[u] += rvj / len(G[v])
            continue
        if next not in r:
            r[next] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if r[next] < thresh and r[next] + mass >= thresh:
            Q.append(next)  # add u to the queue once it crosses the threshold
        r[next] = r[next] + mass
Figure 2: Pseudo-code for our algorithm as working Python code. The graph is stored as a dictionary-of-sets.

Let h = e^{-t} \exp(tP)\, s and let x be the hk-push(\varepsilon) output.
Then \|D^{-1}(x - h)\|_\infty \le \varepsilon after looking at at most 2 N e^t / \varepsilon edges.
We believe that the bound N \ge 2t \log(1/\varepsilon) suffices.
37. PageRank vs. Heat Kernel

[Plots: runtime in seconds and conductance vs. log10(|V|+|E|) for hkgrow vs. pprgrow, showing the 25th, 50th, and 75th percentiles.]

On large graphs, our heat kernel takes slightly longer than a localized PageRank, but produces sets with smaller (better) conductance scores.

Our Python code on clueweb12 (72B edges) via libbvg:
• 99 seconds to load
• 1 second to compute
38. References and ongoing work
Gleich and Kloster – Relaxation methods for the matrix exponential, Submitted
Kloster and Gleich – Heat kernel based community detection, KDD 2014
Gleich and Mahoney – Algorithmic Anti-differentiation, ICML 2014
Gleich and Mahoney – Regularized diffusions, Submitted
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
Ongoing work:
• Improved localization bounds for functions of matrices
• Asynchronous and parallel "push"-style methods
Supported by NSF CAREER CCF-1149756
www.cs.purdue.edu/homes/dgleich