I describe a few ongoing research projects on diffusions in large graphs and the efficient, localized matrix computations we use to evaluate them.
Localized methods for diffusions in large graphs
1. Localized methods for diffusions in large graphs
David F. Gleich, Purdue University
Joint work with Kyle Kloster (Purdue) and Michael Mahoney (Berkeley)
Supported by NSF CAREER CCF-1149756
Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
MMDS 2014
2. Image from rockysprings, deviantart, CC share-alike
Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes.
The talk ends, you believe whatever you want to.
4. Graph diffusions

f = \sum_{k=0}^{\infty} \alpha_k P^k s

[Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low diffusion values.]

A – adjacency matrix
D – degree matrix
P – column-stochastic operator, P = A D^{-1}, i.e. (Px)_i = \sum_{j \to i} x_j / d_j
s – the "seed" (a sparse vector)
f – the diffusion result
α_k – the path weights

Graph diffusions help with:
1. Attribute prediction
2. Community detection
3. "Ranking"
4. Finding small-conductance sets
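To make the notation concrete, here is a minimal sketch (mine, not from the talk) of evaluating a truncated diffusion f ≈ \sum_{k=0}^{K} \alpha_k P^k s with scipy.sparse; the toy graph, the PageRank-style weights, and the seed choice are illustrative assumptions.

import numpy as np
import scipy.sparse as sp

def truncated_diffusion(A, alphas, seed, K):
    # A: symmetric adjacency matrix (scipy.sparse); alphas: path weights;
    # seed: node index; K: number of terms kept in the sum.
    n = A.shape[0]
    d = np.asarray(A.sum(axis=0)).ravel()      # degrees
    P = A @ sp.diags(1.0 / d)                  # column-stochastic P = A D^{-1}
    s = np.zeros(n)
    s[seed] = 1.0                              # the "seed" vector
    f = np.zeros(n)
    v = s.copy()                               # v holds P^k s
    for k in range(K + 1):
        f += alphas[k] * v
        v = P @ v
    return f

# Example with PageRank-style weights alpha_k = (1 - beta) * beta^k on a toy graph.
A = sp.csr_matrix(np.array([[0, 1, 1, 0],
                            [1, 0, 1, 0],
                            [1, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
beta = 0.85
alphas = [(1 - beta) * beta ** k for k in range(51)]
print(truncated_diffusion(A, alphas, seed=0, K=50))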
5. Graph diffusions

[Figure: a network, or mesh, from a typical problem in scientific computing, colored from high to low diffusion values.]

PageRank:
x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, equivalently (I - \beta P) x = (1-\beta) s

Heat kernel:
h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

with P = A D^{-1}, i.e. (Px)_i = \sum_{j \to i} x_j / d_j.
6. Graph diffusions

PageRank:
x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, equivalently (I - \beta P) x = (1-\beta) s

Heat kernel:
h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

[Plot: weight vs. path length (0 to 100, log scale) for the heat-kernel coefficients at t = 1, 5, 15 and the PageRank coefficients at α = 0.85, 0.99.]
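A small sketch (again mine, not the slide's code) that reproduces the idea behind the weight-vs-length plot: it compares the heat-kernel coefficients e^{-t} t^k / k! with the PageRank coefficients (1-β) β^k at the parameters shown in the legend.

import numpy as np
from scipy.special import gammaln

ks = np.arange(0, 101)
for t in (1, 5, 15):
    hk = np.exp(-t + ks * np.log(float(t)) - gammaln(ks + 1))  # e^{-t} t^k / k!
    print("heat kernel t=%d: weight at k=50 is %.2e" % (t, hk[50]))
for beta in (0.85, 0.99):
    pr = (1 - beta) * beta ** ks                               # (1-beta) beta^k
    print("PageRank beta=%.2f: weight at k=50 is %.2e" % (beta, pr[50]))

The heat-kernel weights die off super-geometrically past k ≈ t, while the PageRank weights decay only geometrically, which is the contrast the plot makes.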
8. Our mission
Find the solution with work roughly proportional to the localization, not the matrix.
9. Two types of localization

Uniform (strong): \|x - x^*\|_1 \le \varepsilon
Entry-wise (weak): \|D^{-1}(x - x^*)\|_\infty \le \varepsilon

Localized vectors are not sparse, but they can be approximated by sparse vectors, x \approx x^*.

Uniform (strong):
- Good global approximation using only a local region.
- "Hard" to prove.
- "Need" a graph property.

Entry-wise (weak):
- Good approximation for cuts and communities.
- "Easy" to prove.
- "Fast" algorithms.
10. We have four results
1. A new interpretation of the PageRank diffusion in relation to a min-cut problem.
2. A new understanding of the scalable, localized PageRank "push" method.
3. A new algorithm for the heat kernel diffusion in a degree-weighted norm.
4. Algorithms for diffusions as functions of matrices (K. Kloster's poster on Thurs.).
Side annotations: undirected graphs only (results 1-3); entry-wise localization (results 2-3); directed, uniform localization (result 4).
12. PageRank, mincuts, and the push method via Algorithmic Anti-Differentiation
Gleich & Mahoney, ICML 2014
13. The PageRank problem & the Laplacian on undirected graphs
Combinatorial Laplacian: L = D - A

The PageRank random surfer:
1. With probability β, follow a random-walk step.
2. With probability (1-β), jump randomly according to the distribution s.
Goal: find the stationary distribution x.

x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, which satisfies either of:
1. (I - \beta A D^{-1}) x = (1-\beta) s;
2. [\alpha D + L] z = \alpha s, where \beta = 1/(1+\alpha) and x = D z.
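A quick check of the equivalence between the two formulations (a worked step I am adding; it is not spelled out on the slide): substitute x = Dz into the first system and use β = 1/(1+α).

\begin{align*}
(\mathbf{I} - \beta \mathbf{A}\mathbf{D}^{-1})\,\mathbf{D}\mathbf{z} &= (1-\beta)\,\mathbf{s}
  && \text{set } \mathbf{x} = \mathbf{D}\mathbf{z} \\
(\mathbf{D} - \beta \mathbf{A})\,\mathbf{z} &= (1-\beta)\,\mathbf{s}
  && \text{multiply by } 1/\beta = 1+\alpha \\
\big[(1+\alpha)\mathbf{D} - \mathbf{A}\big]\,\mathbf{z} &= \tfrac{1-\beta}{\beta}\,\mathbf{s} = \alpha\,\mathbf{s}
  && \text{since } \tfrac{1-\beta}{\beta} = \alpha \\
\big[\alpha\mathbf{D} + \mathbf{L}\big]\,\mathbf{z} &= \alpha\,\mathbf{s}
  && \text{using } \mathbf{L} = \mathbf{D} - \mathbf{A}.
\end{align*}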
14. The s-t min-cut problem

minimize  \|Bx\|_{C,1} = \sum_{(i,j) \in E} C_{i,j} |x_i - x_j|
subject to x_s = 1, x_t = 0, x \ge 0.

B – unweighted incidence matrix
C – diagonal capacity matrix

[Figure: a small graph with source s and sink t.]

In the unweighted case, solve via max-flow. In the weighted case, solve via network simplex or an industrial LP solver.
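As a tiny illustration (not part of the talk), the unweighted problem above is an ordinary s-t min-cut and can be solved by max-flow; here is a sketch using networkx on a made-up 5-node graph, with each undirected unit-capacity edge represented by two directed arcs.

import networkx as nx

edges = [("s", "a"), ("a", "b"), ("b", "t"), ("s", "c"), ("c", "t")]
G = nx.DiGraph()
for u, v in edges:
    G.add_edge(u, v, capacity=1)   # unit capacities C_ij = 1
    G.add_edge(v, u, capacity=1)   # undirected edge = two directed arcs

cut_value, (S_side, T_side) = nx.minimum_cut(G, "s", "t")
print(cut_value)           # value of the min cut = optimal objective above
print(S_side, T_side)      # the two sides of the cut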
15. The localized cut graph
Related to a construction used in "FlowImprove," Andersen & Lang (2007); and Orecchia & Zhu (2014).

\mathcal{A}_S = \begin{bmatrix} 0 & \alpha d_S^T & 0 \\ \alpha d_S & A & \alpha d_{\bar{S}} \\ 0 & \alpha d_{\bar{S}}^T & 0 \end{bmatrix}

Connect s to vertices in S with weight α · degree.
Connect t to vertices in S̄ with weight α · degree.
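A sketch of how one might assemble the localized cut graph A_S in scipy.sparse (my own illustration, not code from the talk; it assumes an undirected adjacency matrix A and a list of vertex indices S, with the block ordering [s; original vertices; t]).

import numpy as np
import scipy.sparse as sp

def localized_cut_graph(A, S, alpha):
    n = A.shape[0]
    d = np.asarray(A.sum(axis=0)).ravel()
    in_S = np.zeros(n, dtype=bool)
    in_S[S] = True
    d_S = np.where(in_S, d, 0.0)        # degrees inside S
    d_Sbar = np.where(in_S, 0.0, d)     # degrees outside S
    # Block layout: [ s ; original vertices ; t ]
    top = sp.hstack([sp.csr_matrix((1, 1)), alpha * sp.csr_matrix(d_S), sp.csr_matrix((1, 1))])
    mid = sp.hstack([alpha * sp.csr_matrix(d_S).T, A, alpha * sp.csr_matrix(d_Sbar).T])
    bot = sp.hstack([sp.csr_matrix((1, 1)), alpha * sp.csr_matrix(d_Sbar), sp.csr_matrix((1, 1))])
    return sp.vstack([top, mid, bot]).tocsr()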
16. The localized cut graph
Connect s to vertices in S with weight α · degree; connect t to vertices in S̄ with weight α · degree.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the s-t min-cut:
minimize  \|B_S x\|_{C(\alpha),1}
subject to x_s = 1, x_t = 0, x \ge 0.
17. The localized cut graph
Connect s to vertices in S with weight α · degree; connect t to vertices in S̄ with weight α · degree.

B_S = \begin{bmatrix} e & -I_S & 0 \\ 0 & B & 0 \\ 0 & -I_{\bar{S}} & e \end{bmatrix}

Solve the "electrical flow" s-t min-cut:
minimize  \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.
18. s-t min-cut → PageRank

The PageRank vector z that solves (\alpha D + L) z = \alpha s with s = d_S / vol(S) is a renormalized solution of the electrical cut computation:
minimize  \|B_S x\|_{C(\alpha),2}
subject to x_s = 1, x_t = 0.
Specifically, if x is the solution, then x = \begin{bmatrix} 1 \\ vol(S)\, z \\ 0 \end{bmatrix}.

Proof: square and expand the objective into a Laplacian, then apply the constraints.
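A sketch of the "square and expand" step, in my own words rather than the paper's: B_S^T C(α) B_S is the weighted Laplacian of the localized cut graph, so the squared objective is a quadratic form, and eliminating the fixed coordinates leaves a PageRank-style linear system.

\|\mathbf{B}_S \mathbf{x}\|_{C(\alpha),2}^2
  = \mathbf{x}^T \mathbf{B}_S^T \mathbf{C}(\alpha)\, \mathbf{B}_S \mathbf{x}
  = \mathbf{x}^T \mathbf{L}(\mathcal{A}_S)\, \mathbf{x}.

Minimizing this over the free (non-s, non-t) coordinates with x_s = 1 and x_t = 0 gives (\alpha D + L)\, x_{\text{free}} = \alpha d_S, which is the PageRank system above once x_{\text{free}} is rescaled by vol(S).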
19. PageRank → s-t min-cut
That equivalence works if s is degree-weighted. What if s is the uniform vector?

\mathcal{A}(s) = \begin{bmatrix} 0 & \alpha s^T & 0 \\ \alpha s & A & \alpha (d - s) \\ 0 & \alpha (d - s)^T & 0 \end{bmatrix}.
20. Insight 1
PageRank implicitly approximates the solution of these s-t min-cut problems.
21. The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung, and Lang (also by McSherry, and Jeh & Widom) for personalized PageRank.
Strongly related to Gauss-Seidel on Ax = b (see my talk at Simons).
Derived to show improved runtime for balanced solvers.

The push method, with parameters τ and ρ:
1. x^{(1)} = 0, r^{(1)} = (1-\beta) e_i, k = 1
2. while any r_j > \tau d_j (d_j is the degree of node j):
3.   x^{(k+1)} = x^{(k)} + (r_j - \tau d_j \rho) e_j
4.   r_i^{(k+1)} = \begin{cases} \tau d_j \rho & i = j \\ r_i^{(k)} + \beta (r_j - \tau d_j \rho)/d_j & i \sim j \\ r_i^{(k)} & \text{otherwise} \end{cases}
5.   k \leftarrow k + 1
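A compact Python sketch of the push loop above (my rendering of the pseudocode, not code from the talk); G is a dictionary-of-sets graph with a single seed vertex, and tau, rho follow the slide's notation.

import collections

def pagerank_push(G, seed, beta=0.85, tau=1e-4, rho=1.0):
    x = collections.defaultdict(float)            # solution
    r = collections.defaultdict(float)            # residual
    r[seed] = 1.0 - beta
    Q = collections.deque([seed])                 # vertices with residual over threshold
    while Q:
        j = Q.popleft()
        dj = len(G[j])
        if r[j] <= tau * dj:                      # defensive: nothing left to push
            continue
        push = r[j] - tau * dj * rho              # amount moved into the solution
        x[j] += push
        r[j] = tau * dj * rho
        for i in G[j]:                            # spread beta * push to the neighbors
            if r[i] <= tau * len(G[i]) and r[i] + beta * push / dj > tau * len(G[i]):
                Q.append(i)                       # i just crossed its threshold
            r[i] += beta * push / dj
    return dict(x)

# toy 4-cycle
G = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(pagerank_push(G, seed=0))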
22. Why do we care about push?
1. Used for empirical studies of "communities".
2. Local Cheeger inequality.
3. Used for "fast PageRank approximation".
4. It produces weakly localized approximations to PageRank:
\|D^{-1}(x - x^*)\|_\infty \le \varepsilon  using at most  \frac{1}{(1-\beta)\,\varepsilon}  edges.

[Figure: Newman's netscience graph, 379 vertices, 1828 non-zeros; s has a single one on one node, and the approximation is "zero" on most of the nodes.]
23. The push method revisited

Let x be the output from the push method with 0 < \beta < 1, v = d_S / vol(S), \rho = 1, and \tau > 0. Set \alpha = (1-\beta)/\beta and \kappa = \tau\, vol(S)/\beta, and let z_G solve:
minimize  \tfrac{1}{2} \|B_S z\|^2_{C(\alpha),2} + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,
where z = [1;\; z_G;\; 0]. Then x = D z_G / vol(S).

Proof: write out the KKT conditions and show that the push method solves them. Complementary slackness was the "tricky" part.

The 1-norm term is regularization for sparsity; the division by vol(S) is the needed normalization.
24. Insight 2
The PageRank push method implicitly solves a 1-norm regularized 2-norm cut approximation.
25. Insight 2'
We get 3 digits of accuracy on P and 16 digits of accuracy on P'.
26. [Figure 2 from Gleich & Mahoney (ICML 2014): a portion of the netscience graph with the set S highlighted and its vertices enlarged, alongside the solution vectors from the various cut problems: the min-cut solution, the PageRank solution, and the push solution, with panel nonzero counts of 16, 15, 284, and 24. Large values are dark; outlined white vertices are numerically non-zero but small. The true min-cut set is large in all vectors.]

Push's sparsity helps it identify the "right" graph feature with fewer non-zeros, many fewer than the vanilla PageRank problem.
27. The push method revisited

Let x be the output from the push method with 0 < \beta < 1, v = d_S / vol(S), \rho = 1, and \tau > 0. Set \alpha = (1-\beta)/\beta and \kappa = \tau\, vol(S)/\beta, and let z_G solve:
minimize  \tfrac{1}{2} \|B_S z\|^2_{C(\alpha),2} + \kappa \|Dz\|_1
subject to z_s = 1, z_t = 0, z \ge 0,
where z = [1;\; z_G;\; 0]. Then x = D z_G / vol(S).

The 1-norm term is regularization for sparsity in the solution and the residual.
The push method is scalable because it gives us sparse solutions AND sparse residuals r.
28. This is a case of Algorithmic Anti-differentiation!
29. Algorithmic Anti-differentiation
Given heuristic H, is there a problem P' such that H is an algorithm for P'?

The real world: given "find-communities", hack around, then write a paper presenting "three steps of the power method on P finds communities".

To understand why H works: show that heuristic H solves P'. Either guess and check until you find something H solves, or derive a characterization of heuristic H.
(e.g., Mahoney & Orecchia; Dhillon et al. (Graclus); Saunders)
30. Without these insights, we'd draw the wrong conclusion.
Gleich & Mahoney, Submitted
Our s-t min-cut framework extends to many diffusions used in semi-supervised learning.
31. Without these insights, we'd draw the wrong conclusion (continued).
[Plot: error rate vs. average training samples per class for an off-the-shelf SSL procedure; curves K2, RK2, K3, RK3.]
32. Without these insights, we'd draw the wrong conclusion (continued).
[Plots: error rate vs. average training samples per class for the off-the-shelf SSL procedure and for rank-rounded SSL; curves K2, RK2, K3, RK3.]
33. Recap so far
1. Used the relationship between PageRank and min-cut to get a new understanding of the implicit properties of the push method.
2. Showed that this insight helps improve semi-supervised learning.
(next) A new algorithm for the heat kernel diffusion in a degree-weighted norm.
34. Graph diffusions

PageRank:
x = (1-\beta) \sum_{k=0}^{\infty} \beta^k P^k s, equivalently (I - \beta P) x = (1-\beta) s

Heat kernel:
h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s

[Plot: weight vs. path length for the heat-kernel coefficients at t = 1, 5, 15 and the PageRank coefficients at α = 0.85, 0.99.]

Many "empirically useful" properties of PageRank also hold for the heat kernel diffusion; e.g., Chung (2007) showed a local Cheeger inequality. There was no "local" algorithm until a randomized method by Simpson & Chung (2013).
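One extra observation (mine, not on the slide) on why "heat kernel" is the right name: since I and P commute, h(t) is the solution of a heat equation on the graph with the seed as the initial condition.

h(t) = e^{-t} \exp(tP)\, s = \exp\!\big(-t\,(I - P)\big)\, s,
\qquad
\frac{dh}{dt} = -(I - P)\, h(t), \quad h(0) = s.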
35. We can turn the heat kernel into a linear system
Kloster & Gleich, WAW 2013

Direct expansion:
x = \exp(P)\, e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

\begin{bmatrix}
I      &        &        &        &   \\
-P/1   & I      &        &        &   \\
       & -P/2   & \ddots &        &   \\
       &        & \ddots & I      &   \\
       &        &        & -P/N   & I
\end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix}
=
\begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix},
\qquad
x_N = \sum_{i=0}^{N} v_i

Equivalently, (I_{N+1} \otimes I - S \otimes P)\, v = e_1 \otimes e_c, where S has 1/1, 1/2, \ldots, 1/N on its subdiagonal.

Lemma: we approximate x_N well if we approximate v well.
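A small numerical check (my own sketch, not the paper's code) of the identity above: solve the block bidiagonal system with scipy and confirm that summing the blocks of v reproduces the degree-N Taylor approximation of exp(P) e_c on a toy graph.

import math
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def taylor_via_block_system(P, c, N):
    # Build (I_{N+1} kron I - S kron P) v = e_1 kron e_c, where S has
    # 1/1, 1/2, ..., 1/N on its subdiagonal, then sum the blocks of v.
    n = P.shape[0]
    S = sp.diags(1.0 / np.arange(1, N + 1), -1, shape=(N + 1, N + 1))
    M = sp.eye((N + 1) * n) - sp.kron(S, P)
    rhs = np.zeros((N + 1) * n)
    rhs[c] = 1.0                                   # e_1 kron e_c
    v = spla.spsolve(M.tocsc(), rhs)
    return v.reshape(N + 1, n).sum(axis=0)         # x_N = sum_i v_i

# Toy check against the direct truncated Taylor series.
A = sp.csr_matrix(np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float))
P = A @ sp.diags(1.0 / np.asarray(A.sum(axis=0)).ravel())
N, c = 10, 0
direct = sum(np.linalg.matrix_power(P.toarray(), k)[:, c] / math.factorial(k)
             for k in range(N + 1))
print(np.allclose(taylor_via_block_system(P, c, N), direct))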
36. There is a fast deterministic adaptation of the push method
Kloster & Gleich, KDD 2014
import collections
import math

# G is the graph as a dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
x = {}  # store x, r as dictionaries
r = {}  # initialize residual
Q = collections.deque()  # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()  # v has r[(v,j)] over the threshold
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x:
        x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:  # for neighbors of v
        next = (u, j + 1)  # in the next block
        if j + 1 == N:  # last step, add to solution
            if u not in x:
                x[u] = 0.
            x[u] += rvj / len(G[v])
            continue
        if next not in r:
            r[next] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if r[next] < thresh and r[next] + mass >= thresh:
            Q.append(next)  # add u to the queue once it crosses the threshold
        r[next] = r[next] + mass
Figure 2: Pseudo-code for our algorithm as working Python code. The graph is stored as a dictionary-of-sets.

Let h = e^{-t} \exp(tP)\, s and let x be the hk-push(\varepsilon) output.
Then \|D^{-1}(x - h)\|_\infty \le \varepsilon after looking at at most 2 N e^t / \varepsilon edges.
We believe that the bound N \ge 2t \log(1/\varepsilon) suffices.
37. PageRank vs. Heat Kernel

[Plots: runtime in seconds and conductance vs. log10(|V|+|E|) for hkgrow vs. pprgrow, showing the 25th, 50th, and 75th percentiles.]

On large graphs, our heat kernel takes slightly longer than a localized PageRank, but produces sets with smaller (better) conductance scores.

Our Python code on clueweb12 (72B edges) via libbvg:
• 99 seconds to load
• 1 second to compute
38. References and ongoing work
Gleich and Kloster – Relaxation methods for the matrix exponential, Submitted
Kloster and Gleich – Heat kernel based community detection, KDD 2014
Gleich and Mahoney – Algorithmic Anti-differentiation, ICML 2014
Gleich and Mahoney – Regularized diffusions, Submitted
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
Ongoing work:
• Improved localization bounds for functions of matrices
• Asynchronous and parallel "push"-style methods
Supported by NSF CAREER CCF-1149756
www.cs.purdue.edu/homes/dgleich