2. Two projects.
1. Circulant structure in tensors and a linear system solver.
2. Localized structure in the matrix exponential and relaxation methods.
3. Circulant algebra
Introduction
Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalizes matrix algebra to three-way data, and provided an SVD. The essence of this approach is to view three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors). Braman (2010) developed spectral and other decompositions.
We have extended this algebra with the ingredients required for iterative methods such as the power method and Arnoldi method, and have characterized the behavior of these algorithms.
With Chen Greif and James Varah at UBC.
4. Three-way arrays
Given an $m \times n \times k$ table of data, we view this data as an $m \times n$ matrix where each "scalar" is a vector of length $k$:
$$A \in \mathbb{K}_k^{m \times n}.$$
We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.
5. Circulant matrices are a commutative, closed class under the standard matrix operations:
$$\begin{bmatrix}
\alpha_1 & \alpha_k & \cdots & \alpha_2 \\
\alpha_2 & \alpha_1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \alpha_k \\
\alpha_k & \cdots & \alpha_2 & \alpha_1
\end{bmatrix}$$
We’ll see more of their properties shortly!
The circ operation
We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices:
$$\alpha = \{\alpha_1\ \ldots\ \alpha_k\} \in \mathbb{K}_k
\quad\longleftrightarrow\quad
\operatorname{circ}(\alpha) \equiv
\begin{bmatrix}
\alpha_1 & \alpha_k & \cdots & \alpha_2 \\
\alpha_2 & \alpha_1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \alpha_k \\
\alpha_k & \cdots & \alpha_2 & \alpha_1
\end{bmatrix}.$$
$$\alpha + \beta \;\longleftrightarrow\; \operatorname{circ}(\alpha) + \operatorname{circ}(\beta)
\qquad\text{and}\qquad
\alpha\beta \;\longleftrightarrow\; \operatorname{circ}(\alpha)\operatorname{circ}(\beta);$$
$$0 = \{0\ 0\ \ldots\ 0\}, \qquad 1 = \{1\ 0\ \ldots\ 0\}.$$
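A minimal MATLAB sketch of these correspondences (circ here is a one-line local helper, not library code, and the Fourier identity assumes MATLAB's fft convention):

    % K_3 scalars as circulant matrices: products match circular convolution.
    circ  = @(a) toeplitz(a, a([1; end:-1:2]));  % circulant with first column a
    alpha = [2 3 1]';  beta = [3 1 1]';
    C = circ(alpha) * circ(beta);                % again circulant
    gamma = C(:,1);                              % gamma = alpha*beta in K_3
    norm(gamma - ifft(fft(alpha).*fft(beta)))    % ~ 1e-15: Fourier equivalence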
6. Scalars to matrix-vector products
Operations (cont.)
More operations are simplified in the Fourier space too. Let $\operatorname{cft}(\alpha) = \operatorname{diag}[\hat\alpha_1, \ldots, \hat\alpha_k]$. Because the $\hat\alpha_j$ values are the eigenvalues of $\operatorname{circ}(\alpha)$, we have:
$$\operatorname{abs}(\alpha) = \operatorname{icft}(\operatorname{diag}[\,|\hat\alpha_1|, \ldots, |\hat\alpha_k|\,]),$$
$$\bar\alpha = \operatorname{icft}(\operatorname{diag}[\,\bar{\hat\alpha}_1, \ldots, \bar{\hat\alpha}_k\,]) = \operatorname{icft}(\operatorname{cft}(\alpha)^*), \quad\text{and}$$
$$\operatorname{angle}(\alpha) = \operatorname{icft}(\operatorname{diag}[\,\hat\alpha_1/|\hat\alpha_1|, \ldots, \hat\alpha_k/|\hat\alpha_k|\,]).$$
cft and icft
We define the "Circulant Fourier Transform" or cft
$$\operatorname{cft} : \mathbb{K}_k \to \mathbb{C}^{k \times k}$$
and its inverse
$$\operatorname{icft} : \mathbb{C}^{k \times k} \to \mathbb{K}_k$$
as follows:
$$\operatorname{cft}(\alpha) \equiv \operatorname{diag}[\hat\alpha_1, \ldots, \hat\alpha_k] = F\operatorname{circ}(\alpha)F^*,
\qquad
\operatorname{icft}(\operatorname{diag}[\hat\alpha_1, \ldots, \hat\alpha_k]) \equiv \alpha \;\longleftrightarrow\; F^*\operatorname{cft}(\alpha)F,$$
where the $\hat\alpha_j$ are the eigenvalues of $\operatorname{circ}(\alpha)$ as produced by the Fourier transform. These transformations satisfy $\operatorname{icft}(\operatorname{cft}(\alpha)) = \alpha$ and provide a convenient way of moving between operations in $\mathbb{K}_k$ and the more familiar environment of diagonal matrices in $\mathbb{C}^{k \times k}$.
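Concretely, since the eigenvalues of $\operatorname{circ}(\alpha)$ are the DFT of $\alpha$, cft and icft reduce to fft/ifft on the length-$k$ representation. A minimal sketch, assuming MATLAB's fft convention:

    cft  = @(a) diag(fft(a));     % alpha -> diag(ahat_1, ..., ahat_k)
    icft = @(D) ifft(diag(D));    % diagonal matrix -> alpha in K_k
    alpha = [2 3 1]';
    icft(cft(alpha))              % recovers {2 3 1} (up to roundoff)
    abs_alpha = real(icft(diag(abs(fft(alpha)))))   % abs(alpha) from the formula above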
The circ operation on matrices
$$A \circ x =
\begin{bmatrix} \sum_{j=1}^{n} A_{1,j}\,x_j \\ \vdots \\ \sum_{j=1}^{n} A_{m,j}\,x_j \end{bmatrix}
\;\longleftrightarrow\;
\begin{bmatrix} \operatorname{circ}(A_{1,1}) & \cdots \\ \vdots & \ddots \\ \operatorname{circ}(A_{m,1}) & \cdots \end{bmatrix}\cdots$$
Define
$$\operatorname{circ}(A) \equiv
\begin{bmatrix}
\operatorname{circ}(A_{1,1}) & \cdots & \operatorname{circ}(A_{1,n}) \\
\vdots & \ddots & \vdots \\
\operatorname{circ}(A_{m,1}) & \cdots & \operatorname{circ}(A_{m,n})
\end{bmatrix}.$$
7. The special structure
This circulant structure is our special structure for this first problem. We look at two types of iterative methods:
1. the power method and
2. the Arnoldi method.
8. A perplexing result!
Example
Run the power method on
$$A = \begin{bmatrix} \{2\ 3\ 1\} & \{0\ 0\ 0\} \\ \{0\ 0\ 0\} & \{3\ 1\ 1\} \end{bmatrix}.$$
Result: $\lambda = \tfrac{1}{3}\{10\ 4\ 4\}$.
10. Some understanding through decoupling
Example
Let $A = \begin{bmatrix} \{2\ 3\ 1\} & \{8\ {-2}\ 0\} \\ \{-2\ 0\ 2\} & \{3\ 1\ 1\} \end{bmatrix}$. The results of the circ and cft operations are:
$$\operatorname{circ}(A) =
\begin{bmatrix}
2 & 1 & 3 & 8 & 0 & -2 \\
3 & 2 & 1 & -2 & 8 & 0 \\
1 & 3 & 2 & 0 & -2 & 8 \\
-2 & 2 & 0 & 3 & 1 & 1 \\
0 & -2 & 2 & 1 & 3 & 1 \\
2 & 0 & -2 & 1 & 1 & 3
\end{bmatrix},$$
$$(I \otimes F)\operatorname{circ}(A)(I \otimes F)^* =
\begin{bmatrix}
6 & & & 6 & & \\
& -\sqrt{3}\,i & & & 9+\sqrt{3}\,i & \\
& & \sqrt{3}\,i & & & 9-\sqrt{3}\,i \\
0 & & & 5 & & \\
& -3+\sqrt{3}\,i & & & 2 & \\
& & -3-\sqrt{3}\,i & & & 2
\end{bmatrix},$$
$$\operatorname{cft}(A) =
\begin{bmatrix}
6 & 6 & & & & \\
0 & 5 & & & & \\
& & -\sqrt{3}\,i & 9+\sqrt{3}\,i & & \\
& & -3+\sqrt{3}\,i & 2 & & \\
& & & & \sqrt{3}\,i & 9-\sqrt{3}\,i \\
& & & & -3-\sqrt{3}\,i & 2
\end{bmatrix}.$$
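These numbers can be checked directly. A short MATLAB verification, assuming the standard unitary-DFT diagonalization of circulants (circ is a local helper, not library code):

    circ = @(a) toeplitz(a, a([1; end:-1:2]));
    A11 = [2 3 1]'; A12 = [8 -2 0]'; A21 = [-2 0 2]'; A22 = [3 1 1]';
    circA = [circ(A11) circ(A12); circ(A21) circ(A22)];
    F = fft(eye(3)) / sqrt(3);                    % unitary DFT matrix
    D = kron(eye(2), F) * circA * kron(eye(2), F)';
    norm(D(1:3,1:3) - diag(fft(A11)))             % each 3x3 block is diagonal: ~ 0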
11. Example
$$A = \begin{bmatrix} \{2\ 3\ 1\} & \{0\ 0\ 0\} \\ \{0\ 0\ 0\} & \{3\ 1\ 1\} \end{bmatrix},$$
$$\hat A_1 = \begin{bmatrix} 6 & 0 \\ 0 & 5 \end{bmatrix}, \quad
\hat A_2 = \begin{bmatrix} -\sqrt{3}\,i & 0 \\ 0 & 2 \end{bmatrix}, \quad
\hat A_3 = \begin{bmatrix} \sqrt{3}\,i & 0 \\ 0 & 2 \end{bmatrix}.$$
$$\begin{aligned}
\lambda_1 &= \operatorname{icft}(\operatorname{diag}[6\ \ 2\ \ 2]) = \tfrac{1}{3}\{10\ 4\ 4\} \\
\lambda_2 &= \operatorname{icft}(\operatorname{diag}[5\ \ {-\sqrt{3}\,i}\ \ \sqrt{3}\,i]) = \tfrac{1}{3}\{5\ 8\ 2\} \\
\lambda_3 &= \operatorname{icft}(\operatorname{diag}[6\ \ {-\sqrt{3}\,i}\ \ \sqrt{3}\,i]) = \{2\ 3\ 1\} \\
\lambda_4 &= \operatorname{icft}(\operatorname{diag}[5\ \ 2\ \ 2]) = \{3\ 1\ 1\}.
\end{aligned}$$
The corresponding eigenvectors are
$$x_1 = \begin{bmatrix} \{1/3\ \ 1/3\ \ 1/3\} \\ \{2/3\ \ {-1/3}\ \ {-1/3}\} \end{bmatrix}; \quad
x_2 = \begin{bmatrix} \{2/3\ \ {-1/3}\ \ {-1/3}\} \\ \{1/3\ \ 1/3\ \ 1/3\} \end{bmatrix}; \quad
x_3 = \begin{bmatrix} \{1\ 0\ 0\} \\ \{0\ 0\ 0\} \end{bmatrix}; \quad
x_4 = \begin{bmatrix} \{0\ 0\ 0\} \\ \{1\ 0\ 0\} \end{bmatrix}.$$
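A short MATLAB sketch reproduces the perplexing eigenvalue through this decoupling: take the dominant eigenvalue of each $\hat A_j$ and map back with icft (again assuming MATLAB's fft convention):

    ahat = fft([2 3 1]');  bhat = fft([3 1 1]');  % spectra of the diagonal entries
    lamhat = zeros(3,1);
    for j = 1:3
        pair = [ahat(j) bhat(j)];                 % eigenvalues of Ahat_j
        [~, idx] = max(abs(pair));
        lamhat(j) = pair(idx);                    % dominant eigenvalue per component
    end
    lambda1 = real(ifft(lamhat))                  % = [10 4 4]'/3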
12. Convergence of the power method is in terms of the individual blocks
The power method converges
Let $A \in \mathbb{K}_k^{n \times n}$ have a canonical set of eigenvalues $\lambda_1, \ldots, \lambda_n$ where $|\lambda_1| > |\lambda_2|$; then the power method in the circulant algebra converges to an eigenvector $x_1$ with eigenvalue $\lambda_1$.
Here we use the ordering
$$\alpha < \beta \;\longleftrightarrow\; \operatorname{cft}(\alpha) < \operatorname{cft}(\beta) \text{ elementwise}.$$
The example above has more eigenvalues than just $\lambda_1, \ldots, \lambda_4$:
$$\lambda_5 = \operatorname{icft}(\operatorname{diag}[6\ \ {-\sqrt{3}\,i}\ \ 2]), \quad
\lambda_6 = \operatorname{icft}(\operatorname{diag}[6\ \ 2\ \ \sqrt{3}\,i]), \quad
\lambda_7 = \operatorname{icft}(\operatorname{diag}[5\ \ {-\sqrt{3}\,i}\ \ 2]), \quad
\lambda_8 = \operatorname{icft}(\operatorname{diag}[5\ \ 2\ \ \sqrt{3}\,i]),$$
altogether a polynomial number, which exceeds the dimension of the matrix.
Definition. A canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered such that $\operatorname{abs}(\lambda_1) \ge \operatorname{abs}(\lambda_2) \ge \ldots$, which contains the information to reproduce any eigenvalue or eigenvector of $A$.
In this case, the only canonical set is $\{(\lambda_1, x_1), (\lambda_2, x_2)\}$. (We need two pairs, and we have $\operatorname{abs}(\lambda_1) \ge \operatorname{abs}(\lambda_2)$.)
13.
[Figure: The convergence behavior of the power method in the circulant algebra. The gray lines show the error in each eigenvalue in Fourier space; the curves track the predictions made based on the eigenvalues as discussed in the text, with rates of the form $\bigl(\tfrac{2+2\cos(2\pi/n)}{2+2\cos(\pi/n)}\bigr)^{2i}$, $\bigl(\tfrac{6+2\cos(2\pi/n)}{6+2\cos(\pi/n)}\bigr)^{2i}$, and $\bigl(\tfrac{6+2\cos(2\pi/n)}{6+2\cos(\pi/n)}\bigr)^{i}$. Axes: iteration vs. eigenvalue error and eigenvector change.]
[Figure: The convergence behavior of a GMRES procedure using the circulant Arnoldi process. The gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one component. Axes: Arnoldi iteration vs. magnitude; legend: absolute error, residual magnitude.]
The Arnoldi process
Let $A$ be an $n \times n$ matrix with real-valued entries. Then the Arnoldi method is a technique to build an orthogonal basis for the Krylov subspace $K_t(A, v) = \operatorname{span}\{v, Av, \ldots, A^{t-1}v\}$, where $v$ is an initial vector. We have the decomposition
$$A Q_t = Q_{t+1} H_{t+1,t},$$
where $Q_t$ is an $n \times t$ matrix, and $H_{t+1,t}$ is a $(t+1) \times t$ upper Hessenberg matrix.
Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix $\hat A_j$; this is equivalent to a block Arnoldi process. Using the cft and icft operations, we produce an Arnoldi factorization:
$$A \circ Q_t = Q_{t+1} \circ H_{t+1,t}.$$
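For reference, a minimal MATLAB sketch of the standard Arnoldi process that each Fourier component runs (a generic textbook implementation, not the paper's code):

    function [Q, H] = arnoldi(A, v, t)
    % Build an orthonormal basis Q of K_t(A, v) with A*Q(:,1:t) = Q*H.
    n = size(A, 1);
    Q = zeros(n, t+1);  H = zeros(t+1, t);
    Q(:,1) = v / norm(v);
    for j = 1:t
        w = A * Q(:,j);
        for i = 1:j                        % modified Gram-Schmidt
            H(i,j) = Q(:,i)' * w;
            w = w - H(i,j) * Q(:,i);
        end
        H(j+1,j) = norm(w);
        if H(j+1,j) == 0, break; end       % invariant subspace found
        Q(:,j+1) = w / H(j+1,j);
    end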
14. A number of interesting mathematical results from this algebra
1. A case study of how "decoupled" block iterations arise and are meaningful for an application.
2. It's a beautiful algebra. E.g.
More operations are simplified in the Fourier space too. Let $\operatorname{cft}(\alpha) = \operatorname{diag}[\hat\alpha_1, \ldots, \hat\alpha_k]$. Because the $\hat\alpha_j$ values are the eigenvalues of $\operatorname{circ}(\alpha)$, we have:
$$\operatorname{abs}(\alpha) = \operatorname{icft}(\operatorname{diag}[\,|\hat\alpha_1|, \ldots, |\hat\alpha_k|\,]), \qquad
\bar\alpha = \operatorname{icft}(\operatorname{cft}(\alpha)^*), \qquad
\operatorname{angle}(\alpha) = \operatorname{icft}(\operatorname{diag}[\,\hat\alpha_1/|\hat\alpha_1|, \ldots, \hat\alpha_k/|\hat\alpha_k|\,]).$$
Proofs are simple, e.g. $\operatorname{angle}(\alpha)\,\overline{\operatorname{angle}(\alpha)} = 1$.
Live Matlab demo
15. Conclusion to the circulant algebra
Paper available from https://www.cs.purdue.edu/homes/dgleich/
"The power and Arnoldi method in an algebra of circulants," NLA 2013, Gleich, Greif, Varah.
Code available from https://www.cs.purdue.edu/homes/dgleich/codes/camat
16. Project 2
Fast relaxation methods to estimate a column of the matrix exponential.
With Kyle Kloster
17. Matrix exponentials
$\exp(A)$ is defined as
$$\exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k \qquad\text{(always converges)}$$
$$\frac{dx}{dt} = Ax(t) \;\Longrightarrow\; x(t) = \exp(tA)\,x(0) \qquad\text{(evolution operator for an ODE)}$$
Here $A$ is $n \times n$, real. This is a special case of a function of a matrix $f(A)$; others are $f(x) = 1/x$, $f(x) = \sinh(x)$, ...
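A quick MATLAB check of the series definition against the built-in expm on a small dense matrix:

    A = [0 1; -1 0];
    X = eye(2);  term = eye(2);
    for k = 1:20
        term = term * A / k;     % A^k / k!
        X = X + term;
    end
    norm(X - expm(A))            % ~ 1e-16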
18. Matrix exponentials on large networks
$$\exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k$$
If $A$ is the adjacency matrix, then $A^k$ counts the number of length-$k$ paths between node pairs. [Estrada 2000; Farahat et al. 2002, 2006] Large entries denote important nodes or edges. Used for link prediction and centrality.
$$\exp(P) = \sum_{k=0}^{\infty} \frac{1}{k!} P^k$$
If $P$ is a transition matrix (column stochastic), then $P^k$ gives the probabilities of length-$k$ walks between node pairs. [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007] Used for link prediction, kernels, and clustering or community detection.
19. This talk: a column of the matrix exponential
$$x = \exp(P)\,e_c$$
$x$: the solution. $P$: the matrix. $e_c$: the column.
20. This talk: a column of the matrix exponential
$$x = \exp(P)\,e_c$$
$x$: the solution (localized). $P$: the matrix (large, sparse, stochastic). $e_c$: the column.
22. Our mission!
Exploit the structure. Find the solution with work roughly proportional to the localization, not the matrix.
24. Matrix exponentials on large networks
Is a single column interesting? Yes!
$$\exp(P)\,e_c = \sum_{k=0}^{\infty} \frac{1}{k!} P^k e_c$$
Link prediction scores for node $c$; a community relative to node $c$.
But ... modern networks are large, $\sim O(10^9)$ nodes; sparse, $\sim O(10^{11})$ edges; and constantly changing ... and so we'd like speed over accuracy.
25. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. Cleve Moler and Charles Van Loan, SIAM Review, Vol. 45, No. 1, pp. 3-49, 2003.
26. Our underlying method
Direct expansion!
$$x = \exp(P)\,e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N$$
A few matvecs; quick loss of sparsity due to fill-in. This method is stable for stochastic $P$: no cancellation, no unbounded norms, etc.
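A Horner-style MATLAB sketch of this truncated expansion using only sparse matvecs ($P$ and the target column index $c$ are assumed to be given):

    N = 11;  n = size(P, 1);
    ec = sparse(c, 1, 1, n, 1);
    x = ec;
    for k = N:-1:1
        x = ec + (P * x) / k;    % x <- e_c + P x / k (Horner on the Taylor series)
    end
    % now x = sum_{k=0}^{N} P^k e_c / k!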
27. Our underlying method as a linear system
Direct expansion
$$x = \exp(P)\,e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N$$
becomes the block linear system
$$\begin{bmatrix}
I & & & & \\
-P/1 & I & & & \\
& -P/2 & \ddots & & \\
& & \ddots & \ddots & \\
& & & -P/N & I
\end{bmatrix}
\begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix}
=
\begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix},
\qquad x_N = \sum_{i=0}^{N} v_i,$$
or compactly
$$(I \otimes I_N - S_N \otimes P)\,v = e_1 \otimes e_c.$$
Lemma. We approximate $x_N$ well if we approximate $v$ well.
29. Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & "push" methods
Procedurally (the slide's pseudocode, made runnable in MATLAB; a tolerance replaces the original infinite loop, and a unit diagonal, diag(A) = 1, is assumed, as in our system):

    function x = relax_solve(A, b, tol)
    % Relaxation ("push") solver for A*x = b, assuming diag(A) = 1.
    x = sparse(size(A, 1), 1);
    r = b;
    while norm(r, 1) > tol
        j = find(r, 1);          % pick j where r(j) ~= 0
        z = r(j);                % (Gauss-Southwell picks the largest |r(j)|)
        x(j) = x(j) + z;
        r = r - z * A(:, j);     % updates all i with A(i,j) ~= 0; zeros r(j)
    end
Algebraically:
$$Ax = b, \qquad r^{(k)} = b - Ax^{(k)},$$
$$x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}, \qquad r^{(k+1)} = r^{(k)} - r^{(k)}_j A e_j.$$
30. Back to the exponential
$$(I \otimes I_N - S_N \otimes P)\,v = e_1 \otimes e_c, \qquad x_N = \sum_{i=0}^{N} v_i$$
(the same block bidiagonal system as before). Solve this system via the same method.
Optimization 1: build the system implicitly.
Optimization 2: don't store the $v_i$; just store the sum $x_N$.
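A minimal MATLAB sketch combining both optimizations (a simplified, hypothetical variant of gexpm: it relaxes any nonzero residual entry rather than the largest one; a sparse stochastic P and target node c are assumed):

    N = 11;  n = size(P, 1);  tol = 1e-4;
    x = sparse(n, 1);
    R = sparse(n, N+1);  R(c, 1) = 1;         % residual blocks r_0, ..., r_N
    while sum(abs(R(:))) > tol
        [i, k] = find(R, 1);                  % optimization 1: system never formed
        z = R(i, k);  R(i, k) = 0;
        x(i) = x(i) + z;                      % optimization 2: accumulate x_N directly
        if k <= N
            R(:, k+1) = R(:, k+1) + z * P(:, i) / k;   % push along column i of -P/k
        end
    end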
31. Error analysis for Gauss-Southwell
Theorem. Assume $P$ is column-stochastic and $v^{(0)} = 0$ in the system $(I \otimes I_N - S_N \otimes P)\,v = e_1 \otimes e_c$. Then:
(Nonnegativity, "easy") the iterates and residuals are nonnegative: $v^{(l)} \ge 0$ and $r^{(l)} \ge 0$;
(Convergence, "annoying") the residual goes to 0:
$$\|r^{(l)}\|_1 \le \prod_{k=1}^{l} \Bigl(1 - \frac{1}{2dk}\Bigr) \le l^{-1/(2d)},$$
where $d$ is the largest degree.
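The product bound yields the stated rate via $1 - x \le e^{-x}$ and the harmonic sum:
$$\prod_{k=1}^{l}\Bigl(1 - \frac{1}{2dk}\Bigr) \le \exp\Bigl(-\frac{1}{2d}\sum_{k=1}^{l}\frac{1}{k}\Bigr) \le \exp\Bigl(-\frac{\log l}{2d}\Bigr) = l^{-1/(2d)}.$$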
32. Proof sketch
Gauss-Southwell picks the largest residual
⇒ bound the update by the average number of nonzeros in the residual (sloppy)
⇒ algebraic convergence with a slow rate, but each update is REALLY fast: $O(d_{\max} \log n)$.
If $d$ is $\log\log n$, then our method runs in sublinear time (but so does just about anything).
33. Overall error analysis
Components: truncation to $N$ terms; residual to error; approximate solve.
Theorem. After $\ell$ steps of Gauss-Southwell,
$$\|x_N^{(\ell)} - x\|_1 \le \frac{1}{N!\,N} + \frac{1}{e}\cdot \ell^{-1/(2d)}.$$
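For a sense of scale, the truncation term is already tiny for modest $N$: $\frac{1}{8!\cdot 8} \approx 3.1 \times 10^{-6}$, so after the first few terms it is the $\ell$-dependent relaxation term that dominates the bound.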
34. More recent error analysis
Theorem (Gleich and Kloster, 2013, arXiv:1310.3423). Consider computing the matrix exponential using the Gauss-Southwell relaxation method in a graph with a Zipf law in the degrees with exponent $p = 1$ and max degree $d$; then the work involved in getting a solution with 1-norm error $\varepsilon$ is
$$\text{work} = O\Bigl(\log\bigl(\tfrac{1}{\varepsilon}\bigr)\,\bigl(\tfrac{1}{\varepsilon}\bigr)^{3/2}\, d^2 (\log d)^2\Bigr).$$
35. Problem size & runtimes
[Figure: The median runtime of our methods for the seven graphs over 100 trials. Axes: $|V| + \operatorname{nnz}(P)$ (from $10^6$ to $10^{10}$) vs. runtime in seconds ($10^{-4}$ to $10^4$); methods: expmv, half, gexpmq, gexpm, expmimv.]
Table 3. The real-world datasets we use in our experiments span three orders of magnitude in size.
Graph          |V|          nnz(P)         nnz(P)/|V|
itdk0304       190,914      1,215,220      6.37
dblp-2010      226,413      1,432,920      6.33
flickr-scc     527,476      9,357,071      17.74
ljournal-2008  5,363,260    77,991,514     14.54
webbase-2001   118,142,155  1,019,903,190  8.63
twitter-2010   33,479,734   1,394,440,635  41.65
friendster     65,608,366   3,612,134,270  55.06
Real-world networks. The datasets used are summarized in Table 3. They include a version of the flickr graph from [Bonchi et al., 2012] containing just the largest strongly-connected component of the original graph; dblp-2010 from [Boldi et al., 2011]; itdk0304 from [The Cooperative Association for Internet Data Analysis, 2005]; ljournal-2008 from [Boldi et al., 2011; Chierichetti et al., 2009]; twitter-2010 [Kwak et al., 2010]; webbase-2001 from [Hirai et al., 2000; Boldi and Vigna, 2005]; and the friendster graph from [Yang and Leskovec, 2012].
Implementation details. All experiments were performed on either a dual-processor Xeon e5-2670 system with 16 cores (total) and 256 GB of RAM or a single-processor Intel i7-990X (3.47 GHz) with 24 GB of RAM. Our algorithms were implemented in C++ using the Matlab MEX interface. All data structures used are memory-efficient: the solution and residual are stored as hash tables using Google's sparsehash package. The precise code for the algorithms and the experiments below is available via https://www.cs.purdue.edu/homes/dgleich/codes/nexpokit/.
Comparison. We compare our implementation with a state-of-the-art Matlab function for computing the exponential of a matrix times a vector, expmv [Al-Mohy and Higham, 2011]. We customized this method with the knowledge that $\|P\|_1 = 1$; this single change results in a great improvement to the runtime of their code. In each experiment, we use as the "true solution" the result of a call to expmv using the 'single' option, which guarantees a 1-norm error bounded by $2^{-24}$, or, for smaller problems, we use a Taylor approximation with the number of terms predicted by Lemma 12.
36. References and ongoing work
Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013. Also see the journal version on arXiv.
www.cs.purdue.edu/homes/dgleich/codes/nexpokit
• Error analysis using the queue (almost done ...)
• Better linear systems for faster convergence
• Asynchronous coordinate descent methods
• Scaling up to billion-node graphs (done ...)
Supported by NSF CAREER 1149756-CCF
www.cs.purdue.edu/homes/dgleich