The document summarizes Danai Koutra's presentation on techniques for understanding large graphs through summarization and similarity measurement. The presentation introduces VoG, a graph-summarization method that finds possibly overlapping substructures that minimize the description length of the graph. It also presents DeltaCon, an algorithm that measures graph similarity from pairwise node-influence values computed with belief propagation; DeltaCon runs in time linear in the number of edges. The techniques are illustrated on email networks, wiki graphs, and brain connectivity networks.
Summarizing and Comparing Large Graphs Using VoG and DeltaCon
1. Carnegie Mellon University
Making Sense of Large Graphs: Summarization and Similarity
Danai Koutra
Computer Science Department
Carnegie Mellon University
danai@cs.cmu.edu
http://www.cs.cmu.edu/~dkoutra
MLconf ’14, Atlanta, GA
2. Making sense of large graphs
Human Connectome Project
>1.25B users!
Scalable algorithms and models for understanding massive graphs.
Danai Koutra (CMU) 2
4. Ever tried visualizing a large graph?
(79,870 email accounts; 288,364 emails)
6. After this talk, you’ll know how to find…
VoG Top-3 Stars
klay@enron.com
kenneth.lay@enron.com
7. Enron Summary
VoG Top Near-Bipartite Core (commenters / CC’ed)
Ski excursion (organizers / participants)
“Affair”
8. Problem Definition
Given: a graph
Find: a succinct summary with possibly overlapping subgraphs (≈ important graph structures).
[Koutra, Kang, Vreeken, Faloutsos. SDM’14]
Lady Gaga Fan Club
9. Main Ideas
Idea 1: Use well-known structures (vocabulary).
Idea 2: Best graph summary = shortest lossless description ⇒ optimal compression (MDL).
10. BACKGROUND: Minimum Description Length (MDL)
~ Occam’s razor: simple & good explanations
$\min \; L(M) + L(D \mid M)$
L(M): # bits for the model M
L(D|M): # bits for the data using M (the errors)
Example models: $a_1 x + a_0$ versus $a_{10} x^{10} + a_9 x^9 + \dots + a_0$
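A toy illustration of the two-part MDL trade-off above, not taken from the talk: fit polynomials of increasing degree to noisy points and keep the degree with the smallest L(M) + L(D|M). The encoding is an assumption made for illustration: each coefficient costs a fixed 32 bits (`BITS_PER_PARAM`), and the residuals are charged their Gaussian code length at a precision of 0.01 (`PRECISION`).

```python
import math
import random

# Assumptions for illustration only (not from the talk):
BITS_PER_PARAM = 32   # each coefficient is stored as a 32-bit float
PRECISION = 0.01      # data values are recorded to 2 decimals


def fit_poly(xs, ys, degree):
    """Least-squares fit via the normal equations (adequate for small degrees)."""
    k = degree + 1
    # Augmented normal equations [X^T X | X^T y], where X[i][j] = xs[i] ** j
    m = [[sum(x ** (r + c) for x in xs) for c in range(k)]
         + [sum(y * x ** r for x, y in zip(xs, ys))]
         for r in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    coeffs = [0.0] * k
    for r in range(k - 1, -1, -1):
        rest = sum(m[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (m[r][k] - rest) / m[r][r]
    return coeffs  # coeffs[j] multiplies x ** j


def description_length(xs, ys, degree):
    """Two-part code L(M) + L(D|M), in bits."""
    coeffs = fit_poly(xs, ys, degree)
    n = len(xs)
    rss = sum((y - sum(a * x ** j for j, a in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    var = max(rss / n, PRECISION ** 2)  # avoid log(0) on a perfect fit
    model_bits = (degree + 1) * BITS_PER_PARAM  # L(M)
    # L(D|M): Gaussian code length of the residuals, quantized to PRECISION
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * var) - n * math.log2(PRECISION)
    return model_bits + data_bits


rng = random.Random(0)
xs = [i / 49 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]  # a noisy line
costs = {d: description_length(xs, ys, d) for d in range(5)}
best = min(costs, key=costs.get)
print(best, {d: round(c, 1) for d, c in costs.items()})
```

With data drawn from a line, the linear model should win: the constant model pays heavily in residual bits, while each higher degree saves only a bit or so of residual cost per extra coefficient, far less than the 32 bits that coefficient costs.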
11. Formally: Minimum Graph Description
Given: - a graph G with adjacency matrix A
- a vocabulary Ω
Find: a model M
s.t. $\min L(G, M) = \min \{ L(M) + L(E) \}$
where E is the error matrix between the adjacency matrix A and the model M.
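A minimal sketch of scoring a candidate model with L(M) + L(E). It assumes a vocabulary of cliques only and a deliberately simple encoding, so it is not VoG's actual encoding: each clique costs log2 n bits for its size plus log2 C(n, k) bits for its members, and the error E = A xor M costs log2(P + 1) bits for the number of errors plus log2 C(P, e) bits for their positions, where P is the number of node pairs. All function names are hypothetical.

```python
import math
from itertools import combinations


def log2_binom(n, k):
    return math.log2(math.comb(n, k))


def pairs_of(nodes):
    """All undirected node pairs (edges) inside a node set."""
    return {frozenset(p) for p in combinations(sorted(nodes), 2)}


def model_bits(n, cliques):
    """L(M): per clique, its size (log2 n bits) plus which nodes it contains."""
    return sum(math.log2(n) + log2_binom(n, len(c)) for c in cliques)


def error_bits(n, edges, cliques):
    """L(E): bits for the cells where the model's adjacency differs from A."""
    model_edges = set().union(*(pairs_of(c) for c in cliques))
    errors = len(edges ^ model_edges)   # E = A xor M
    cells = n * (n - 1) // 2            # node pairs of an undirected graph
    return math.log2(cells + 1) + log2_binom(cells, errors)


def total_bits(n, edges, cliques):
    return model_bits(n, cliques) + error_bits(n, edges, cliques)


n = 10
clique = set(range(6))
edges = pairs_of(clique) | {frozenset((7, 8))}   # a 6-clique plus one stray edge
print(round(total_bits(n, edges, []), 1))        # empty model: every edge is an error
print(round(total_bits(n, edges, [clique]), 1))  # clique model: one error left
```

Naming the clique costs a few bits in L(M) but removes 15 of the 16 errors, so the total description is much shorter than with the empty model; this is the sense in which MDL rewards finding real structure.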
19. VoG: summary
• Focus on important
• possibly-overlapping structures
• with known graph-theoretic properties
www.cs.cmu.edu/~dkoutra/SRC/vog.tar
21. Friendship graph ≈ wall-posts graph?
(1) Behavioral patterns: are the graphs / behaviors similar?
22. Why graph similarity?
(2) Classification
(3) Temporal anomaly detection: daily graphs (Day 1 to Day 4) compared through consecutive similarities sim1, sim2, sim3
(4) Intrusion detection
23. Problem Definition: Graph Similarity
• Given: (i) 2 graphs, GA and GB, with the same nodes and different edge sets; (ii) the node correspondence
• Find: a similarity score $s \in [0, 1]$
24. Obvious solution?
Edge Overlap (EO): # of common edges of GA and GB (normalized or not)
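A minimal sketch of edge overlap on undirected edge lists. The slide leaves the normalization open ("normalized or not"); dividing by the average edge count, a Dice-style 2|EA ∩ EB| / (|EA| + |EB|), is an assumption made here so that the score falls in [0, 1].

```python
def edge_set(edges):
    """Undirected edges as frozensets, so (u, v) and (v, u) compare equal."""
    return {frozenset(e) for e in edges}


def edge_overlap(edges_a, edges_b, normalized=True):
    a, b = edge_set(edges_a), edge_set(edges_b)
    common = len(a & b)
    if not normalized:
        return common
    # Assumed normalization (Dice-style): 2|A & B| / (|A| + |B|), in [0, 1]
    return 2 * common / (len(a) + len(b)) if (a or b) else 1.0


ga = [(1, 2), (2, 3), (3, 1), (3, 4)]
gb = [(2, 1), (2, 3), (3, 4)]
print(edge_overlap(ga, gb, normalized=False), edge_overlap(ga, gb))
```

Here three edges are shared, so the raw score is 3 and the normalized score is 6/7.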
25. … but “barbell”…
EO(B10, mB10) == EO(B10, mmB10)
(GA vs. GB and GA vs. GB′ receive the same score)
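To see the barbell problem concretely, here is a hypothetical small barbell: two 5-cliques joined by one bridge edge (this is an illustration, not the exact B10, mB10, mmB10 graphs from the slide). Deleting the bridge disconnects the graph, while deleting an edge inside a clique barely changes it, yet edge overlap scores the two deletions identically.

```python
from itertools import combinations


def barbell(k):
    """Two k-cliques, {0..k-1} and {k..2k-1}, joined by the bridge (k-1, k)."""
    left = {frozenset(p) for p in combinations(range(k), 2)}
    right = {frozenset(p) for p in combinations(range(k, 2 * k), 2)}
    return left | right | {frozenset((k - 1, k))}


def edge_overlap(a, b):
    """Dice-normalized count of common edges."""
    return 2 * len(a & b) / (len(a) + len(b))


g = barbell(5)
no_bridge = g - {frozenset((4, 5))}  # graph falls apart into two pieces
no_inner = g - {frozenset((0, 1))}   # one clique loses an edge, still connected
print(edge_overlap(g, no_bridge) == edge_overlap(g, no_inner))
```

Both deletions leave exactly the same number of common edges, so any score that only counts shared edges cannot capture which edge mattered more.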
26. What makes a similarity function good?
• Properties: intuitive properties like “edge-importance”
28. MAIN IDEA: DELTACON
① Find the pairwise node influence, SA and SB.
② Find the similarity between SA and SB.
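29. INTUITION (DETAILS)
How? Using Belief Propagation.
Attenuating neighboring influence for small ε:
$S = [I + \varepsilon^2 D - \varepsilon A]^{-1} \approx [I - \varepsilon A]^{-1} = I + \varepsilon A + \varepsilon^2 A^2 + \dots$
where A is the adjacency matrix and D the diagonal degree matrix; the terms εA, ε²A², … capture 1-hop, 2-hop, … influence.
Note: ε > ε² > …, since 0 < ε < 1.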
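A small numeric check of the approximation above, assuming A is a 0/1 adjacency matrix, D its diagonal degree matrix, and an arbitrarily chosen ε = 0.05. It computes S exactly with a dense Gauss-Jordan inverse and compares it with the truncated series I + εA + ε²A² + ε³A³. Note that the series only converges when ε times the largest eigenvalue of A is below 1, which is one reason ε must be small.

```python
def mat_inverse(m):
    """Gauss-Jordan inverse of a small dense matrix (list of lists)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def affinity(adj, eps):
    """S = [I + eps^2 D - eps A]^{-1}, with D the diagonal degree matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    m = [[(1.0 + eps ** 2 * deg[i] if i == j else 0.0) - eps * adj[i][j]
          for j in range(n)] for i in range(n)]
    return mat_inverse(m)


def truncated_series(adj, eps, terms):
    """I + eps A + (eps A)^2 + ... with `terms` terms in total."""
    n = len(adj)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]  # current (eps A)^t, starting at t = 0
    for _ in range(1, terms):
        power = [[eps * sum(power[i][k] * adj[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, power)]
    return total


adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # path 0-1-2-3
eps = 0.05  # assumption: any small value with eps * (largest eigenvalue of A) < 1
S = affinity(adj, eps)
approx = truncated_series(adj, eps, terms=4)
print(max(abs(S[i][j] - approx[i][j]) for i in range(4) for j in range(4)))
print(S[0][1], S[0][2], S[0][3])  # influence of node 0 decays with each hop
```

The gap between the two matrices is of order ε²·degree, coming from the dropped ε²D term, and node 0's influence on nodes 1, 2, 3 shrinks roughly like ε, ε², ε³.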
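30. OUR SOLUTION: DELTACON (DETAILS)
① Find the pairwise node influence, SA and SB.
② Find the similarity between SA and SB:
$\mathrm{sim}(S_A, S_B) = \frac{1}{1 + \sqrt{\sum_{i,j} \left( \sqrt{s_{A,ij}} - \sqrt{s_{B,ij}} \right)^2}}$
where the square root in the denominator is the “Root” Euclidean Distance between SA and SB.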
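A minimal, quadratic-time sketch of the two steps for small graphs given as dense 0/1 adjacency matrices over the same node order (the node correspondence). Choosing ε = 1 / (1 + max degree), shared by both graphs, is an assumption made here, and the function names are hypothetical.

```python
import math


def mat_inverse(m):
    """Gauss-Jordan inverse of a small dense matrix (list of lists)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def affinity(adj, eps):
    """Step 1: S = [I + eps^2 D - eps A]^{-1}, D = diagonal degree matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    m = [[(1.0 + eps ** 2 * deg[i] if i == j else 0.0) - eps * adj[i][j]
          for j in range(n)] for i in range(n)]
    return mat_inverse(m)


def deltacon_naive(adj_a, adj_b):
    """Step 2: sim = 1 / (1 + root Euclidean distance between S_A and S_B)."""
    max_deg = max(sum(row) for row in adj_a + adj_b)
    eps = 1.0 / (1 + max_deg)  # assumption: one shared eps for both graphs
    sa, sb = affinity(adj_a, eps), affinity(adj_b, eps)
    dist = math.sqrt(sum(
        (math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2  # clamp rounding noise
        for ra, rb in zip(sa, sb) for x, y in zip(ra, rb)))
    return 1.0 / (1.0 + dist)


path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # path + edge (0, 3)
print(deltacon_naive(path, path), deltacon_naive(path, cycle))
```

Identical graphs get similarity 1, and the score drops toward 0 as the influence matrices drift apart. Materializing the full n × n matrices is what makes this version O(n²).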
31. … but O(n²) …
Faster? O(m1 + m2), in the paper.
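The linear-time algorithm is deferred to the paper. As an illustration of why linear time is achievable, the sketch below (hypothetical names, not the paper's code) never builds the n × n matrix: it randomly splits the nodes into g groups and, for each group, solves [I + ε²D − εA] s = e_k by fixed-point sweeps over adjacency lists, each sweep costing O(n + m). The resulting n × g matrices are then compared with the same root Euclidean distance. The group count, the random split, the iteration count, and ε = 1 / (1 + max degree) are all assumptions.

```python
import math
import random


def group_influence(n, adj, groups, eps, iters=200):
    """n x g matrix whose column k approximately solves
    [I + eps^2 D - eps A] s = e_k, with e_k the indicator of group k.
    Each sweep s <- e_k + eps*A*s - eps^2*D*s costs O(n + m) on adjacency lists."""
    deg = [len(adj[u]) for u in range(n)]
    cols = []
    for group in groups:
        e = [1.0 if u in group else 0.0 for u in range(n)]
        s = e[:]
        for _ in range(iters):
            s = [e[u] + eps * sum(s[v] for v in adj[u]) - eps ** 2 * deg[u] * s[u]
                 for u in range(n)]
        cols.append(s)
    return [[col[u] for col in cols] for u in range(n)]  # row u belongs to node u


def fast_similarity(n, adj_a, adj_b, g=2, seed=0):
    nodes = list(range(n))
    random.Random(seed).shuffle(nodes)
    groups = [set(nodes[k::g]) for k in range(g)]  # hypothetical random partition
    eps = 1.0 / (1 + max(len(nbrs) for nbrs in adj_a + adj_b))  # assumed eps
    sa = group_influence(n, adj_a, groups, eps)
    sb = group_influence(n, adj_b, groups, eps)
    dist = math.sqrt(sum(
        (math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2
        for ra, rb in zip(sa, sb) for x, y in zip(ra, rb)))
    return 1.0 / (1.0 + dist)


# Adjacency lists for a 6-node path and the same path closed into a cycle
path = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
cycle = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
print(fast_similarity(6, path, path), fast_similarity(6, path, cycle))
```

With this choice of ε the sweeps are guaranteed to converge, because each row of εA − ε²D has absolute sum ε·d·(1 + ε) < 1 for every degree d. For fixed g and iteration count, the total work grows linearly with the number of edges of the two graphs.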
32. Temporal Anomaly Detection
• Nodes: email accounts of employees
• Edges: email exchange
Daily graphs Day 1 to Day 5, compared through consecutive similarities sim1 to sim4.
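A minimal sketch of this setup: compare each day's email graph with the next and flag low-similarity transitions. For brevity the similarity is a stand-in (Dice-normalized edge overlap); in the talk's setting DeltaCon would be passed as `sim` instead. The threshold of 0.5, the flagging rule, and the toy graphs are made up for illustration.

```python
def edge_overlap(a, b):
    """Stand-in similarity in [0, 1]; DeltaCon would be plugged in here instead."""
    a = {frozenset(e) for e in a}
    b = {frozenset(e) for e in b}
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def consecutive_similarities(daily_graphs, sim=edge_overlap):
    """sim_t = similarity(Day t, Day t+1), like sim1 ... sim4 on the slide."""
    return [sim(g1, g2) for g1, g2 in zip(daily_graphs, daily_graphs[1:])]


def flag_anomalies(sims, threshold=0.5):
    """Hypothetical rule: flag transitions whose similarity falls below a threshold."""
    return [t + 1 for t, s in enumerate(sims) if s < threshold]  # 1-based, like sim1


days = [
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")],
    [("x", "y"), ("y", "z"), ("x", "z")],  # sudden change in who emails whom
    [("x", "y"), ("y", "z"), ("x", "z")],
]
sims = consecutive_similarities(days)
print([round(s, 2) for s in sims], flag_anomalies(sims))
```

Only the third transition (Day 3 to Day 4), where the set of communicating accounts changes completely, falls below the threshold; the extra edge on Day 3 lowers sim2 only slightly.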