Carnegie Mellon University
Making Sense of Large Graphs: Summarization and Similarity
Danai Koutra
Computer Science Department
Carnegie Mellon University
danai@cs.cmu.edu
http://www.cs.cmu.edu/~dkoutra
MLconf ’14, Atlanta, GA
Making sense of large graphs
Human Connectome Project
>1.25B users!
Scalable algorithms and models for understanding massive graphs.
Danai Koutra (CMU) 2
Understanding Large Graphs
Part 1: Summarization
Ever tried visualizing a large graph?
79,870 email accounts
288,364 emails
After this talk, you’ll know how to find…
VoG Top-3 Stars
klay@enron.com
kenneth.lay@enron.com
Enron Summary
VoG Top Near Bipartite Core
Commenters CC’ed
Ski excursion: organizers, participants
“Affair”
Problem Definition
Given: a graph
Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.
[Koutra, Kang, Vreeken, Faloutsos. SDM’14]
Lady Gaga Fan Club
Main Ideas
Idea 1: Use well-known structures (vocabulary):
Idea 2: Best graph summary
Shortest lossless description
→ optimal compression (MDL)
BACKGROUND
Minimum Description Length (~Occam’s razor)
$\min \; L(M) + L(D \mid M)$
$L(M)$: # bits for M
$L(D \mid M)$: # bits for the data using M
Candidate models M: $a_1 x + a_0$ or $a_{10} x^{10} + a_9 x^9 + \dots + a_0$ (figure label: errors)
Simple & good explanations
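A minimal sketch of the two-part MDL trade-off, using the slide's line-versus-degree-10-polynomial example. The encoding is an assumption made for illustration, not the talk's: 32 bits per coefficient for L(M), and a Gaussian code for the residuals quantized to a precision of 1e-3 for L(D|M). The data, the function name `description_length`, and the parameters are hypothetical.

```python
import numpy as np

def description_length(x, y, degree, bits_per_coeff=32, precision=1e-3):
    """Two-part code length L(M) + L(D|M), in bits, for a polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    # L(M): a fixed number of bits per coefficient (assumed encoding).
    model_bits = (degree + 1) * bits_per_coeff
    # L(D|M): Gaussian code for the residuals, quantized to `precision`.
    # The variance is floored so the per-point cost stays positive.
    variance = max(float(np.mean(residuals ** 2)), precision ** 2)
    bits_per_point = 0.5 * np.log2(2 * np.pi * np.e * variance) - np.log2(precision)
    return model_bits + len(x) * bits_per_point

# Synthetic data: a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Total cost for every polynomial degree from 0 to 10.
costs = {d: description_length(x, y, d) for d in range(11)}
best_degree = min(costs, key=costs.get)
print("best degree:", best_degree)
print("bits, degree 1 vs 10:", round(costs[1]), round(costs[10]))
```

The degree-10 polynomial fits the noise slightly better, but the bits saved on the residuals do not pay for the extra coefficients, so the simpler model wins under this encoding.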
Formally: Minimum Graph Description
Given: - a graph G
- vocabulary Ω
Find: model M
s.t. $\min L(G, M) = \min \{ L(M) + L(E) \}$
Adjacency A, Model M, Error E
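A small sketch of how the three pieces relate, assuming the error E is the set of node pairs where the model M and the adjacency A disagree (an exclusive OR), so that A can be recovered losslessly from M and E. The toy graph and the helper `edge` are hypothetical.

```python
from itertools import combinations

def edge(u, v):
    """Canonical form of an undirected edge."""
    return (min(u, v), max(u, v))

# Adjacency A: a 4-clique with one edge missing, plus one stray edge.
graph_edges = {edge(0, 2), edge(0, 3), edge(1, 2), edge(1, 3),
               edge(2, 3), edge(3, 4)}

# Model M: "nodes 0..3 form a full clique" (one vocabulary structure).
model_edges = {edge(u, v) for u, v in combinations([0, 1, 2, 3], 2)}

# Error E: pairs the model gets wrong, in either direction.
error_edges = graph_edges ^ model_edges
print(sorted(error_edges))

# Lossless: the model plus the error reproduces the graph exactly.
recovered = model_edges ^ error_edges
print(recovered == graph_edges)
```

A model that explains the graph well leaves a small E, so L(E) is cheap; a poor model shifts the cost into the error term.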
VoG: Overview
[Figure labels: ≈?, argmin, ≈]
VoG: Overview
Pick best (with some criterion)
Summary
Q: Which structures to pick?
A: Those that minimize the description length of G
Candidate structures S: $2^{|S|}$ combinations
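The selection problem can be sketched on a toy graph. Everything here is an illustrative assumption rather than VoG's actual encoding or heuristic: the cost charges 3 bits per node listed in a structure and 6 bits per wrongly described edge, and the greedy loop is just one possible "pick best" criterion. It is compared against exhaustive search over all $2^{|S|}$ subsets.

```python
from itertools import combinations

def edge(u, v):
    return (min(u, v), max(u, v))

def clique(nodes):
    return {edge(u, v) for u, v in combinations(nodes, 2)}

def star(hub, spokes):
    return {edge(hub, s) for s in spokes}

def total_bits(chosen, graph_edges, node_bits=3, error_edge_bits=6):
    """Toy L(M) + L(E): list each structure's nodes, then each wrong edge."""
    covered = set().union(*(s["edges"] for s in chosen))
    model_bits = sum(node_bits * len(s["nodes"]) for s in chosen)
    return model_bits + error_edge_bits * len(graph_edges ^ covered)

def exhaustive(candidates, graph_edges):
    """Try every subset of candidates: 2^|S| of them."""
    subsets = [list(c) for r in range(len(candidates) + 1)
               for c in combinations(candidates, r)]
    best = min(subsets, key=lambda c: total_bits(c, graph_edges))
    return best, len(subsets)

def greedy(candidates, graph_edges):
    """Add the structure that lowers the cost most; stop when none does."""
    chosen, remaining = [], list(candidates)
    while remaining:
        nxt = min(remaining, key=lambda s: total_bits(chosen + [s], graph_edges))
        if total_bits(chosen + [nxt], graph_edges) >= total_bits(chosen, graph_edges):
            break
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen

# Toy graph: a 4-clique, a star, and one edge joining them.
graph_edges = clique([0, 1, 2, 3]) | star(4, [5, 6, 7]) | {edge(3, 4)}
candidates = [
    {"name": "clique", "nodes": [0, 1, 2, 3], "edges": clique([0, 1, 2, 3])},
    {"name": "star", "nodes": [4, 5, 6, 7], "edges": star(4, [5, 6, 7])},
    {"name": "bad clique", "nodes": [4, 5, 6, 7], "edges": clique([4, 5, 6, 7])},
]

best, n_subsets = exhaustive(candidates, graph_edges)
picked = greedy(candidates, graph_edges)
print(n_subsets, [s["name"] for s in best], total_bits(best, graph_edges))
print([s["name"] for s in picked], total_bits(picked, graph_edges))
```

On this example the greedy pass reaches the same subset as exhaustive search, but that is a property of the toy input, not a guarantee; the exponential number of subsets is what makes a heuristic necessary on real candidate sets.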
Runtime
1.25B users!
VoG is near-linear in the number of edges of the input graph.
Understanding a wiki graph
I don’t see anything! :(
Nodes: wiki editors
Edges: co-edited
Wiki Controversial Article
Stars: admins, bots, heavy users
Bipartite cores: edit wars
Kiev vs. Kyiv; vandals vs. admins
VoG vs. other methods
Methods compared: VoG; Bounded-Error Summarization [Navlakha+’08]; Motif Simplification [Dunne+’13]; Clustering Methods; Cross-Associations [Chakrabarti+’03]
Slide annotations: stars, cliques; near-cliques

Property                VoG    Bounded-Error    Motif             Clustering    Cross-
                               Summarization    Simplification    Methods       Associations
Variety of Structures   ✔      ✗                ✗                 ✗             ✗
Important Structures    ✔      ✗                ✗                 ✗             ✗
Low Complexity          ✔      ✗                ✗                 ✔(?)          ✔
Visualization           ✔      ✔                ✔                 ✗             ✗
Graph Summary           ✔      ✔                ✔                 ✗             ✗
VoG: Summary
• Focus on important
• possibly-overlapping structures
• with known graph-theoretic properties
www.cs.cmu.edu/~dkoutra/SRC/vog.tar
Understanding Large Graphs
Part 2: Similarities
Are the graphs / behaviors similar?
1 Behavioral Patterns
friendship graph ≈ wall posts graph?
Why graph similarity?
2 Classification
3 Temporal anomaly detection
Day 1, Day 2, Day 3, Day 4, with sim1, sim2, sim3 between consecutive days
4 Intrusion detection
Problem Definition: Graph Similarity
• Given:
(i) 2 graphs, G_A and G_B, with the same nodes and different edge sets
(ii) node correspondence
• Find: similarity score $s \in [0, 1]$
Obvious solution?
Edge Overlap (EO)
# of common edges (normalized or not)
… but “barbell”…
EO(B10, mB10) == EO(B10, mmB10)
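A short sketch of why edge overlap cannot tell these cases apart. It assumes B10 is a 10-node barbell (two 5-cliques joined by one bridge edge) and compares removing the bridge with removing an edge inside a clique; which of the two the slide labels mB10 and which mmB10 is not asserted here. The Jaccard-style normalization is one possible choice, not necessarily the talk's.

```python
from itertools import combinations

def clique_edges(nodes):
    return {frozenset(pair) for pair in combinations(nodes, 2)}

def barbell(k):
    """Two k-cliques joined by a single bridge edge (k-1, k)."""
    left, right = range(k), range(k, 2 * k)
    return clique_edges(left) | clique_edges(right) | {frozenset((k - 1, k))}

def edge_overlap(edges_a, edges_b):
    """Unnormalized EO: number of common edges."""
    return len(edges_a & edges_b)

def edge_overlap_normalized(edges_a, edges_b):
    """One possible normalization: common edges over all edges seen."""
    return len(edges_a & edges_b) / len(edges_a | edges_b)

b10 = barbell(5)
no_bridge = b10 - {frozenset((4, 5))}       # graph falls apart into two cliques
no_clique_edge = b10 - {frozenset((0, 1))}  # graph stays connected

eo_bridge = edge_overlap(b10, no_bridge)
eo_clique = edge_overlap(b10, no_clique_edge)
print(len(b10), eo_bridge, eo_clique)
print(edge_overlap_normalized(b10, no_bridge),
      edge_overlap_normalized(b10, no_clique_edge))
```

Both modified graphs share the same number of edges with B10, so EO scores them identically even though only one of them disconnects the graph.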
What makes a similarity function good?
• Properties:
- Intuitive
Properties like: “Edge-importance”
What makes a similarity function good?
• Properties:
- Intuitive
Properties like: “Weight-awareness”
- Scalable
MAIN IDEA: DELTACON
① Find the pairwise node influence, S_A & S_B.
② Find the similarity between S_A & S_B.
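INTUITION
How? Using Belief Propagation
Attenuating Neighboring Influence for small ε:
$S = [I + \epsilon^2 D - \epsilon A]^{-1} \approx [I - \epsilon A]^{-1} = I + \epsilon A + \epsilon^2 A^2 + \dots$
($\epsilon A$: 1-hop, $\epsilon^2 A^2$: 2-hops, …)
Note: $\epsilon > \epsilon^2 > \dots$, $0 < \epsilon < 1$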
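The attenuation can be checked numerically on a small example. This is a sketch with NumPy that inverts the matrix directly, assuming A is a symmetric 0/1 adjacency matrix, D its diagonal degree matrix, and a hypothetical ε = 0.1; the function names are illustrative.

```python
import numpy as np

def affinity_matrix(adjacency, eps):
    """S = [I + eps^2 D - eps A]^(-1): pairwise node influence."""
    n = adjacency.shape[0]
    degree = np.diag(adjacency.sum(axis=1))
    return np.linalg.inv(np.eye(n) + eps ** 2 * degree - eps * adjacency)

def truncated_series(adjacency, eps, hops):
    """I + eps A + eps^2 A^2 + ... cut off after `hops` powers of A."""
    n = adjacency.shape[0]
    total, power = np.eye(n), np.eye(n)
    for k in range(1, hops + 1):
        power = power @ adjacency
        total = total + eps ** k * power
    return total

# Path graph 0 - 1 - 2 - 3.
path = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    path[u, v] = path[v, u] = 1.0

S = affinity_matrix(path, eps=0.1)
approx = truncated_series(path, eps=0.1, hops=3)
print(np.round(S[0], 4))       # influence of node 0 on nodes 0..3
print(np.round(approx[0], 4))  # same row from the truncated series
```

The influence of node 0 shrinks by roughly a factor of ε with every extra hop, which is the attenuation the slide describes. The series only converges when ε times the largest eigenvalue of A is below 1, so ε has to be small relative to the node degrees.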
OUR SOLUTION: DELTACON
DETAILS
① Find the pairwise node influence, S_A & S_B.
② Find the similarity between S_A & S_B.
$\mathrm{sim}(S_A, S_B) = \dfrac{1}{1 + \sqrt{\sum_{i,j} \left( \sqrt{s_{A,ij}} - \sqrt{s_{B,ij}} \right)^2}}$
The square-root term is the “Root” Euclidean Distance.
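Both steps can be put together in a naive sketch that inverts the full matrices (so it does not scale; it only illustrates the definition). Assumptions: NumPy, a hypothetical ε = 0.1, and a 10-node barbell built as two 5-cliques joined by one bridge edge. The function names are illustrative, not the authors' code.

```python
import numpy as np

def affinity_matrix(adjacency, eps):
    """S = [I + eps^2 D - eps A]^(-1): pairwise node influence."""
    n = adjacency.shape[0]
    degree = np.diag(adjacency.sum(axis=1))
    return np.linalg.inv(np.eye(n) + eps ** 2 * degree - eps * adjacency)

def deltacon_similarity(adj_a, adj_b, eps=0.1):
    """1 / (1 + RootED), with RootED the root Euclidean distance of S_A, S_B."""
    s_a = affinity_matrix(adj_a, eps)
    s_b = affinity_matrix(adj_b, eps)
    # Clip guards against tiny negative round-off before the square roots.
    root_a = np.sqrt(np.clip(s_a, 0.0, None))
    root_b = np.sqrt(np.clip(s_b, 0.0, None))
    root_ed = np.sqrt(np.sum((root_a - root_b) ** 2))
    return 1.0 / (1.0 + root_ed)

def barbell(k):
    """Adjacency of two k-cliques joined by the bridge edge (k-1, k)."""
    a = np.zeros((2 * k, 2 * k))
    a[:k, :k] = 1.0
    a[k:, k:] = 1.0
    np.fill_diagonal(a, 0.0)
    a[k - 1, k] = a[k, k - 1] = 1.0
    return a

def without_edge(adjacency, u, v):
    b = adjacency.copy()
    b[u, v] = b[v, u] = 0.0
    return b

b10 = barbell(5)
sim_same = deltacon_similarity(b10, b10)
sim_no_bridge = deltacon_similarity(b10, without_edge(b10, 4, 5))
sim_no_clique_edge = deltacon_similarity(b10, without_edge(b10, 0, 1))
print(sim_same, sim_no_bridge, sim_no_clique_edge)
```

Unlike a plain count of common edges, this score drops further when the bridge is removed than when an edge inside a clique is removed, because losing the bridge cuts off all influence between the two halves.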
… but $O(n^2)$ …
Faster?
$O(m_1 + m_2)$ in the paper :)
Temporal Anomaly Detection
• Nodes: email accounts of employees
• Edges: email exchange
Day 1, Day 2, Day 3, Day 4, Day 5, with sim1, sim2, sim3, sim4 between consecutive days
Temporal Anomaly Detection
[Plot: similarity between consecutive days]
Feb 4: Lay resigns
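The day-over-day comparison can be sketched as a small pipeline. Two parts are stand-ins chosen for brevity and are not from the talk: a Jaccard edge similarity takes the place of the graph similarity function, and the flagging rule (below half the median similarity) is hypothetical. The daily graphs are made-up toy data.

```python
from statistics import median

def jaccard(edges_a, edges_b):
    """Stand-in similarity in [0, 1]; any graph similarity can be plugged in."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

def consecutive_similarities(daily_graphs, similarity=jaccard):
    """sim1, sim2, ...: similarity of each day's graph to the next day's."""
    return [similarity(a, b) for a, b in zip(daily_graphs, daily_graphs[1:])]

def flag_transitions(sims, fraction=0.5):
    """Hypothetical rule: flag day pairs scoring below fraction * median."""
    cutoff = fraction * median(sims)
    return [(i + 1, i + 2) for i, s in enumerate(sims) if s < cutoff]

# Toy email graphs for Day 1..5: the pattern changes abruptly on Day 4.
usual = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
busier = usual | {("dan", "eve")}
unusual = {("ann", "eve"), ("eve", "fay"), ("fay", "gus")}
days = [usual, busier, busier, unusual, unusual]

sims = consecutive_similarities(days)
flagged = flag_transitions(sims)
print(sims)
print(flagged)  # (day, next day) pairs with an unusually low similarity
```

Swapping `jaccard` for another similarity function changes only the `similarity` argument; the rest of the pipeline stays the same.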
Brain-Connectivity Graph Clustering
• 114 brain graphs
- Nodes: 70 cortical regions
- Edges: connections
• Attributes: gender, IQ, age…
Brain-Connectivity Graph Clustering
t-test p-value = 0.0057
Graph Understanding via …
• … Summarization …
- VoG: to spot the important graph structures
• … Comparison …
- DeltaCon: to find the similarity between aligned networks
- BiG-Align, Uni-Align: to align bi/uni-partite graphs efficiently
Thank you!
Understanding: summarization, similarities
www.cs.cmu.edu/~dkoutra/pub.htm
danai@cs.cmu.edu

 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Summarizing and Comparing Large Graphs Using VoG and DeltaCon

  • 1. Making Sense of Large Graphs: Summarization and Similarity. Danai Koutra, Computer Science Department, Carnegie Mellon University. danai@cs.cmu.edu, http://www.cs.cmu.edu/~dkoutra. MLconf ’14, Atlanta, GA
  • 2. Making sense of large graphs. Human Connectome Project; >1.25B users! Scalable algorithms and models for understanding massive graphs.
  • 3. Understanding Large Graphs, Part 1: Summarization
  • 4. Ever tried visualizing a large graph? 79,870 email accounts, 288,364 emails.
  • 5. Ever tried visualizing a large graph? 79,870 email accounts, 288,364 emails.
  • 6. After this talk, you’ll know how to find… VoG top-3 stars: klay@enron.com, kenneth.lay@enron.com
  • 7. Enron Summary. VoG top near-bipartite core. Figure labels: commenters, CC’ed; ski excursion (organizers, participants); “Affair”
  • 8. Problem Definition. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures. Figure label: Lady Gaga Fan Club. [Koutra, Kang, Vreeken, Faloutsos. SDM’14]
  • 9. Main Ideas. Idea 1: Use well-known structures (vocabulary). Idea 2: Best graph summary: shortest lossless description → optimal compression (MDL).
  • 10. BACKGROUND: Minimum Description Length (~Occam’s razor). $\min\; L(M) + L(D \mid M)$, where $L(M)$ is the # bits for the model $M$ and $L(D \mid M)$ is the # bits for the data using $M$ (the errors). Candidate models: $a_1 x + a_0$ vs. $a_{10} x^{10} + a_9 x^9 + \dots + a_0$. Simple & good explanations.
  • 11. Formally: Minimum Graph Description. Given: a graph $G$ and a vocabulary $\Omega$. Find: model $M$ s.t. $\min L(G, M) = \min\{\, L(M) + L(E) \,\}$. Figure labels: adjacency $A$, model $M$, error $E$.
  • 12. VoG: Overview. Figure labels: ≈?, argmin, ≈
  • 13. VoG: Overview. Pick best (with some criterion). Summary.
  • 14. Q: Which structures to pick? A: Those that minimize the description length of G. For a set S of structures there are $2^{|S|}$ combinations.
  • 15. Runtime: VoG is near-linear in the # edges of the input graph. 1.25B users!
  • 16. Understanding a wiki graph. Nodes: wiki editors. Edges: co-edited. I don’t see anything! :(
  • 17. Wiki Controversial Article. Stars: admins, bots, heavy users. Bipartite cores: edit wars (Kiev vs. Kyiv, vandals vs. admins).
  • 18. VoG vs. other methods [Navlakha+’08] [Dunne+’13] [Chakrabarti+’03]. Methods, in column order: VoG | Bounded-Error Summarization | Motif Simplification | Clustering Methods | Cross-Associations. Variety of structures: ✔ | ✗ | ✗ | ✗ | ✗. Important structures: ✔ | ✗ | ✗ | ✗ | ✗. Low complexity: ✔ | ✗ | ✗ | ✔(?) | ✔. Visualization: ✔ | ✔ | ✔ | ✗ | ✗. Graph summary: ✔ | ✔ | ✔ | ✗ | ✗. Annotation: stars, cliques, near-cliques.
  • 19. VoG: summary. Focus on important, possibly-overlapping structures with known graph-theoretic properties. www.cs.cmu.edu/~dkoutra/SRC/vog.tar
  • 20. Understanding Large Graphs, Part 2: Similarities
  • 21. 1: Behavioral patterns. Friendship graph ≈ wall posts graph? Are the graphs / behaviors similar?
  • 22. Why graph similarity? 2: Classification. 3: Temporal anomaly detection (Day 1, Day 2, Day 3, Day 4; sim1, sim2, sim3). 4: Intrusion detection.
  • 23. Problem Definition: Graph Similarity. Given: (i) 2 graphs, $G_A$ and $G_B$, with the same nodes and different edge sets; (ii) node correspondence. Find: similarity score $s \in [0, 1]$.
  • 24. Obvious solution? Edge Overlap (EO): # of common edges of $G_A$ and $G_B$ (normalized or not).
  • 25. … but “barbell”… EO(B10, mB10) == EO(B10, mmB10). Figure: $G_A$ vs. $G_B$ and $G_A$ vs. $G_B'$.
  • 26. What makes a similarity function good? Properties: intuitive, with properties like “edge-importance”.
  • 27. What makes a similarity function good? Properties: intuitive, with properties like “weight-awareness”; scalable. Figure marks: ✗ ✗
  • 28. MAIN IDEA: DELTACON. ① Find the pairwise node influence, $S_A$ and $S_B$. ② Find the similarity between $S_A$ and $S_B$.
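  • 29. INTUITION. How? Using Belief Propagation: attenuating neighboring influence. For small $\epsilon$: $S = [I + \epsilon^2 D - \epsilon A]^{-1} \approx [I - \epsilon A]^{-1} = I + \epsilon A + \epsilon^2 A^2 + \dots$, where the successive terms cover 1-hop, 2-hop, … influence. Note: $\epsilon > \epsilon^2 > \dots$, $0 < \epsilon < 1$.
  • 30. OUR SOLUTION: DELTACON (details). ① Find the pairwise node influence, $S_A$ and $S_B$. ② Find the similarity between $S_A$ and $S_B$: $\mathrm{sim}(S_A, S_B) = \dfrac{1}{1 + \sqrt{\sum_{i,j} \left(\sqrt{s_{A,ij}} - \sqrt{s_{B,ij}}\right)^2}}$, where the square-root term is the “root” Euclidean distance.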
  • 31. … but O(n²) … faster? O(m1 + m2) in the paper :)
  • 32. Temporal Anomaly Detection. Nodes: email accounts of employees. Edges: email exchange. Day 1, Day 2, Day 3, Day 4, Day 5; sim1, sim2, sim3, sim4.
  • 33. Temporal Anomaly Detection. Plot: similarity over consecutive days. Feb 4: Lay resigns.
  • 34. Brain-Connectivity Graph Clustering. 114 brain graphs. Nodes: 70 cortical regions. Edges: connections. Attributes: gender, IQ, age…
  • 35. Brain-Connectivity Graph Clustering. t-test p-value = 0.0057
  • 36. Graph Understanding via … Summarization: VoG, to spot the important graph structures. … Comparison: DeltaCon, to find the similarity between aligned networks; BiG-Align and Uni-Align, to align bi/uni-partite graphs efficiently.
  • 37. Thank you! Understanding: summarization, similarities. www.cs.cmu.edu/~dkoutra/pub.htm, danai@cs.cmu.edu
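
The contrast between Edge Overlap (slides 24 to 25) and DeltaCon (slides 28 to 30) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`adjacency`, `invert`, `influence`, `deltacon_sim`, `edge_overlap`), the value `eps=0.1`, and the small two-triangle "barbell" are all assumptions made for the example. It inverts a dense matrix directly, which is cubic in the number of nodes and therefore does not reflect the faster O(m1 + m2) variant mentioned on slide 31.

```python
import math

def adjacency(n, edges):
    """Dense symmetric 0/1 adjacency matrix of an undirected graph."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return a

def invert(m):
    """Gauss-Jordan inversion with partial pivoting (only for tiny matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def influence(n, edges, eps):
    """Pairwise node influence S = [I + eps^2 D - eps A]^(-1) from slide 29."""
    a = adjacency(n, edges)
    deg = [sum(row) for row in a]
    m = [[(1.0 + eps * eps * deg[i] if i == j else 0.0) - eps * a[i][j]
          for j in range(n)] for i in range(n)]
    return invert(m)

def deltacon_sim(n, edges_a, edges_b, eps=0.1):
    """sim = 1 / (1 + root Euclidean distance between S_A and S_B), slide 30."""
    sa = influence(n, edges_a, eps)
    sb = influence(n, edges_b, eps)
    # max(..., 0.0) guards against tiny negative values from rounding.
    d = math.sqrt(sum(
        (math.sqrt(max(sa[i][j], 0.0)) - math.sqrt(max(sb[i][j], 0.0))) ** 2
        for i in range(n) for j in range(n)))
    return 1.0 / (1.0 + d)

def edge_overlap(edges_a, edges_b):
    """Unnormalized Edge Overlap: number of undirected edges in both graphs."""
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    return len(ea & eb)

# Hypothetical toy "barbell": two triangles joined by the bridge (2, 3).
barbell = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
no_bridge = [e for e in barbell if e != (2, 3)]       # bridge removed
no_clique_edge = [e for e in barbell if e != (0, 1)]  # in-clique edge removed

eo_bridge = edge_overlap(barbell, no_bridge)
eo_clique = edge_overlap(barbell, no_clique_edge)
sim_bridge = deltacon_sim(6, barbell, no_bridge)
sim_clique = deltacon_sim(6, barbell, no_clique_edge)
print(eo_bridge, eo_clique, sim_bridge, sim_clique)
```

Under these assumptions, Edge Overlap cannot tell the two modified graphs apart because each shares the same number of edges with the original, whereas the influence-based score should come out lower when the bridge is removed, since that disconnects the two halves. That is the "edge-importance" intuition from slide 26.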