http://nesreenahmed.com
High-Performance Graph Analysis and Modeling
§ This talk is NOT about graph processing systems
• (e.g., GraphX, Giraph, …).
§ Instead, this talk is about:
(1) Knowledge discovery and extracting insights from graph data
(2) Graph machine learning
(3) High-performance algorithms for solving (1) and (2)
§ Graphs encode dependencies/relationships between entities
IID classification: p(Y | X)
Relational (graph) classification: p(Y | X, X_R)
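The contrast between p(Y | X) and p(Y | X, X_R) can be illustrated with a minimal Python sketch. It rests on assumptions that are not from the slides: X_R is approximated by the mean of the neighbors' attribute vectors, and the function name `relational_features` and the toy data are hypothetical.

```python
from statistics import mean

def relational_features(node, attrs, neighbors):
    """Return the node's own attributes X followed by an aggregate X_R.

    X_R is taken here to be the per-dimension mean of the neighbors'
    attributes (an illustrative choice, not the speaker's method).
    """
    own = list(attrs[node])
    nbrs = neighbors.get(node, [])
    if not nbrs:
        aggregate = [0.0] * len(own)
    else:
        aggregate = [mean(attrs[n][i] for n in nbrs) for i in range(len(own))]
    return own + aggregate

# Toy attributed graph: node -> attribute vector, node -> neighbor list.
attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

features_a = relational_features("a", attrs, neighbors)
```

An IID classifier would see only the first half of `features_a`; a relational classifier can use the whole vector.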
How to detect abnormal traffic?
[Figure: adjacency matrices (IP src vs. IP dest) for three traffic patterns: port scanning, DDoS, and normal traffic; example hosts ibm.com and google.com]
How to select k ‘best’ nodes for immunization?
[Figure: example network with nodes numbered 1 to 34]
Ebola virus epidemic (cost 9,000+ lives, potentially 32+ bn)
SARS (cost 700+ lives; $40+ bn)
Social network
Human disease network [Barabasi 2007]
Food web [2007]
Terrorist network [Krebs 2002]
Internet (AS) [2005]
Gene regulatory network [Decourty 2008]
Protein interactions [breast cancer]
Political blogs
Power grid
[Figure: Data → Graph → New Insights / Knowledge / Reports, with the steps Cleaning, Selection, Processing and Modeling, Ranking, Querying]
Observation 1: Graphs are never given/observed
Graphs are usually constructed/inferred from input data
How to construct/infer the graph from input data?
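One possible way to infer a graph from raw records is sketched below. The assumptions are not from the slides: the input is a list of (source, target) interaction events, an edge is kept only if the pair interacts at least `min_count` times, and the name `build_graph` is hypothetical.

```python
from collections import Counter

def build_graph(events, min_count=1):
    """Infer an undirected weighted graph from (source, target) event records."""
    counts = Counter()
    for src, dst in events:
        if src == dst:
            continue  # ignore self-interactions
        counts[frozenset((src, dst))] += 1  # direction is discarded

    adjacency = {}
    for pair, count in counts.items():
        if count < min_count:
            continue  # too few interactions to call it a relationship
        u, v = tuple(pair)
        adjacency.setdefault(u, {})[v] = count
        adjacency.setdefault(v, {})[u] = count
    return adjacency

events = [("alice", "bob"), ("bob", "alice"), ("alice", "carol"), ("dave", "dave")]
graph = build_graph(events, min_count=2)
```

Changing `min_count`, or keeping direction, yields a different graph from the same data, which is the point of Observation 1.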
Social Network
Relationship may represent:
- Friendship
- Email/IM/Communication
- Co-location
- Re-tweet
- Tagging
Biological Network / Chemoinformatics
Relationship may represent:
- Protein interaction
- Chemical bonds between atoms
Infrastructure Network
e.g., Power Grid
Web/Information Network
Brain Functional Connectivity Network (Xu et al., Frontiers in Behavioral Neuroscience, 2015)
Graph Representation
What’s a node?
Attributes? Types?
What’s an edge?
Directed? Undirected?
Time-evolving? Dynamic?
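The questions above map onto concrete choices in a data structure. The following is a minimal sketch, not a recommended design: the class `Graph`, its fields, and the timestamp convention are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    directed: bool = False                           # Directed? Undirected?
    node_attrs: dict = field(default_factory=dict)   # node -> attributes, including its type
    edges: list = field(default_factory=list)        # (src, dst, timestamp) triples

    def add_node(self, node, **attrs):
        self.node_attrs.setdefault(node, {}).update(attrs)

    def add_edge(self, src, dst, timestamp=None):
        self.add_node(src)
        self.add_node(dst)
        self.edges.append((src, dst, timestamp))

    def neighbors(self, node, until=None):
        """Neighbors of node, optionally using only edges with timestamp <= until."""
        found = set()
        for src, dst, t in self.edges:
            if until is not None and t is not None and t > until:
                continue  # edge appears later than the requested snapshot
            if src == node:
                found.add(dst)
            elif dst == node and not self.directed:
                found.add(src)
        return found

g = Graph(directed=False)
g.add_node("u1", type="user")
g.add_node("p1", type="page")
g.add_edge("u1", "p1", timestamp=1)
g.add_edge("u2", "p1", timestamp=5)
```

Here `g.neighbors("p1", until=3)` gives the snapshot at time 3, while `g.neighbors("p1")` uses all edges.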
Observation 2: Graph Data Management is challenging
[Figure: compute nodes, each with a multi-core CPU, a GPU, system memory, and GPU memory, joined by a node interconnect; vertex sequences v1, v2, …, vn and an interleaved ordering v1, v4, v7, v10, v2, …; time steps t-p, …, t-1, t]
Large data
Attributed
Dynamic
Heterogeneous
Graph Mining & ML
Machine learning/Data mining + Statistics, Graph theory/algorithms
[Figure: Data → Graph Representation → New Insights / Knowledge / Reports]
How to extract insights from data represented as a graph?
(1) Graph Decomposition
(2) Unsupervised Representation Learning
Network Motifs: Simple Building Blocks of Complex Networks [Milo et al., Science 2002]
The Structure and Function of Complex Networks [Newman, SIAM Review 2003]
[Figure: 2-node, 3-node, and 4-node graphlets, connected and disconnected]
Ex: Given an input graph G
- How many triangles in G?
- How many cliques of size 4 nodes in G?
- How many cycles of size 4 nodes in G?
→ In practice, we would like to count all k-vertex graphlets
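The three questions can be answered by brute force on a small graph. The sketch below is only a baseline for illustration, not the high-performance method the talk is about: it enumerates every 3- and 4-node subset, counts 4-cycles as induced subgraphs, and uses the hypothetical name `count_small_graphlets`.

```python
from itertools import combinations

def count_small_graphlets(edges):
    """Brute-force counts of triangles, 4-cliques, and induced 4-cycles."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)

    triangles = sum(
        1 for a, b, c in combinations(nodes, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

    cliques4 = cycles4 = 0
    for quad in combinations(nodes, 4):
        members = set(quad)
        # Degree of each node inside the induced 4-node subgraph.
        degrees = [len(adj[x] & members) for x in quad]
        if all(d == 3 for d in degrees):
            cliques4 += 1   # all 6 edges present
        elif all(d == 2 for d in degrees):
            cycles4 += 1    # the only 2-regular graph on 4 nodes is a 4-cycle
    return {"triangles": triangles, "4-cliques": cliques4, "4-cycles": cycles4}

k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
square = [(5, 6), (6, 7), (7, 8), (8, 5)]
counts = count_small_graphlets(k4 + square)
```

The cost grows with the number of 4-node subsets, which is why this approach does not scale beyond small graphs.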
Ranking by graphlet counts
Nodes are colored/weighted by triangle counts
Links are colored/weighted by stars of size 4 nodes
[Figure labels: Leukemia, Colon cancer, Deafness]
§ Enumerate all possible graphlets
→ Exhaustive enumeration is too expensive
§ Count graphlets for each node, and combine all node counts
→ Still expensive for relatively large k [Shervashidze et al., AISTATS 2009]
§ Other recent work counts only connected graphlets of size k=4 [Marcus & Shavitt, Computer Networks 2012]
Not practical: scales only to small graphs with a few hundred/thousand nodes/edges
- taking 2400 secs for a graph with 26K nodes
Graphlet Transition Diagram
[Figure: graphlets linked by transitions of ± 1 edge]
Count Cliques & Cycles ONLY
Use relationships & transitions to count all other graphlets in constant time
[Figure labels: 4-cliques; 4-cycles; maximum no. of triangles incident to an edge; maximum no. of stars incident to an edge]
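The per-edge triangle and star counts in the figure can be sketched as follows. This is a simplified illustration for 3-node graphlets only, with assumptions that are not from the slides: "stars" are read as 2-stars (open wedges), node labels are comparable, and the name `edge_triangle_and_star_counts` is hypothetical.

```python
def edge_triangle_and_star_counts(edges):
    """Per-edge triangle and 2-star counts, plus global 3-node graphlet counts."""
    adj = {}
    edge_set = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        edge_set.add((min(u, v), max(u, v)))  # store each undirected edge once

    per_edge = {}
    for u, v in edge_set:
        tri = len(adj[u] & adj[v])        # common neighbors close a triangle on (u, v)
        star_u = len(adj[u]) - 1 - tri    # neighbors of u only (not v, not shared)
        star_v = len(adj[v]) - 1 - tri    # neighbors of v only
        per_edge[(u, v)] = (tri, star_u, star_v)

    # Each triangle is seen from its 3 edges; each open wedge from its 2 edges.
    triangles = sum(t for t, _, _ in per_edge.values()) // 3
    wedges = sum(su + sv for _, su, sv in per_edge.values()) // 2
    return per_edge, triangles, wedges

# A triangle 1-2-3 with a pendant node 4 attached to node 3.
per_edge, triangles, wedges = edge_triangle_and_star_counts(
    [(1, 2), (2, 3), (1, 3), (3, 4)]
)
```

Once the triangle count of an edge is known, its star counts follow from the degrees alone, without any further enumeration.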
Relationship between 4-cliques & 4-chordal-cycles
[Figure: an edge e with its incident triangles T; no. of 4-chordal-cycles vs. no. of 4-cliques]
Proof in Lemma 1, Ahmed et al., ICDM 2015
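One reading of this relationship, offered as an assumption rather than a restatement of Lemma 1: every pair of triangles sharing edge e spans four nodes that form either a 4-clique (the two other nodes are adjacent) or a 4-chordal-cycle with e as its chord (they are not). The sketch below counts 4-cliques per edge and obtains chordal cycles by subtraction; the function name is hypothetical, and the paper should be consulted for the exact statement and proof.

```python
from itertools import combinations
from math import comb

def cliques_and_chordal_cycles(edges):
    """Global counts of 4-cliques and 4-chordal-cycles from per-edge triangle sets."""
    adj = {}
    edge_set = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        edge_set.add((min(u, v), max(u, v)))

    clique_hits = 0
    chordal_cycles = 0
    for u, v in edge_set:
        tri = adj[u] & adj[v]  # nodes that form a triangle with edge (u, v)
        cliques_e = sum(1 for w, r in combinations(tri, 2) if r in adj[w])
        # Remaining triangle pairs have non-adjacent outer nodes: (u, v) is the chord.
        chordal_cycles += comb(len(tri), 2) - cliques_e
        clique_hits += cliques_e

    # A 4-clique is found from each of its 6 edges; a chordal cycle only from its chord.
    return clique_hits // 6, chordal_cycles

diamond = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]  # K4 without edge 3-4
k4 = diamond + [(3, 4)]
diamond_counts = cliques_and_chordal_cycles(diamond)
k4_counts = cliques_and_chordal_cycles(k4)
```

Only the clique test inspects pairs of nodes; the chordal-cycle count per edge is a single subtraction, which matches the slide's idea of deriving other graphlets from cliques and cycles.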
Motif Counting: strong scaling results
[Figure: speedup vs. number of processing units (1, 2, 4, 8, 12, 16) for socfb-MIT, bio-dmela, soc-gowalla, tech-RL-caida, and web-wikipedia09]
Using Intel Xeon E5-2687W server, 16 cores
How to extract insights from data represented as a graph?
(1) Graph Decomposition
(2) Unsupervised Representation Learning
[Figure: input → Feature Engineering → features → Learning Algorithm → Model → Prediction Task (link prediction, classification, anomaly detection)]
[Figure: the same pipeline with Automatic Feature Learning]
§ Goal: Learn a representation (features) for a set of graph elements (nodes, edges, etc.)
§ Key intuition: Map the graph elements (e.g., nodes) to a d-dimensional space, while preserving node similarity
§ Use the features for any downstream prediction task
Communities: cohesive subsets of nodes
Roles: represent structural patterns
- two nodes belong to the same role if they have similar structural patterns
[Figure: communities Ci, Cj, Ck]
Rossi & Ahmed, TKDE 2015
Ahmed et al., AAAI 2017
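The difference between roles and communities can be illustrated with a deliberately coarse sketch. It assumes a node's "structural pattern" is just its (degree, triangle count) pair and that nodes with identical pairs share a role; this is far simpler than the methods in the cited papers, and the name `structural_roles` is hypothetical.

```python
from itertools import combinations

def structural_roles(edges):
    """Group nodes by a simple structural signature: (degree, triangle count)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    roles = {}
    for node in sorted(adj):
        degree = len(adj[node])
        # Triangles through this node: adjacent pairs among its neighbors.
        triangles = sum(1 for a, b in combinations(adj[node], 2) if b in adj[a])
        roles.setdefault((degree, triangles), []).append(node)
    return roles

# Two star-shaped groups (hubs 0 and 10) joined by the edge 0-10.
edges = [(0, 1), (0, 2), (0, 3), (10, 11), (10, 12), (10, 13), (0, 10)]
roles = structural_roles(edges)
```

The two hubs sit in different cohesive groups yet share one role, and all leaves share another, regardless of which group they belong to.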
Goal: Find a mapping of nodes to d dimensions that preserves proximity and node similarity
Using structure + attributes (if any)
Ahmed et al. 2017
A (conditional) attributed walk is a finite sequence of adjacent node types (words) in the graph
Ahmed et al. 2017
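A minimal sketch of this definition, under stated assumptions: the walk is a uniform random walk, each visited node is replaced by its type, and the "conditional" aspect is not modeled. The function `attributed_walk` and the toy user/page graph are hypothetical and not taken from the cited paper.

```python
import random

def attributed_walk(adj, node_type, start, length, rng):
    """Random walk of up to `length` nodes, reported as a sequence of node types."""
    walk = [start]
    while len(walk) < length:
        nbrs = sorted(adj.get(walk[-1], ()))  # sorted for reproducibility
        if not nbrs:
            break  # dead end: stop early
        walk.append(rng.choice(nbrs))
    # Replace each node by its type, so the walk becomes a sequence of "words".
    return [node_type[n] for n in walk]

adj = {"u1": {"p1"}, "p1": {"u1", "u2"}, "u2": {"p1"}}
node_type = {"u1": "user", "u2": "user", "p1": "page"}
walk = attributed_walk(adj, node_type, "u1", 5, random.Random(0))
```

Because the walk is expressed in types rather than node identities, two different users produce the same "word", which is what lets such sequences generalize across nodes.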
DeepWalk (DW): Perozzi et al., KDD 2014
node2vec (N2V): Grover et al., KDD 2016
LINE: Tang et al., WWW 2015
Link Prediction
Observation 3: Useful insights and accurate modeling depend on the data representation
Open Source/Data Tools
§ Open data repository with interactive visual analytics & exploration
§ Largest with 500+ graphs, 20+ collections
§ Community-oriented
• discuss, post data, comments, vis, etc.
AAAI’15
NetworkRepository.com
High-Performance Graph Analysis and Modeling
Observation 1: Graphs are never given/observed
Graphs are usually constructed/inferred from input data
Observation 2: Graph Data Management is challenging
Observation 3: Useful insights and accurate modeling depend on the data representation
§ Efficient Estimation of Word Representations in Vector Space. ICLR 2013 [Mikolov et al.]
§ A Framework for Generalizing Graph-based Representation Learning Methods. arXiv:1709.04596, 2017 [Ahmed et al.]
§ Role Discovery in Networks. TKDE 2015 [Rossi & Ahmed]
§ A Higher-order Latent Space Network Model. AAAI 2017 [Ahmed, Rossi, Willke, Zhou]
§ node2vec: Scalable Feature Learning for Networks. KDD 2016 [Grover, Leskovec]
§ DeepWalk: Online Learning of Social Representations. KDD 2014 [Perozzi, Al-Rfou, Skiena]
§ Efficient Graphlet Counting for Large Networks. ICDM 2015 [Ahmed et al.]
§ Graphlet Decomposition: Framework, Algorithms, and Applications. J. Know. & Info. 2016 [Ahmed et al.]
§ Network Motifs: Simple Building Blocks of Complex Networks. Science 2002 [Milo et al.]
§ Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 2008 [Milenković & Pržulj]
§ Graph Kernels. JMLR 2010 [Vishwanathan et al.]
§ The Structure and Function of Complex Networks. SIAM Review 2003 [Newman]
§ Biological Network Comparison Using Graphlet Degree Distribution. Bioinformatics 2007 [Pržulj]
§ Efficient Graphlet Kernels for Large Graph Comparison. AISTATS 2009 [Shervashidze et al.]
§ Local Structure in Social Networks. Sociological Methodology 1976 [Holland & Leinhardt]
§ The Strength of Weak Ties: A Network Theory Revisited. Sociological Theory 1983 [Granovetter]
Thank you!
Questions?
nesreen.k.ahmed@intel.com
http://nesreenahmed.com

More Related Content

What's hot

GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processinghuguk
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseLarge-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseAapo Kyrölä
 
Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...
Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...
Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...MLconf
 
Ashfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataAshfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataMLconf
 
Comparative Analysis of Algorithms for Single Source Shortest Path Problem
Comparative Analysis of Algorithms for Single Source Shortest Path ProblemComparative Analysis of Algorithms for Single Source Shortest Path Problem
Comparative Analysis of Algorithms for Single Source Shortest Path ProblemCSCJournals
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisDavid Gleich
 
Graph x pregel
Graph x pregelGraph x pregel
Graph x pregelSigmoid
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Christopher Morris
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
Recent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph ClassificationRecent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph ClassificationChristopher Morris
 
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs Christopher Morris
 
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...Waqas Nawaz
 
Introduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmIntroduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmKatsuki Ohto
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learningmooopan
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷Eiji Sekiya
 

What's hot (20)

GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defenseLarge-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
Large-Scale Graph Computation on Just a PC: Aapo Kyrola Ph.D. thesis defense
 
Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...
Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...
Tamara G. Kolda, Distinguished Member of Technical Staff, Sandia National Lab...
 
Lic may17
Lic may17Lic may17
Lic may17
 
Ashfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataAshfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, Pepperdata
 
Comparative Analysis of Algorithms for Single Source Shortest Path Problem
Comparative Analysis of Algorithms for Single Source Shortest Path ProblemComparative Analysis of Algorithms for Single Source Shortest Path Problem
Comparative Analysis of Algorithms for Single Source Shortest Path Problem
 
Engineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network AnalysisEngineering Data Science Objectives for Social Network Analysis
Engineering Data Science Objectives for Social Network Analysis
 
Graph x pregel
Graph x pregelGraph x pregel
Graph x pregel
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Encoding survey
Encoding surveyEncoding survey
Encoding survey
 
Recent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph ClassificationRecent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph Classification
 
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
 
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
 
Scalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven ApplicationsScalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven Applications
 
Introduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmIntroduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithm
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷
 

Similar to High-Performance Graph Analysis and Modeling

Representation Learning in Large Attributed Graphs
Representation Learning in Large Attributed GraphsRepresentation Learning in Large Attributed Graphs
Representation Learning in Large Attributed GraphsNesreen K. Ahmed
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processingjins0618
 
2009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 2007
2009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 20072009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 2007
2009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 2007Marc Smith
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)Ankur Dave
 
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks Ryan Rossi
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterDatabricks
 
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Eiji Sekiya
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013Amazon Web Services
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013MLconf
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for GraphsJean Ihm
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsWee Hyong Tok
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernelsivaderivader
 
Data Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneData Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneDoug Needham
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXBenjamin Bengfort
 

Similar to High-Performance Graph Analysis and Modeling (20)

Representation Learning in Large Attributed Graphs
Representation Learning in Large Attributed GraphsRepresentation Learning in Large Attributed Graphs
Representation Learning in Large Attributed Graphs
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
2009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 2007
2009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 20072009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 2007
2009 - Node XL v.84+ - Social Media Network Visualization Tools For Excel 2007
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
 
Portfolio
PortfolioPortfolio
Portfolio
 
Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013Joey gonzalez, graph lab, m lconf 2013
Joey gonzalez, graph lab, m lconf 2013
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
PointNet
PointNetPointNet
PointNet
 
3DRepo
3DRepo3DRepo
3DRepo
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph KernelsDDGK: Learning Graph Representations for Deep Divergence Graph Kernels
DDGK: Learning Graph Representations for Deep Divergence Graph Kernels
 
Data Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZoneData Structure Graph DMZ #DMZone
Data Structure Graph DMZ #DMZone
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
 

Recently uploaded

Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxdhiyaneswaranv1
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxFinatron037
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsThinkInnovation
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 

Recently uploaded (16)

Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 

High-Performance Graph Analysis and Modeling

  • 4. This talk is NOT about graph processing systems (e.g., GraphX, Giraph, …). Instead, this talk is about: (1) knowledge discovery and extracting insights from graph data; (2) graph machine learning; (3) high-performance algorithms for solving (1) and (2).
  • 5. Graphs encode dependencies/relationships between entities. [Figure: IID data vs. relational/graph data]
  • 6. IID classification: p(Y | X). Relational classification: p(Y | X, X_R).
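One way to make the contrast on this slide concrete: an IID classifier sees only a node's own attributes X, while a relational classifier also sees the attributes of related nodes, X_R. The sketch below builds such an input by appending the mean of the neighbors' attributes to each node's own attributes. The name `relational_features` and the choice of mean aggregation are illustrative assumptions, not something the talk prescribes.

```python
def relational_features(x, adj):
    """Return, per node, its own attributes X followed by the mean of its
    neighbors' attributes (a simple stand-in for X_R; an assumption)."""
    feats = {}
    for node, own in x.items():
        nbrs = sorted(adj.get(node, ()))
        if nbrs:
            mean = [sum(x[n][i] for n in nbrs) / len(nbrs) for i in range(len(own))]
        else:
            # Isolated node: no relational information is available.
            mean = [0.0] * len(own)
        feats[node] = list(own) + mean
    return feats


# Toy data: node "a" has attribute pattern [1, 0]; its neighbors have [0, 1].
x = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 1.0]}
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
feats = relational_features(x, adj)
print(feats["a"])
```

Any standard classifier can then be trained on `feats` instead of `x`; the only difference from the IID setting is the extra neighbor-derived columns.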
  • 7. How to detect abnormal traffic? [Figure: adjacency matrices (IP src vs. IP dest) for port scanning, DDoS, and normal traffic; example hosts: ibm.com, google.com]
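The adjacency matrices on this slide can be built directly from traffic records. The following is a minimal sketch, assuming each record is a `(src, dst)` pair; the function name `adjacency_from_flows` and the sample addresses are hypothetical.

```python
def adjacency_from_flows(flows):
    """Build a directed, weighted adjacency matrix from (src, dst) pairs.

    Rows index the source host, columns the destination host, and each
    cell holds the number of observed flows between the two.
    """
    hosts = sorted({h for pair in flows for h in pair})
    index = {h: i for i, h in enumerate(hosts)}
    adj = [[0] * len(hosts) for _ in hosts]
    for src, dst in flows:
        adj[index[src]][index[dst]] += 1
    return hosts, adj


# Hypothetical flow records (addresses are made up for illustration).
flows = [
    ("10.0.0.1", "ibm.com"),
    ("10.0.0.1", "google.com"),
    ("10.0.0.2", "google.com"),
    ("10.0.0.1", "google.com"),
]
hosts, adj = adjacency_from_flows(flows)
out_degree = [sum(1 for w in row if w > 0) for row in adj]
for host, row in zip(hosts, adj):
    print(host, row)
```

Row and column sums of this matrix are a natural starting point for spotting unusual hosts, for example a source that contacts far more distinct destinations than its peers.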
  • 8. How to select the k 'best' nodes for immunization? [Figure: example network with nodes numbered 1 to 34] Ebola virus epidemic (costs 9,000+ lives, potentially 32+ bn); SARS (costs 700+ lives; $40+ bn).
  • 9. [Figure: example networks] Social network; Human Disease Network [Barabasi 2007]; Food Web [2007]; Terrorist Network [Krebs 2002]; Internet (AS) [2005]; Gene Regulatory Network [Decourty 2008]; Protein Interactions [breast cancer]; Political blogs; Power grid.
  • 10. [Pipeline diagram: Data, Graph; Cleaning, Selection, Processing; Modeling, Ranking, Querying; New Insights, Knowledge, Reports]
  • 11. [Same pipeline diagram] Observation 1: Graphs are never given/observed. Graphs are usually constructed/inferred from input data.
  • 12. How to construct/infer the graph from input data? [Diagram: from Data to Graph]
  • 13. Social Network. Relationship may represent: friendship; email/IM/communication; co-location; re-tweet; tagging. Biological Network / Chemoinformatics. Relationship may represent: protein interaction; chemical bonds between atoms.
  • 14. Infrastructure Network (e.g., Power Grid). Web/Information Network.
  • 15. Brain Functional Connectivity Network [Xu et al., Frontiers in Behavioral Neuroscience, 2015].
  • 16. Graph Representation. What's a node? Attributes? Types? What's an edge? Directed? Undirected? Time-evolving? Dynamic? [Pipeline diagram: Data, Graph; New Insights, Knowledge, Reports]
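The questions on this slide map onto concrete choices in a data structure. Below is a small sketch of one possible container that makes each choice explicit: a directed/undirected flag, per-node attribute dictionaries (including a type), and an optional timestamp per edge. The class name `Graph` and all of its fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    directed: bool = False
    node_attrs: dict = field(default_factory=dict)  # node -> attribute dict
    edges: list = field(default_factory=list)       # (u, v, timestamp or None)

    def add_node(self, node, **attrs):
        """Register a node and merge in any attributes (e.g. type="user")."""
        self.node_attrs.setdefault(node, {}).update(attrs)

    def add_edge(self, u, v, t=None):
        """Add an edge; t is an optional timestamp for time-evolving graphs."""
        self.add_node(u)
        self.add_node(v)
        self.edges.append((u, v, t))

    def neighbors(self, node, until=None):
        """Neighbors of node, optionally using only edges with timestamp <= until."""
        out = set()
        for u, v, t in self.edges:
            if until is not None and t is not None and t > until:
                continue
            if u == node:
                out.add(v)
            elif v == node and not self.directed:
                out.add(u)
        return out


g = Graph(directed=True)
g.add_node("alice", type="user")
g.add_edge("alice", "bob", t=1)
g.add_edge("alice", "carol", t=5)
print(g.neighbors("alice"), g.neighbors("alice", until=3))
```

Scanning the edge list on every query is deliberately naive; it keeps the sketch short and is not meant for large graphs.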
  • 17. Observation 2: Graph Data Management is challenging
  • 18. [Figure: graph nodes v1 … vn partitioned across machines joined by a node interconnect, each machine with CPU cores, a GPU, system memory, and GPU memory; graph snapshots at times t-p, …, t-1, t] Large data; Attributed; Dynamic; Heterogeneous.
  • 19. [Same figure] Large data; Attributed; Dynamic; Heterogeneous. Graph Mining & ML: Machine learning/Data mining + Statistics, Graph theory/algorithms.
  • 22. How to extract insights from data represented as a graph? [Diagram: Graph Representation; New Insights, Knowledge, Reports]
  • 23. How to extract insights from data represented as a graph? (1) Graph Decomposition; (2) Unsupervised Representation Learning. [Diagram: Graph Representation; New Insights, Knowledge, Reports]
  • 24. [Figure: 2-node, 3-node, and 4-node graphlets, connected and disconnected] Network Motifs: Simple Building Blocks of Complex Networks [Milo et al., Science 2002]; The Structure and Function of Complex Networks [Newman, SIAM Review 2003].
  • 25. Ex: Given an input graph G: How many triangles are in G? How many cliques of size 4 nodes are in G? How many cycles of size 4 nodes are in G? → In practice, we would like to count all k-vertex graphlets.
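The three questions above can be answered by brute force on a small graph, which is useful as a correctness baseline even though it does not scale. The sketch below assumes an undirected simple graph given as an edge list and counts induced subgraphs, as is usual for graphlets; the function names are illustrative.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list (no self-loops assumed)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Number of node triples that are pairwise adjacent."""
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def count_4cliques(adj):
    """Number of node quadruples that are pairwise adjacent."""
    return sum(1 for quad in combinations(sorted(adj), 4)
               if all(y in adj[x] for x, y in combinations(quad, 2)))


def count_4cycles(adj):
    """Number of induced (chordless) 4-cycles: within the 4 chosen nodes,
    every node has exactly two neighbors."""
    total = 0
    for quad in combinations(sorted(adj), 4):
        degrees = [sum(1 for y in quad if y in adj[x]) for x in quad]
        if degrees == [2, 2, 2, 2]:
            total += 1
    return total


k4 = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
square = build_adj([(0, 1), (1, 2), (2, 3), (3, 0)])
print(count_triangles(k4), count_4cliques(k4), count_4cycles(k4))
print(count_triangles(square), count_4cliques(square), count_4cycles(square))
```

Each function inspects every 3- or 4-node subset, so the cost grows as n^3 or n^4 in the number of nodes; this is the exhaustive enumeration that the talk goes on to describe as too expensive.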
  • 26. Ranking by graphlet counts. Nodes are colored/weighted by triangle counts; links are colored/weighted by stars of size 4 nodes. [Figure labels: Leukemia, Colon cancer, Deafness]
  • 27. Enumerate all possible graphlets → exhaustive enumeration is too expensive. Count graphlets for each node and combine all node counts → still expensive for relatively large k [Shervashidze et al., AISTATS 2009]. Other recent work counts only connected graphlets of size k=4 [Marcus & Shavitt, Computer Networks 2012]: not practical, since it scales only to small graphs with a few hundred/thousand nodes/edges, taking 2400 secs for a graph with 26K nodes.
  • 28. Graphlet Transition Diagram (± 1 edge).
  • 29. Graphlet Transition Diagram (± 1 edge). Count cliques & cycles ONLY (4-cliques, 4-cycles). Use relationships & transitions to count all other graphlets in constant time. [Diagram labels: maximum no. triangles incident to an edge; maximum no. stars incident to an edge]
  • 30. Relationship between 4-cliques & 4-chordal-cycles. [Figure: an edge e with its incident triangles T, relating No. 4-ChordalCycles and No. 4-Cliques] Proof in Lemma 1, Ahmed et al., ICDM 2015.
  • 31. Relationship between 4-cliques & 4-chordal-cycles (build of slide 30). Proof in Lemma 1, Ahmed et al., ICDM 2015.
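The formula on these slides did not survive extraction, so the following is only one reading of the relationship they illustrate; see Lemma 1 of Ahmed et al. (ICDM 2015) for the exact statement. For an edge e = (u, v) with T_e triangles, every pair of those triangles spans four nodes: if the two third vertices are adjacent the four nodes form a 4-clique, otherwise they form a chordal cycle whose chord is e. Hence C(T_e, 2) equals the number of 4-cliques containing e plus the number of chordal cycles with chord e. The function names below are hypothetical.

```python
from itertools import combinations
from math import comb


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_counts(adj, u, v):
    """For edge (u, v), return (triangles, 4-cliques containing the edge,
    chordal cycles whose chord is the edge)."""
    third = sorted(adj[u] & adj[v])  # each common neighbor closes a triangle
    cliques = sum(1 for w, r in combinations(third, 2) if r in adj[w])
    chordal = sum(1 for w, r in combinations(third, 2) if r not in adj[w])
    return len(third), cliques, chordal


k4 = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
diamond = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)])  # K4 minus edge (2, 3)

for name, g in (("K4", k4), ("diamond", diamond)):
    t, cliques, chordal = edge_counts(g, 0, 1)
    print(name, t, cliques, chordal, comb(t, 2) == cliques + chordal)
```

Under this reading, once the triangle count and the 4-clique count of an edge are known, its chordal-cycle count follows from a subtraction rather than another enumeration, which is the kind of constant-time derivation slide 29 refers to.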
  • 32. Motif counting: strong scaling results on an Intel Xeon E5-2687W server, 16 cores. [Plot: speedup vs. number of processing units (1, 2, 4, 8, 12, 16) for socfb-MIT, bio-dmela, soc-gowalla, tech-RL-caida, web-wikipedia09]
  • 33. How to extract insights from data represented as a graph? (1) Graph Decomposition; (2) Unsupervised Representation Learning. [Diagram: Graph Representation; New Insights, Knowledge, Reports]
  • 34. [Pipeline diagram: input, Feature Engineering, features, Learning Algorithm, Model, Prediction Task] Prediction tasks: link prediction, classification, anomaly detection.
  • 35. [Same pipeline diagram, with the added label: Automatic Feature Learning] Prediction tasks: link prediction, classification, anomaly detection.
  • 36. Goal: Learn a representation (features) for a set of graph elements (nodes, edges, etc.). Key intuition: Map the graph elements (e.g., nodes) to a d-dimensional space while preserving node similarity. Use the features for any downstream prediction task.
  • 37. Communities: cohesive subsets of nodes. Roles: represent structural patterns; two nodes belong to the same role if they have similar structural patterns. [Figure: communities Ci, Cj, Ck] [Rossi & Ahmed, TKDE 2015; Ahmed et al., AAAI 2017]
  • 38. Goal: Find a mapping of nodes to d dimensions that preserves proximity and node similarity, using structure + attributes (if any).
  • 40. A (conditional) attributed walk is a finite sequence of adjacent node types (words) in the graph [Ahmed et al., 2017].
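The definition above can be illustrated by taking an ordinary walk over nodes and replacing each node with its type, so that the walk becomes a "sentence" of type "words". The sketch below assumes uniform random transitions and a single `node_type` lookup table; neither detail is specified on the slide, and the function names are hypothetical.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        nbrs = sorted(adj[walk[-1]])  # sorted for reproducibility with a seed
        if not nbrs:
            break  # dead end: stop early
        walk.append(rng.choice(nbrs))
    return walk


def attributed_walk(walk, node_type):
    """Replace every node in the walk by its type."""
    return [node_type[v] for v in walk]


# Toy bipartite graph of users and pages (made up for illustration).
adj = {"u1": {"p1", "p2"}, "u2": {"p2"}, "p1": {"u1"}, "p2": {"u1", "u2"}}
node_type = {"u1": "user", "u2": "user", "p1": "page", "p2": "page"}

rng = random.Random(0)
walk = random_walk(adj, "u1", 6, rng)
typed = attributed_walk(walk, node_type)
print(walk)
print(typed)
```

Because many nodes share a type, sequences like `typed` have a far smaller vocabulary than node-ID walks, and they can be passed to the same word-embedding machinery that DeepWalk and node2vec apply to node sequences.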
  • 41. Link Prediction. DeepWalk (DW): Perozzi et al., KDD 2014; node2vec (N2V): Grover et al., KDD 2016; LINE: Tang et al., WWW 2015.
  • 42. Observation 3: Useful insights and accurate modeling depend on the data representation
  • 44. NetworkRepository.com [AAAI'15]: Open data repository with interactive visual analytics & exploration. Largest with 500+ graphs, 20+ collections. Community-oriented: discuss, post data, comments, vis, etc.
  • 46. Observation 3: Useful insights and accurate modeling depend on the data representation. Observation 2: Graph Data Management is challenging. Observation 1: Graphs are never given/observed; graphs are usually constructed/inferred from input data.
  • 47. References:
      - Efficient estimation of word representations in vector space. ICLR 2013 [Mikolov et al.]
      - A Framework for Generalizing Graph-based Representation Learning Methods. arXiv:1709.04596, 2017 [Ahmed et al.]
      - Role Discovery in Networks. TKDE 2015 [Rossi & Ahmed]
      - A Higher-order Latent Space Network Model. AAAI 2017 [Ahmed, Rossi, Willke, Zhou]
      - node2vec: Scalable Feature Learning for Networks. KDD 2016 [Grover, Leskovec]
      - DeepWalk: online learning of social representations. KDD 2014 [Perozzi, Al-Rfou, Skiena]
      - Efficient Graphlet Counting for Large Networks. ICDM 2015 [Ahmed et al.]
      - Graphlet Decomposition: Framework, Algorithms, and Applications. J. Know. & Info. 2016 [Ahmed et al.]
      - Network Motifs: Simple Building Blocks of Complex Networks. Science 2002 [Milo et al.]
      - Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 2008 [Milenković-Pržulj]
      - Graph Kernels. JMLR 2010 [Vishwanathan et al.]
      - The Structure and Function of Complex Networks. SIAM Review 2003 [Newman]
      - Biological network comparison using graphlet degree distribution. Bioinformatics 2007 [Pržulj]
      - Efficient Graphlet Kernels for Large Graph Comparison. AISTATS 2009 [Shervashidze et al.]
      - Local structure in social networks. Sociological Methodology 1976 [Holland-Leinhardt]
      - The strength of weak ties: A network theory revisited. Sociological Theory 1983 [Granovetter]