3. § This talk is NOT about graph processing systems
• (e.g., GraphX, Giraph, …).
§ Instead, this talk is about:
(1) Knowledge discovery and extracting insights from graph data.
(2) Graph Machine Learning
(3) High-performance algorithms for solving (1) and (2)
5. § Graphs encode dependencies/relationships between entities
[Figure: IID data vs. relational/graph data]
11. [Figure: pipeline from Data to Graph to New Insights / Knowledge / Reports; stage labels: Cleaning, Selection, Processing, Modeling, Ranking, Querying]
Observation 1: Graphs are never given/observed
Graphs are usually constructed/inferred from input data
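As one illustration of constructing a graph from input data, the sketch below builds an ε-neighborhood graph from a handful of 2-D points: two points are linked when their Euclidean distance is at most a threshold. The function name `build_epsilon_graph`, the sample points, and the threshold are all hypothetical; the talk does not prescribe a particular construction method.

```python
from itertools import combinations
from math import dist


def build_epsilon_graph(points, eps):
    """Return an adjacency dict linking points whose distance is <= eps."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


# Hypothetical input: two tight groups of observations.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
graph = build_epsilon_graph(points, eps=1.5)
print(graph)
```

Changing `eps` changes the resulting graph, which is the point of the observation: the graph is a modeling choice, not a given.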
22. How to extract insights from data represented as a graph?
[Figure: Graph Representation → New Insights / Knowledge / Reports]
(1) Graph Decomposition
(2) Unsupervised Representation Learning
24. Network Motifs: Simple Building Blocks of Complex Networks [Milo et al., Science 2002]
The Structure and Function of Complex Networks [Newman, SIAM Review 2003]
[Figure: the 2-node, 3-node, and 4-node graphlets, grouped into connected and disconnected]
25. Ex: Given an input graph G
- How many triangles are in G?
- How many 4-node cliques are in G?
- How many 4-node cycles are in G?
→ In practice, we would like to count all k-vertex graphlets
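The three questions above can be answered on a toy graph by brute force. This is a minimal sketch, assuming graphlets are induced subgraphs and the graph is stored as an adjacency dict. The helper names and the example edge list are hypothetical, and the approach is only meant for tiny inputs.

```python
from itertools import combinations


def to_adj(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_small_graphlets(adj):
    """Brute-force counts of triangles, 4-cliques, and induced 4-cycles."""
    nodes = sorted(adj)

    def edges_within(subset):
        return sum(1 for a, b in combinations(subset, 2) if b in adj[a])

    triangles = sum(1 for s in combinations(nodes, 3) if edges_within(s) == 3)
    cliques4 = cycles4 = 0
    for s in combinations(nodes, 4):
        m = edges_within(s)
        if m == 6:
            cliques4 += 1
        elif m == 4 and all(sum(1 for b in s if b in adj[a]) == 2 for a in s):
            cycles4 += 1  # 4 edges and every node has degree 2: chordless 4-cycle
    return triangles, cliques4, cycles4


# Hypothetical example: a 4-clique on {0,1,2,3} plus a 4-cycle 3-4-5-6-3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (5, 6), (6, 3)]
triangles, cliques4, cycles4 = count_small_graphlets(to_adj(edges))
print(triangles, cliques4, cycles4)
```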
26. Ranking by graphlet counts
Nodes are colored/weighted by triangle counts
Links are colored/weighted by counts of 4-node stars
[Figure labels: Leukemia, Colon cancer, Deafness]
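A minimal sketch of the node-ranking idea, assuming an adjacency-dict graph: count the triangles each node participates in, then sort nodes by that count. The function names and the toy edge list are hypothetical and unrelated to the networks shown in the figure.

```python
from itertools import combinations


def to_adj(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj):
    """Number of triangles each node participates in."""
    return {u: sum(1 for v, w in combinations(sorted(adj[u]), 2) if w in adj[v])
            for u in adj}


def rank_by_triangles(adj):
    """Nodes sorted by descending triangle count (ties broken by node id)."""
    counts = triangle_counts(adj)
    return sorted(counts, key=lambda u: (-counts[u], u))


# Hypothetical toy graph: triangles {2,3,4} and {0,2,3}, plus a pendant node 1.
adj = to_adj([(2, 3), (2, 4), (3, 4), (2, 0), (3, 0), (0, 1)])
counts = triangle_counts(adj)
ranking = rank_by_triangles(adj)
print(counts, ranking)
```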
27. § Enumerate all possible graphlets
→ Exhaustive enumeration is too expensive
§ Count graphlets for each node, then combine all node counts
→ Still expensive for relatively large k [Shervashidze et al., AISTATS 2009]
§ Other recent work counts only connected graphlets of size k=4 [Marcus & Shavitt, Computer Networks 2012]
Not practical: scales only to small graphs with a few hundred or a few thousand nodes/edges
- taking 2400 secs for a graph with 26K nodes
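To see why exhaustive enumeration does not scale: a naive pass over all k-node subsets of an n-node graph visits C(n, k) candidates. The sketch below evaluates that count, using the 26K-node figure quoted above as n. It is an upper bound for the naive approach only; practical enumerators visit far fewer subsets by following edges.

```python
from math import comb

n = 26_000  # node count quoted above
subset_counts = {k: comb(n, k) for k in (3, 4, 5)}
for k, c in subset_counts.items():
    print(f"k={k}: {c:.3e} candidate node subsets")
```

For k=4 the count is already on the order of 10^16.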
29. Count Cliques & Cycles ONLY
Use relationships & transitions to count all other graphlets in constant time
[Figure: Graphlet Transition Diagram; neighboring graphlets differ by ± 1 edge; labels: 4-Cliques, 4-Cycles, Maximum no. triangles incident to an edge, Maximum no. stars incident to an edge]
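One way to read "count cycles, then derive the rest": for an edge e = (u, v), let S_u be the neighbors of u only and S_v the neighbors of v only. Every pair (w, r) with w in S_u and r in S_v forms, together with e, either an induced 4-cycle (if w and r are adjacent) or a 4-path with e in the middle (otherwise). Once the 4-cycles through e are counted, the 4-paths centered on e follow from one subtraction. This is a sketch of that single relationship under those assumptions, not the authors' implementation; all names and the toy graph are hypothetical.

```python
def to_adj(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cycles_and_paths_on_edge(adj, u, v):
    """4-cycles containing edge (u, v) and 4-paths with (u, v) as middle edge."""
    tri = adj[u] & adj[v]        # nodes closing a triangle with (u, v)
    s_u = adj[u] - tri - {v}     # neighbors of u only
    s_v = adj[v] - tri - {u}     # neighbors of v only
    # The only search step: look for an edge between the two "star" sets.
    cycles = sum(1 for w in s_u for r in s_v if r in adj[w])
    # Derived with one arithmetic step, no further search.
    paths = len(s_u) * len(s_v) - cycles
    return cycles, paths


# Hypothetical toy graph around edge (0, 1).
adj = to_adj([(0, 1), (0, 2), (1, 3), (2, 3), (0, 4), (1, 5), (0, 6), (1, 6)])
cycles, paths = cycles_and_paths_on_edge(adj, 0, 1)
print(cycles, paths)
```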
30. Relationship between 4-cliques & 4-chordal-cycles
[Figure: a 4-clique and a 4-chordal-cycle, each drawn as two triangles T sharing an edge e; the slide relates No. 4-ChordalCycles to No. 4-Cliques]
Proof in Lemma 1 - Ahmed et al., ICDM 2015
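The slide's formula did not survive extraction, so the following is an assumption about what it shows; the authoritative statement is Lemma 1 of the cited paper. A per-edge identity consistent with the picture: if T_e is the set of nodes forming a triangle with edge e, then every pair of nodes in T_e yields, together with e, either a 4-clique (the pair is adjacent) or a 4-chordal-cycle with e as its chord (the pair is not adjacent). Counting the 4-cliques therefore gives the chordal cycles by subtraction from C(|T_e|, 2). Names and the toy graph below are hypothetical.

```python
from itertools import combinations
from math import comb


def to_adj(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cliques_and_chordal_cycles_on_edge(adj, u, v):
    """4-cliques containing (u, v) and 4-chordal-cycles whose chord is (u, v)."""
    tri = adj[u] & adj[v]  # T_e: nodes forming a triangle with the edge
    cliques = sum(1 for w, r in combinations(sorted(tri), 2) if r in adj[w])
    chordal = comb(len(tri), 2) - cliques  # derived, not searched for
    return cliques, chordal


# Hypothetical toy graph: edge (0, 1) has common neighbors 2, 3, 4; only 2-3 is linked.
adj = to_adj([(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (0, 4), (1, 4), (2, 3)])
cliques, chordal = cliques_and_chordal_cycles_on_edge(adj, 0, 1)
print(cliques, chordal)
```

Summed over all edges, each 4-clique is seen six times (once per edge) and each chordal cycle once (at its chord), so global counts follow from the per-edge values.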
32. Motif Counting: strong scaling results
[Figure: two plots of Speedup (0 to 16) vs. Number of processing units (1, 2, 4, 8, 12, 16); graphs: socfb-MIT, bio-dmela, soc-gowalla, tech-RL-caida, web-wikipedia09]
Using Intel Xeon E5-2687W server, 16 cores
33. How to extract insights from data represented as a graph?
[Figure: Graph Representation → New Insights / Knowledge / Reports]
(1) Graph Decomposition
(2) Unsupervised Representation Learning
36. § Goal: Learn a representation (features) for a set of graph elements (nodes, edges, etc.)
§ Key intuition: Map the graph elements (e.g., nodes) to a d-dimensional space, while preserving node similarity
§ Use the features for any downstream prediction task
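To make "map nodes to a d-dimensional space while preserving similarity" concrete, here is a minimal sketch using a plain spectral embedding: each node's coordinates come from the top-d eigenvectors of the (shifted) adjacency matrix, found by orthogonal power iteration. This technique is swapped in as a simple stand-in and is not the method of the talk or of the cited embedding papers; the function name, parameters, and toy graph are hypothetical.

```python
import random
from math import dist, sqrt


def to_adj(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def spectral_embedding(adj, d=2, iters=200, seed=0):
    """Return {node: d-tuple} from the top-d eigenvectors of A + shift*I."""
    nodes = sorted(adj)
    shift = max(len(adj[u]) for u in nodes)  # makes every eigenvalue nonnegative
    rng = random.Random(seed)
    vecs = [{u: rng.uniform(-1, 1) for u in nodes} for _ in range(d)]
    for _ in range(iters):
        # Multiply each vector by (A + shift * I).
        vecs = [{u: shift * x[u] + sum(x[w] for w in adj[u]) for u in nodes}
                for x in vecs]
        # Gram-Schmidt: orthogonalize against earlier vectors, then normalize.
        for i, x in enumerate(vecs):
            for y in vecs[:i]:
                proj = sum(x[u] * y[u] for u in nodes)
                for u in nodes:
                    x[u] -= proj * y[u]
            norm = sqrt(sum(val * val for val in x.values()))
            for u in nodes:
                x[u] /= norm
    return {u: tuple(x[u] for x in vecs) for u in nodes}


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = to_adj([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
emb = spectral_embedding(adj, d=2)
print(dist(emb[0], emb[1]), dist(emb[0], emb[4]))
```

Nodes in the same triangle land close together and nodes in different triangles land far apart, so the resulting vectors could be passed as features to a downstream classifier or clustering step.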
37. Communities: cohesive subsets of nodes
Roles: represent structural patterns
- two nodes belong to the same role if they have similar structural patterns
[Figure: communities Ci, Cj, Ck]
[Rossi & Ahmed, TKDE 2015] [Ahmed et al., AAAI 2017]
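The difference between communities and roles can be seen on a toy graph. In this minimal sketch a node's "structural pattern" is reduced to the pair (degree, triangle count), and nodes with the same pair share a role. That signature is a deliberate simplification chosen for illustration, not the role model of the cited papers; the function name, graph, and community labels are hypothetical.

```python
from itertools import combinations


def to_adj(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structural_roles(adj):
    """Group nodes by a tiny structural signature: (degree, triangle count)."""
    roles = {}
    for u in sorted(adj):
        tri = sum(1 for v, w in combinations(sorted(adj[u]), 2) if w in adj[v])
        roles.setdefault((len(adj[u]), tri), []).append(u)
    return roles


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = to_adj([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
communities = [{0, 1, 2}, {3, 4, 5}]  # hand-labeled cohesive groups
roles = structural_roles(adj)
print(roles)
```

Here the bridge endpoints 2 and 3 share a role although they sit in different communities, while nodes 0 and 2 share a community but not a role.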
38. Goal: Find a mapping of nodes to a d-dimensional space that preserves proximity and node similarity
Using structure + attributes (if any)
44. § Open data repository with interactive visual analytics & exploration
§ Largest, with 500+ graphs across 20+ collections
§ Community-oriented
• discuss, post data, comments, vis, etc.
NetworkRepository.com [AAAI'15]
46. Observation 3: Useful insights and accurate modeling depend on the data representation
Observation 2: Graph Data Management is challenging
Observation 1: Graphs are never given/observed
Graphs are usually constructed/inferred from input data
47. § Efficient Estimation of Word Representations in Vector Space. ICLR 2013 [Mikolov et al.]
§ A Framework for Generalizing Graph-based Representation Learning Methods. arXiv:1709.04596 2017 [Ahmed et al.]
§ Role Discovery in Networks. TKDE 2015 [Rossi & Ahmed]
§ A Higher-order Latent Space Network Model. AAAI 2017 [Ahmed, Rossi, Willke, Zhou]
§ node2vec: Scalable Feature Learning for Networks. KDD 2016 [Grover & Leskovec]
§ DeepWalk: Online Learning of Social Representations. KDD 2014 [Perozzi, Al-Rfou, Skiena]
§ Efficient Graphlet Counting for Large Networks. ICDM 2015 [Ahmed et al.]
§ Graphlet Decomposition: Framework, Algorithms, and Applications. Knowledge and Information Systems 2016 [Ahmed et al.]
§ Network Motifs: Simple Building Blocks of Complex Networks. Science 2002 [Milo et al.]
§ Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 2008 [Milenković & Pržulj]
§ Graph Kernels. JMLR 2010 [Vishwanathan et al.]
§ The Structure and Function of Complex Networks. SIAM Review 2003 [Newman]
§ Biological Network Comparison Using Graphlet Degree Distribution. Bioinformatics 2007 [Pržulj]
§ Efficient Graphlet Kernels for Large Graph Comparison. AISTATS 2009 [Shervashidze et al.]
§ Local Structure in Social Networks. Sociological Methodology 1976 [Holland & Leinhardt]
§ The Strength of Weak Ties: A Network Theory Revisited. Sociological Theory 1983 [Granovetter]