Locally Densest Subgraph Discovery
L. Qin, et al., KDD, 2015
Aftab Alam
September 20, 2017
Department of Computer Engineering, Kyung Hee University
Contents
1. Introduction
2. Related Work
3. Locally Densest Subgraph
4. A Polynomial Algorithm
5. Algorithm Optimization
6. Performance Studies
7. Conclusion
4. Data & Knowledge Engineering Lab, Department of Computer Engineering, Kyung Hee University, Korea.
Introduction
• Many interactions can be represented as graphs
– Web graphs:
o search engines, etc.
– Social networks:
o mining user communities, viral marketing
– Email exchanges:
o security, virus spread, spam detection
– Market basket data:
o customer profiles, targeted advertising
– Netflow graphs (which IPs talk to each other):
o traffic patterns, security, worm attacks
Large Graphs
Introduction
Graph System Structure
Data Environments
Static, Streaming, Dynamic Graph, Probabilistic, Spatial, Evolving Graph, Random Graph
Computing Models
Main-memory, Distributed/Cloud/MapReduce/BSP/Spark/Pregel,
SSD, Parallel/Multi-core, External/Semi-External
Advanced Applications
Social Network (Twitter, Facebook), Geo-Social (Check-ins), Chemical, Biological,
Web Graph (Wiki), Collaboration (DBLP), Public Opinion Mining
Query Primitives
• Given a Graph Pattern:
Similarity, Pattern, Sub/Super Graph
• Given a Set of Nodes:
Topology: SimRank, Connectivity, Path
K-hop, Flow, Community, Reachability
• Given a Set of Keywords:
Knowledge Graph, Attributed Graph, RDF
Mining Primitives
• Subgraph Based:
Cohesive Subgraph Mining
Community Detection
Graph Clustering, Partition
Frequent Subgraph Mining
Dense Subgraph Mining
• Aggregate Based:
PageRank, Outlier, Anonymity
Influence Maximization
Primitive Computing Paradigms
Joins, BFS, DFS, Topological Sort, Spanning Tree, Diameter
Introduction
Dense Subgraph Mining
For a subgraph g:
Density = (#Edges) / (#Nodes)
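As a quick check of the ratio, here is a minimal Python sketch. The edge-list input and the function name `density` are illustrative assumptions; `Fraction` keeps values such as 13/6 or 35/8 exact.

```python
from fractions import Fraction

# Illustrative sketch: the edge-list input and the name `density` are assumptions.
def density(edges, nodes):
    """density(g) = #edges / #nodes for the subgraph g induced by `nodes`."""
    nodes = set(nodes)
    m = sum(1 for u, v in edges if u in nodes and v in nodes)  # count induced edges only
    return Fraction(m, len(nodes))

# A 4-clique: 6 edges on 4 nodes.
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(density(k4, {1, 2, 3, 4}))  # 3/2
print(density(k4, {1, 2, 3}))     # 1 (a triangle: 3 edges / 3 nodes)
```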
Introduction
• Mining dense subgraphs from large graphs is
– a fundamental graph mining task in many application domains
• Applications
– Network Science
o Cohesive Group / Community Discovery
– Biology
o Regulatory Motif identification
o Pattern Discovery in Gene Annotation Graph
– Graph Database
o Index Construction for Shortest path and Reachability queries
– Web mining
o Link Spam Detection
– Graph compression
– Etc.
Dense Subgraph Mining > Applications
Introduction
• Densest subgraph computation
– widely used in many graph mining tasks
• In many applications
– finding one dense subgraph is usually not sufficient,
– e.g., community detection
• Top-k subgraphs are needed
– to represent different dense regions of the graph
Dense Subgraph Mining > Top-k Subgraphs
Dense Subgraph Mining
• The dense subgraph mining problem
– aims to identify subgraphs of a large graph
– with high density (i.e., #edges / #nodes)
• Existing studies
– Focus on finding the densest subgraph
o The subgraph with the highest density [1, 2]
o Identifying an optimal clique-like dense subgraph [3]
o To find top-k dense subgraphs, they use a simple greedy procedure
that invokes the same algorithm k times, each time on the residual graph
obtained by deleting the dense subgraphs identified in previous iterations.
Related Work
1. A. V. Goldberg. Finding a maximum density subgraph. Technical report, University of California at Berkeley, 1984.
2. Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. J. Algorithms, 34(2), 2000.
3. C. E. Tsourakakis, et al. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. KDD, 2013.
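The greedy top-k procedure described above can be sketched in Python as follows. This is a toy illustration under stated assumptions: `densest_subgraph` is a hypothetical exhaustive search over node subsets that stands in for the exact densest-subgraph algorithms [1, 2], so it only runs on very small graphs.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sketch: brute force replaces the cited densest-subgraph algorithms.
def induced_edge_count(edges, nodes):
    return sum(1 for u, v in edges if u in nodes and v in nodes)

def densest_subgraph(edges, nodes):
    """Hypothetical brute-force densest subgraph of G[nodes]; exponential, toy graphs only."""
    best, best_d = set(), Fraction(-1)
    ordered = sorted(nodes)
    for r in range(1, len(ordered) + 1):
        for sub in combinations(ordered, r):
            d = Fraction(induced_edge_count(edges, set(sub)), r)
            if d > best_d:  # keep the first (smallest) densest set found
                best, best_d = set(sub), d
    return best, best_d

def greedy_top_k(edges, nodes, k):
    """Greedy baseline: take the densest subgraph, delete its nodes, repeat up to k times."""
    residual, results = set(nodes), []
    while residual and len(results) < k:
        g, d = densest_subgraph(edges, residual)
        results.append((g, d))
        residual -= g
    return results

# Hypothetical toy graph: a 5-clique and a 4-clique joined by the edge (5, 6).
edges = list(combinations(range(1, 6), 2)) + list(combinations(range(6, 10), 2)) + [(5, 6)]
for g, d in greedy_top_k(edges, range(1, 10), 2):
    print(sorted(g), d)
```

Each round only knows about density, which is why the results carry no further structural guarantee.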
Dense Subgraph Mining
• The greedy approach has several drawbacks:
1. The top-k results may not fully reflect the top-k densest regions of a graph.
o If the graph contains a very large dense region, subgraphs in other dense regions may have
little chance of appearing in the top-k results.
2. A subgraph returned by the greedy approach can be partial,
o i.e., subsumed by a better subgraph,
o which makes each result hard to characterize.
3. The greedy approach does not provide a formal definition of a result.
o A formal definition is important for graph mining tasks,
o because without one, it is not clear how to analyze each result.
4. Subgraphs identified by the greedy approach
o only provide density information;
o other structural properties of each subgraph are hard to derive.
Related Work > Problems
Dense Subgraph Mining
• Subgraph G
– IR = Information Retrieval
– BN = Bayesian Networks
Related Work > Problems > Example
A real co-author subgraph G (DBLP)
Dense Subgraph Mining
• Using the greedy approach [1] or the optimal quasi-clique model [3] to find the top-2 dense regions
• yields G’IR and G*IR,
• which do not fully reflect the top-2 representative dense regions of the graph,
• because both are located in the same dense region.
• G*IR and G*BN are better top-2 representatives.
Related Work > Problems > Example
Preliminaries
• G = (V(G), E(G)) with
– n = |V(G)| nodes and
– m = |E(G)| edges
• For each node u ∈ V(G):
– N(u, G) denotes the neighbor set of u in G
– d(u, G) = |N(u, G)| denotes the degree of u
• g = (V(g), E(g)) is an induced subgraph of G
– if and only if V(g) ⊆ V(G) and E(g) = {(u, v) ∈ E(G) | u, v ∈ V(g)}
• Density of G: density(G) = |E(G)| / |V(G)|
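The notation above maps onto a simple adjacency-set representation. A minimal sketch, assuming an undirected simple graph without isolated nodes stored as a Python dict; all helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch: the dict-of-sets representation and helper names are assumptions.
def build_graph(edges):
    """G as an adjacency dict {u: N(u, G)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, u):
    """d(u, G) = |N(u, G)|."""
    return len(adj[u])

def induced_subgraph(adj, nodes):
    """g with V(g) = nodes and E(g) = every edge of G whose endpoints are both in nodes."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}

def density(adj):
    """density(g) = |E(g)| / |V(g)|; each undirected edge appears in two neighbor sets."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return Fraction(m, len(adj))

# Hypothetical example: a 4-clique on {1, 2, 3, 4} plus a pendant node 5 attached to 4.
G = build_graph([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)])
g = induced_subgraph(G, {1, 2, 3, 4})
print(G[5], degree(G, 4), density(G), density(g))
```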
Locally Densest Subgraph (LDS)
• Densest subgraph computation
– widely used in many graph mining tasks
Greed is Not Good
• In many applications
– finding one dense subgraph is usually not sufficient,
– e.g., community detection
• Top-k subgraphs are needed
– to represent different dense regions of the graph
Locally Densest Subgraph (LDS)
• Each identified subgraph should be
– densest in its local region.
• A natural way to define a locally densest subgraph
– is to ensure that each identified subgraph is not contained in a denser subgraph.
• Such a definition is not good,
– because the denser subgraph may not be compact.
Dense or Compact?
Locally Densest Subgraph (LDS)
• A graph G is ρ-compact iff
– G is connected, and
– removing any subset S of nodes
o removes at least ρ × |S| edges
• Example:
– G is 1-compact;
– G*BN is 2-compact
ρ-compact Graph
• If a graph G is ρ-compact, then
– every node in G has degree at least ⌈ρ⌉ (take S = {v}; degrees are integers),
– thus G is a ⌈ρ⌉-core subgraph;
– G has density at least ρ (take S = V(G));
– for ρ’ > ρ, a ρ’-compact graph is also a ρ-compact graph.
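The definition can be checked literally on tiny graphs by trying every node subset S. The sketch below is an exponential-time illustration only, and its helper names are hypothetical. In the example, a 4-clique (density 3/2) turns out to be 3/2-compact but not 2-compact, consistent with the properties above.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative brute-force sketch; helper names are hypothetical.
def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected(adj):
    """Depth-first search from an arbitrary node of a non-empty graph."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def removed_edges(adj, S):
    """Edges that disappear when the node set S is deleted (at least one endpoint in S)."""
    inside = sum(1 for u in S for w in adj[u] if w in S) // 2   # both endpoints in S
    crossing = sum(1 for u in S for w in adj[u] if w not in S)  # exactly one endpoint in S
    return inside + crossing

def is_compact(adj, rho):
    """G is connected, and deleting any non-empty S removes >= rho * |S| edges."""
    if not is_connected(adj):
        return False
    nodes = list(adj)
    return all(removed_edges(adj, set(S)) >= rho * len(S)
               for r in range(1, len(nodes) + 1)
               for S in combinations(nodes, r))

k4 = build_graph(combinations(range(1, 5), 2))  # 4-clique: 6 edges, density 3/2
print(is_compact(k4, Fraction(3, 2)), is_compact(k4, 2))  # True False
```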
Locally Densest Subgraph (LDS)
Maximal ρ-compact Subgraph
Maximal means that
it is not contained in a larger ρ-compact subgraph.
• Definition of LDS
– Based on Definition 3.2, an LDS is formally defined as follows.
Locally Densest Subgraph (LDS)
• Definition 3.3: A subgraph g of G is an LDS iff g is a maximal ρ-compact subgraph of G, where ρ = density(g)
• By Definition 3.3
– an LDS is itself compact
– an LDS is not contained in a better subgraph that is more compact than itself
Locally Densest Subgraph (LDS)
Example: GIR is not an LDS
Subgraph   Density   Compactness
GIR        35/8      4-compact
G’IR       13/6      13/6-compact
G*IR       4.5       4.5-compact
G*BN       2         2-compact
Locally Densest Subgraph (LDS)
• Subgraph GIR with density 35/8
– is 4-compact, and also a maximal 4-compact subgraph.
• Subgraph GIR is not an LDS
– Reason:
o GIR contains the denser subgraph G*IR (density 4.5 > 35/8),
o so GIR is not 35/8-compact, and hence not a maximal 35/8-compact subgraph.
Locally Densest Subgraph (LDS)
• Subgraph G’IR
– with density 13/6 is a 13/6-compact subgraph.
• Subgraph G’IR is not an LDS
– Reason:
o it is contained in the better subgraph GIR, which is 4-compact;
o since 4 > 13/6, GIR is also 13/6-compact, so G’IR is not a maximal 13/6-compact subgraph
Example: Subgraph G’IR is not an LDS
Locally Densest Subgraph (LDS)
• G*IR has density 4.5
• G*IR is an LDS
– Reason:
o it is a maximal 4.5-compact subgraph
Example: G*IR and G*BN are LDSes
Locally Densest Subgraph (LDS)
• G*BN has density 2
• G*BN is an LDS
– Reason:
o it is a maximal 2-compact subgraph
Locally Densest Subgraph (LDS)
• Summary: an LDS
– is not contained in a more compact subgraph
– does not contain a denser subgraph
• Definition of LDS: a maximal ρ-compact subgraph with density ρ
Formal definition of LDS
Locally Densest Subgraph (LDS)
Structural Properties of LDS
(A maximal ρ-compact subgraph with density ρ)
1. Does not contain a denser subgraph
2. Is not contained in a more compact subgraph
3. Parameter-free
4. Pairwise disjoint
5. Cohesive: each node has degree at least ⌈ρ⌉ in the LDS
6. Polynomial-time computable
Problem Statement
• Given:
– a graph G and an integer k
• Compute:
– the top-k LDSes with the highest density in G
A Polynomial Algorithm
Lemma:
Any densest subgraph component of graph G is an LDS in G.
The Basic Algorithm & LDS Verification Challenges
The Basic Algorithm
Input: Graph G, integer k
Repeat
  Find a maximal densest subgraph component g of G
  If g is an LDS of G
    Report g as an answer
  Remove g from G
Until G is empty or k answers are reported
• Challenge 1: How to verify whether a subgraph is an LDS
• Challenge 2: How to reduce the computational cost
The maximal densest subgraph of graph G can be computed
• using parametric maximum flow [4]
• in O(n·m·log(n²/m)) time
A Polynomial Algorithm
Lemma:
• A connected subgraph g of G is ρ-compact iff g attains the maximum of |E(g')| − ρ|V(g')| over all subgraphs g' of g
LDS Verification
For a subgraph g:
1. Compute the maximal subgraph s of G that maximizes |E(s)| − ρ|V(s)|, where ρ = density(g)
2. Check whether g is a connected component of s
Computing a subgraph s with maximum |E(s)| − ρ|V(s)|
• can be solved using maximum flow (1989) [4]
• Time complexity: O(m·(m + n)·log n)
Maximal: it is not contained in a larger ρ-compact subgraph.
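A brute-force rendering of the two verification steps, for intuition only. It assumes an adjacency-dict graph and replaces the maximum-flow computation [4] with exhaustive enumeration of node subsets, so it is exponential; all names are hypothetical. `maximal_maximizer` returns the largest maximizer, which is the unique maximal one because the union of two maximizers of |E(s)| − ρ|V(s)| is again a maximizer.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sketch: exhaustive search stands in for max flow; names are hypothetical.
def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_count(adj, nodes):
    """|E(s)| of the subgraph induced by the node set `nodes`."""
    return sum(1 for u in nodes for w in adj[u] if w in nodes) // 2

def components(adj, nodes):
    """Connected components (node sets) of the subgraph induced by `nodes`."""
    todo, comps = set(nodes), []
    while todo:
        start = todo.pop()
        comp, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w in todo:
                    todo.discard(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def maximal_maximizer(adj, rho):
    """Largest node set s maximizing |E(s)| - rho*|V(s)|; the empty set scores 0."""
    best, best_val = set(), Fraction(0)
    nodes = sorted(adj)
    for r in range(1, len(nodes) + 1):
        for sub in combinations(nodes, r):
            s = set(sub)
            val = edge_count(adj, s) - rho * r
            if val > best_val or (val == best_val and r > len(best)):
                best, best_val = s, val
    return best

def verify_lds(adj, g):
    """Step 1: s = maximal maximizer with rho = density(g). Step 2: is g a component of s?"""
    g = set(g)
    rho = Fraction(edge_count(adj, g), len(g))
    return g in components(adj, maximal_maximizer(adj, rho))

# Hypothetical graph: 5-clique {1..5} and 4-clique {6..9} joined by the path 5-10-11-12-6.
edges = (list(combinations(range(1, 6), 2)) + list(combinations(range(6, 10), 2))
         + [(5, 10), (10, 11), (11, 12), (12, 6)])
G = build_graph(edges)
print(verify_lds(G, {1, 2, 3, 4, 5}), verify_lds(G, {6, 7, 8, 9}), verify_lds(G, {1, 2, 3, 4}))  # True True False
```

Both cliques pass, while the 4-clique {1, 2, 3, 4} inside the 5-clique fails because it is not a component of the maximal maximizer.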
A Polynomial Algorithm
Greedy Algorithm
• Computing the densest subgraph
– Line 5: Goldberg’s algorithm (1984) [1]
o any other densest subgraph algorithm can be used
– Line 7: verification procedure
• TryDensity(p1, p2)
– checks whether g is a maximal density(g)-compact subgraph in G
– returns whether g is a connected component of the computed subgraph
The Basic Algorithm & LDS Verification Challenges
• Time complexity
– using Goldberg’s algorithm: O(m·n·(m + n)·log² n)
– using maximum flow: O(n²·m·log(n²/m))
Too costly: optimization is needed
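Putting the pieces together, a toy version of the basic algorithm might look as follows. Assumptions: brute-force enumeration replaces Goldberg’s algorithm (line 5) and the max-flow verification (line 7), so it runs only on tiny graphs, and all names are hypothetical. It also makes the cost visible: every iteration recomputes a densest subgraph and a full verification from scratch.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sketch of the basic algorithm; brute force stands in for lines 5 and 7.
def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_count(adj, nodes):
    return sum(1 for u in nodes for w in adj[u] if w in nodes) // 2

def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    todo, comps = set(nodes), []
    while todo:
        start = todo.pop()
        comp, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w in todo:
                    todo.discard(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def maximal_maximizer(adj, nodes, rho):
    """Largest s within `nodes` maximizing |E(s)| - rho*|V(s)| (empty set scores 0)."""
    best, best_val = set(), Fraction(0)
    ordered = sorted(nodes)
    for r in range(1, len(ordered) + 1):
        for sub in combinations(ordered, r):
            s = set(sub)
            val = edge_count(adj, s) - rho * r
            if val > best_val or (val == best_val and r > len(best)):
                best, best_val = s, val
    return best

def maximal_densest(adj, nodes):
    """Line 5 stand-in: the maximal densest subgraph is the largest maximizer at the top density."""
    ordered = sorted(nodes)
    top = max(Fraction(edge_count(adj, set(sub)), r)
              for r in range(1, len(ordered) + 1)
              for sub in combinations(ordered, r))
    return maximal_maximizer(adj, ordered, top)

def basic_top_k_lds(adj, k):
    """Pick a densest-subgraph component of the residual graph, verify it in G, remove it."""
    residual, answers = set(adj), []
    while residual and len(answers) < k:
        g = components(adj, maximal_densest(adj, residual))[0]  # any component will do
        rho = Fraction(edge_count(adj, g), len(g))
        if g in components(adj, maximal_maximizer(adj, adj, rho)):  # line 7 stand-in: verify in G
            answers.append(g)
        residual -= g
    return answers

K5 = list(combinations(range(1, 6), 2))
K4 = list(combinations(range(6, 10), 2))
print(basic_top_k_lds(build_graph(K5 + K4 + [(5, 10), (10, 11), (11, 12), (12, 6)]), 2))
print(basic_top_k_lds(build_graph(K5 + K4 + [(5, 6)]), 2))
```

In the second hypothetical graph the 4-clique is the densest component of the residual graph but is rejected, because the bridge (5, 6) places it inside a larger 3/2-compact subgraph (the whole graph).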
The Optimized Algorithm
1. Pruning Invalid Nodes
2. Optimizing Densest Subgraph Computation
3. Optimizing LDS Verification
The LDS* Algorithm
The Optimized Algorithm
• To prune all the invalid nodes in a subgraph G’ of G,
• the algorithm first computes
– LB(ρ(v)) and UB(ρ(v)) for each v ∈ V(G’)
• For any node v in G, we define:
– LB(ρ(v)):
o a density ρ such that v is contained in a ρ-compact subgraph
– UB(ρ(v)):
o if v is contained in an LDS: an upper bound of the density of that LDS
o otherwise: any non-negative real value
1- Pruning Invalid Nodes
A node v is invalid iff:
• Rule 1: UB(ρ(v)) < LB(ρ(v)), or
• Rule 2: v has a neighbor u with LB(ρ(u)) > UB(ρ(v))
Pruning Rule:
• An invalid node is not contained in any LDS
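The two rules translate directly into a filter. In this sketch the bounds are supplied as plain dicts with made-up values, because how LB(ρ(v)) and UB(ρ(v)) are computed is not shown here; the single-pass application and the function names are assumptions.

```python
# Illustrative sketch: bounds are given as inputs; names and single-pass pruning are assumptions.
def invalid_nodes(adj, lb, ub):
    """Rule 1: UB(v) < LB(v). Rule 2: some neighbor u of v has LB(u) > UB(v)."""
    return {v for v in adj
            if ub[v] < lb[v] or any(lb[u] > ub[v] for u in adj[v])}

def prune(adj, lb, ub):
    """Delete invalid nodes; by the pruning rule they cannot belong to any LDS."""
    bad = invalid_nodes(adj, lb, ub)
    return {v: adj[v] - bad for v in adj if v not in bad}

# Hypothetical path a - b - c with made-up bounds.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
lb = {"a": 3, "b": 1, "c": 1}
ub = {"a": 4, "b": 2, "c": 0.5}
print(sorted(invalid_nodes(adj, lb, ub)))  # ['b', 'c']
```

Node c fails Rule 1 (0.5 < 1), and node b fails Rule 2 because its neighbor a has LB 3 > UB(b) = 2.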
The Optimized Algorithm
Lemma:
• The densest subgraph g of G
– is contained in the ρ-core of G,
– where ρ = density(g)
• It suffices to set ρ to a lower bound of density(g),
– since a smaller ρ gives a larger ρ-core, which still contains g
Optimizing Densest Subgraph Computation
• Remove the nodes
– whose core numbers are smaller than the lower bound ρmax,
– then compute the maximal densest subgraph in the residual graph.
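Core numbers can be obtained by repeatedly peeling a minimum-degree node. A simple quadratic sketch, assuming an adjacency-dict graph (helper names hypothetical). In the example, the density 3/2 of the 4-clique serves as the lower bound ρmax, since the density of any subgraph is a valid lower bound on the highest density.

```python
from fractions import Fraction

# Illustrative sketch: a simple O(n^2) peeling; names are hypothetical.
def core_numbers(adj):
    """Core number of each node, by repeatedly removing a minimum-degree node."""
    deg = {v: len(adj[v]) for v in adj}
    alive, core, k = set(adj), {}, 0
    while alive:
        v = min(alive, key=deg.__getitem__)  # current minimum-degree node
        k = max(k, deg[v])                   # core numbers never decrease along the peeling order
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    return core

def prune_by_core(adj, rho_lb):
    """Keep only nodes whose core number is at least the density lower bound rho_lb."""
    core = core_numbers(adj)
    keep = {v for v in adj if core[v] >= rho_lb}
    return {v: adj[v] & keep for v in keep}

# Hypothetical graph: 4-clique {1, 2, 3, 4} with a tail 4 - 5 - 6.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}
print(sorted(core_numbers(adj).items()))  # [(1, 3), (2, 3), (3, 3), (4, 3), (5, 1), (6, 1)]
print(sorted(prune_by_core(adj, Fraction(3, 2))))  # [1, 2, 3, 4]
```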
The Optimized Algorithm
Lemma
• For a ρ-compact subgraph g of G, g is an LDS in G iff g is an
LDS in the ρ-core component of G that contains g
Optimizing LDS Verification
Pruning Rule
• We only need to verify
– g in the ρ-core component of G that contains g,
– where ρ = density(g)
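For a real-valued ρ, the ρ-core can be computed by deleting nodes whose remaining degree falls below ρ; the component containing g is then the only part that verification needs. A sketch under the same adjacency-dict assumption, with hypothetical names. In the example, a 6-clique (density 5/2) and a 4-clique joined through a degree-2 node fall into different 5/2-core components.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Illustrative sketch; names and the example graph are hypothetical.
def rho_core(adj, rho):
    """Node set of the rho-core: repeatedly delete nodes whose remaining degree is < rho."""
    deg = {v: len(adj[v]) for v in adj}
    alive = set(adj)
    queue = deque(v for v in adj if deg[v] < rho)
    while queue:
        v = queue.popleft()
        if v not in alive:  # already deleted via an earlier queue entry
            continue
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                if deg[w] < rho:
                    queue.append(w)
    return alive

def core_component(adj, g, rho):
    """Connected component of the rho-core containing the (connected) node set g, or None."""
    core = rho_core(adj, rho)
    if not set(g) <= core:
        return None
    start = next(iter(g))
    comp, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w in core and w not in comp:
                comp.add(w)
                stack.append(w)
    return comp

# 6-clique {1..6}, path 6 - 7 - 8, 4-clique {8..11}.
edges = list(combinations(range(1, 7), 2)) + [(6, 7), (7, 8)] + list(combinations(range(8, 12), 2))
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(sorted(core_component(adj, range(1, 7), Fraction(5, 2))))  # [1, 2, 3, 4, 5, 6]
```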
The Optimized Algorithm
The LDS* Algorithm
• Combining all the pruning techniques
– yields LDS* (the optimized algorithm),
– which maintains a priority queue H to compute the top-k LDSes.
– Each entry in H is a triplet (g, ρ, exact) consisting of:
o g: a subgraph
o ρ: the priority of g
o exact: a Boolean indicating whether g is a ρ-compact subgraph with density ρ
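Because Python’s `heapq` is a min-heap, one way to sketch H is to store −ρ as the key, with a running counter as a tie-breaker so node sets are never compared. The class name and the entries below are hypothetical.

```python
import heapq
from fractions import Fraction
from itertools import count

# Illustrative sketch of H; the class name and example entries are hypothetical.
class LdsQueue:
    """Max-priority queue of (g, rho, exact) triplets; heapq is a min-heap, so rho is negated."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker so node sets are never compared

    def push(self, g, rho, exact):
        heapq.heappush(self._heap, (-rho, next(self._order), frozenset(g), exact))

    def pop(self):
        """Remove and return the entry with the largest rho as (g, rho, exact)."""
        neg_rho, _, g, exact = heapq.heappop(self._heap)
        return set(g), -neg_rho, exact

    def __len__(self):
        return len(self._heap)

H = LdsQueue()
H.push({1, 2, 3}, Fraction(4, 3), False)    # hypothetical component, not yet known to be exact
H.push({4, 5, 6, 7}, Fraction(3, 2), True)  # hypothetical subgraph known to be rho-compact
print(H.pop(), len(H))
```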
The Optimized Algorithm
The LDS* Algorithm (Cont’d)
• Line 1: initializes all variables
• Line 2: calls the pruning procedure, which returns G’
• Lines 3-4: push all the connected components of G’ into H
• Lines 5-17: the algorithm then finds the top-k LDSes iteratively
• Line 8: in each iteration,
– it pops an entry from H,
– denoted by (g, ρ, exact)
• Lines 9-11: if g is a ρ-compact subgraph with density ρ (exact is true),
– the algorithm invokes Algorithm 5
o to verify whether g is an LDS,
o and if so, outputs g and continues to the next iteration
The Optimized Algorithm
The LDS* Algorithm (Cont’d)
• Line 12: if exact is false,
– it computes the maximal densest subgraph of g using Algorithm 4
• Line 13:
– each densest subgraph component g* of g must be density(g*)-compact,
– so it randomly selects one densest subgraph component g* of g and pushes (g*, density(g*), true) into H
• Line 14:
– the algorithm obtains the residual graph G’ by deleting subgraph g*
• Lines 15-17:
– it calls Algorithm 3 to prune the invalid nodes in G’
– and pushes each connected component of G’ into the priority queue H
Experiments
• Setup
– Implemented in C++
– CPU: Intel Xeon 3.4 GHz
– RAM: 32 GB
– OS: Red Hat Linux
• Datasets
– Real datasets
– dmax is the maximum degree of the nodes
Environment & Datasets
Experiments
• QC: the state-of-the-art top-k dense subgraph model [3]
– (Tsourakakis et al., KDD’13)
– Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees
• LDS: applies the optimizations for densest subgraph computation and LDS verification
• LDS*: applies all optimizations
Algorithms
Experiments
• Measures for effectiveness testing
– Density: density(g)
– Relative density
– Edge density
– Diameter:
o the longest shortest-path distance over all pairs of nodes in the graph
Measures
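Of the measures above, the diameter has a direct definition that a breadth-first search from every node computes. A sketch assuming a connected graph in adjacency-dict form; the function name is hypothetical.

```python
from collections import deque

# Illustrative sketch; assumes a connected graph given as {node: set of neighbors}.
def diameter(adj):
    """Longest shortest-path distance over all node pairs, via BFS from every node."""
    longest = 0
    for source in adj:
        dist, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest

path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
clique = {u: {1, 2, 3, 4} - {u} for u in range(1, 5)}
print(diameter(path), diameter(clique))  # 3 1
```

A small diameter indicates a compact subgraph; this all-pairs BFS costs O(n·(n + m)) time.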
Experiments
Density Testing
Experiments
Relative Density Testing
Experiments
Edge Density Testing
Experiments
Diameter Testing
Experiments
Efficiency Testing
Experiments
• Performed a case study on the Coauthor dataset
• Aim: show that LDSes can represent different dense regions of the whole graph
• Legend: n = number of nodes, ρ = density
• Finding: LDSes best represent the local dense regions of the graph
Case Study
Conclusion
• Proposed a new dense subgraph model,
– the LDS,
– with several good structural properties
• Derived a polynomial algorithm
– to compute the top-k LDSes in a graph
• Proposed several optimization techniques
– to improve the efficiency of the LDS algorithm
• Conducted experiments and performance studies
– demonstrating the effectiveness and efficiency of LDS
References
1. A. V. Goldberg.
– Finding a maximum density subgraph.
– Technical report, University of California at Berkeley, 1984.
2. Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama.
– Greedily finding a dense subgraph.
– J. Algorithms, 34(2), 2000.
3. C. E. Tsourakakis, et al.
– Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees.
– KDD, 2013.
4. G. Gallo, M. D. Grigoriadis, and R. E. Tarjan.
– A fast parametric maximum flow algorithm and applications.
– SIAM J. Comput., 18(1), 1989.
5. https://www.youtube.com/watch?v=kND11L-oi8A