1. Parallel Community Detection for Massive Graphs
E. Jason Riedy, Henning Meyerhenke, David Ediger, and
David A. Bader
13 September 2011
2. Main contributions
• First massively parallel algorithm for community detection.
• A very general algorithm: uses a weighted matching and
applies to many optimization criteria.
• Partition a 122 million vertex, 1.99 billion edge graph into
communities in 2 hours.
PPAM11—Parallel Community Detection—Jason Riedy 2/25
3. Exascale data analysis
Health care: finding outbreaks, population epidemiology
Social networks: advertising, searching, grouping
Intelligence: decisions at scale, regulating algorithms
Systems biology: understanding interactions, drug design
Power grid: disruptions, conservation
Simulation: discrete events, cracking meshes
• Graph clustering is common in all application areas.
4. These are not easy graphs.
Yifan Hu’s (AT&T) visualization of the LiveJournal data set
5. But no shortage of structure...
Protein interactions: Giot et al., “A Protein Interaction Map of
Drosophila melanogaster”, Science 302, 1722–1736, 2003.
Jason’s network via LinkedIn Labs
• Locally, there are clusters or communities.
• First pass over a massive social graph:
• Find smaller communities of interest.
• Analyze / visualize top-ranked communities.
• Our part: First massively parallel community detection method.
6. Outline
Motivation
Defining community detection and metrics
Our parallel method
Results
Implementation and platform details
Performance
Conclusions and plans
7. Community detection
What do we mean?
• Partition a graph’s
vertices into disjoint
communities.
• A community locally
maximizes some metric.
• Modularity,
conductance, ...
• Trying to capture that vertices are more similar within one
community than between communities.
Jason’s network via LinkedIn Labs
8. Community detection
Assumptions
• Disjoint partitioning of
vertices.
• There is no one unique
answer.
• Some metrics are NP-complete
to optimize [Brandes, et al.].
• Graph is lossy
representation.
• Want an adaptable
detection method.
Jason’s network via LinkedIn Labs
9. Common community metric: Modularity
• Modularity: Deviation of connectivity in the community
induced by a vertex set S from some expected background
model of connectivity.
• We take [Newman]’s basic uniform model.
• Let m count all edges in graph G, m_S count the edges with
both endpoints in S, and x_S count the edge endpoints in S.
Modularity Q_S:
Q_S = (m_S − x_S² / (4m)) / m
• Total modularity: sum of Q_S over all communities in the partition.
• A sufficiently positive modularity implies some structure.
• Known issues: Resolution limit, NP-complete opt. prob.
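As a sketch (not part of the talk), the metric above can be computed directly; here x_S is interpreted as the sum of degrees over S (edge endpoints in S, so internal edges count twice), which makes Q_S match Newman's form Q_S = m_S/m − (x_S/2m)²:

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Total modularity: sum of Q_S over all communities, with
    Q_S = (m_S - x_S^2 / (4m)) / m, where m = #edges, m_S = edges
    inside community S, and x_S = sum of degrees of vertices in S."""
    m = len(edges)
    m_S = defaultdict(int)   # internal edge count per community
    x_S = defaultdict(int)   # degree sum per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        x_S[cu] += 1
        x_S[cv] += 1
        if cu == cv:
            m_S[cu] += 1
    return sum((m_S[c] - x_S[c] ** 2 / (4 * m)) / m for c in x_S)
```

For two triangles joined by one bridge edge and partitioned into the two triangles, this yields 2.5/7 ≈ 0.357, a clearly positive value reflecting the obvious structure.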
10. Sequential agglomerative method
[Figure: example graph with vertices A–G]
• A common method (e.g. [Clauset, et al.]) agglomerates
vertices into communities.
• Each vertex begins in its own community.
• An edge is chosen to contract.
• Merging maximally increases modularity.
• Priority queue.
• Known often to fall into an O(n²) performance trap with
modularity [Wakita & Tsurumi].
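A minimal sequential sketch of this agglomeration loop (illustrative Python, not the talk's implementation; for clarity it rescans the edge list each round where CNM drives the merges with a priority queue of scores):

```python
from collections import defaultdict

def agglomerate(n, edges):
    """Greedy agglomeration: start from singleton communities and
    repeatedly apply the merge that maximally increases modularity.
    Returns a community label per vertex."""
    m = len(edges)
    comm = list(range(n))              # community label per vertex
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    while True:
        between = defaultdict(int)     # inter-community edge counts
        x = defaultdict(int)           # degree sum per community
        for u, v in edges:
            cu, cv = comm[u], comm[v]
            if cu != cv:
                between[min(cu, cv), max(cu, cv)] += 1
        for v in range(n):
            x[comm[v]] += deg[v]
        # Merging a and b changes modularity by (e_ab - x_a*x_b/(2m))/m.
        best, best_dq = None, 0.0
        for (a, b), e_ab in between.items():
            dq = (e_ab - x[a] * x[b] / (2 * m)) / m
            if dq > best_dq:
                best, best_dq = (a, b), dq
        if best is None:               # no merge improves modularity
            return comm
        a, b = best
        comm = [a if c == b else c for c in comm]
```

The O(n²) trap mentioned above comes from unbalanced growth: one giant community keeps winning the merges, so each contraction step touches most of the graph.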
14. New parallel method
[Figure: example graph with vertices A–G]
• We use a matching to avoid the queue.
• Compute a heavy-weight, large matching.
  • Greedy algorithm.
  • Maximal matching.
  • Within a factor of 2 of the maximum weight.
• Merge all matched communities at once.
  • Maintains some balance.
  • Produces different results.
• Agnostic to weighting, matching, ...
  • Can maximize modularity, minimize conductance.
  • Modifying the matching permits easy exploration.
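One round of the matching-and-merge step can be sketched sequentially (illustrative Python; `score(cu, cv)` stands for any metric-specific gain such as delta-modularity, and the sorted greedy pass here is a sequential stand-in for the parallel greedy matching):

```python
def one_matching_round(comm, edges, score):
    """One round of the matching-based method: greedily build a
    maximal matching on inter-community edges, preferring high
    scores, then merge all matched pairs at once."""
    candidates = []
    for u, v in edges:
        cu, cv = comm[u], comm[v]
        if cu != cv:
            candidates.append((score(cu, cv), cu, cv))
    candidates.sort(reverse=True)        # greedy: heaviest first
    matched = {}
    for s, cu, cv in candidates:
        if s > 0 and cu not in matched and cv not in matched:
            matched[cu] = cv
            matched[cv] = cu
    # Merge every matched pair simultaneously.
    merge_to = {c: min(c, matched[c]) for c in matched}
    return [merge_to.get(c, c) for c in comm]
```

Because every community is merged with at most one partner per round, no single community can swallow the graph in one step, which is the balance property noted above.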
17. Implementation: Cray XMT
Tolerates latency by massive multithreading.
• Hardware: 128 threads per processor
• Every-cycle context switch
• Many outstanding memory requests (180/proc)
• Flexibly supports dynamic load balancing
• Globally hashed address space, no data cache
• Support for fine-grained, word-level synchronization
• Full/empty bit with every memory word
• 128 processor XMT at
Pacific Northwest Nat’l Lab
• 500 MHz processors, 16384
threads, 1 TB of shared
memory
Image: cray.com
18. Implementation: Data structures
Extremely basic for graph G = (V, E)
• An array of (i, j; w) weighted edge pairs, i < j (size |E|)
• An array to store self-edges, d(i) = w (size |V|)
• A temporary floating-point array for scores (size |E|)
• Two additional temporary arrays (sizes |V|, |E|) to store degrees,
matching choices, a linked list for compression, ...
• Weights count number of agglomerated vertices or edges.
• Scoring methods need only vertex-local counts.
• Greedy matching parallelizes trivially over the edge array.
• Contraction compacts the edge list in place through a
temporary, hashed linked list of duplicates.
• Full/empty bits or compare-and-swap maintain correctness.
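The contraction step can be sketched as follows (illustrative Python; a dictionary plays the role of the hashed linked list that the in-place XMT implementation uses to combine duplicate edges):

```python
from collections import defaultdict

def contract(edges, comm):
    """Contract a weighted edge list: relabel each (i, j; w) by its
    endpoints' community labels, hash together duplicate edges by
    summing weights, and accumulate intra-community weight into the
    self-edge array d."""
    merged = {}
    d = defaultdict(int)
    for i, j, w in edges:
        ci, cj = comm[i], comm[j]
        if ci == cj:
            d[ci] += w                    # self-edge weight
        else:
            key = (min(ci, cj), max(ci, cj))
            merged[key] = merged.get(key, 0) + w
    new_edges = [(i, j, w) for (i, j), w in merged.items()]
    return new_edges, dict(d)
```

The summed weights are exactly the agglomerated vertex and edge counts the scoring methods need, so each round works only with vertex-local data.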
19. Data and experiment
• Generated R-MAT graphs [Chakrabarti, et al.] (a = 0.55,
b = c = 0.1, and d = 0.25) and extracted the largest
component.
Scale  Fact.  |V|         |E|          Avg. degree  Edge group
  18      8    236 605     2 009 752       8.5        2M
  18     16    252 427     3 936 239      15.6        4M
  18     32    259 372     7 605 572      29.3        8M
  19      8    467 993     3 480 977       7.4        4M
  19     16    502 152     7 369 885      14.7        8M
  19     32    517 452    14 853 837      28.7       16M
• Three runs with each parameter setting.
• Tested both modularity (CNM for Clauset-Newman-Moore)
and McCloskey-Bader’s filtered modularity (MB).
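R-MAT sampling with the parameters above can be sketched as (illustrative Python; one edge per call, recursively descending into a quadrant of the 2^scale by 2^scale adjacency matrix):

```python
import random

def rmat_edge(scale, a=0.55, b=0.10, c=0.10, d=0.25, rng=random):
    """Sample one R-MAT edge [Chakrabarti, et al.]: at each of `scale`
    levels, pick one of the four quadrants with probabilities
    (a, b, c, d) and append the corresponding bit to each endpoint."""
    i = j = 0
    for _ in range(scale):
        r = rng.random()
        i <<= 1
        j <<= 1
        if r < a:
            pass                  # top-left quadrant
        elif r < a + b:
            j |= 1                # top-right
        elif r < a + b + c:
            i |= 1                # bottom-left
        else:
            i |= 1                # bottom-right
            j |= 1
    return i, j
```

Repeating this for the desired number of edges and extracting the largest connected component reproduces the experimental setup described above.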
23. Performance: Large and real-world
Large R-MAT Scale 27, edge factor 16 generated a component with
122 million vertices and 1.99 billion edges.
7258 seconds ≈ 2 hours on 128P
R-MAT known not to have much community
structure...
LiveJournal 4.8 million vertices, 68 million edges
2779 seconds ≈ 47 minutes on 128P
Long tail of around 1 000 matched pairs per iteration.
The tail matters for modularity and community count,
but does it matter to users?
24. Conclusions and plans
• First massively parallel algorithm based on matching for
agglomerative community detection.
• Non-deterministic algorithm, still studying impact.
• Now can easily experiment with agglomerative community
detection on real-world graphs.
• How volatile are modularity and conductance to perturbations?
• What matching schemes work well?
• How do different metrics compare in applications?
• Experimenting on OpenMP platforms.
• CSR format is more cache-friendly, but uses more memory.
• Extending to streaming graph data!
• Code available on request, to be integrated into GT packages
like GraphCT.
25. And if you can do better...
Then please do!
10th DIMACS Implementation Challenge — Graph Partitioning
and Graph Clustering
• 12-13 Feb, 2012 in Atlanta,
http://www.cc.gatech.edu/dimacs10/
• Paper deadline: 21 October, 2011
http://www.cc.gatech.edu/dimacs10/data/call-for-papers.pdf
• Challenge problem statement and evaluation rules:
http://www.cc.gatech.edu/dimacs10/data/dimacs10-rules.pdf
• Co-sponsored by DIMACS, by the Command, Control, and Interoperability
Center for Advanced Data Analysis (CCICADA); Pacific Northwest
National Lab.; Sandia National Lab.; and Deutsche
Forschungsgemeinschaft (DFG).
27. Bibliography I
D. Bader and J. McCloskey.
Modularity and graph algorithms.
Presented at UMBC, Sept. 2009.
J. Berry, B. Hendrickson, R. LaViolette, and C. Phillips.
Tolerating the community detection resolution limit with edge
weighting.
CoRR, abs/0903.1072, 2009.
U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer,
Z. Nikoloski, and D. Wagner.
On modularity clustering.
IEEE Trans. Knowledge and Data Engineering, 20(2):172–188,
2008.
28. Bibliography II
D. Chakrabarti, Y. Zhan, and C. Faloutsos.
R-MAT: A recursive model for graph mining.
In Proc. 4th SIAM Intl. Conf. on Data Mining (SDM), Orlando,
FL, Apr. 2004. SIAM.
A. Clauset, M. Newman, and C. Moore.
Finding community structure in very large networks.
Physical Review E, 70(6):66111, 2004.
M. Newman.
Modularity and community structure in networks.
Proc. of the National Academy of Sciences, 103(23):8577–8582,
2006.
K. Wakita and T. Tsurumi.
Finding community structure in mega-scale social networks.
CoRR, abs/cs/0702048, 2007.