SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Downloaden Sie, um offline zu lesen
Scalable Multi-threaded Community Detection
in Social Networks
E. Jason Riedy1 , David A. Bader1 , and Henning Meyerhenke2
1
    School of Comp. Science and Engineering, Georgia Inst. of Technology
2
    Inst. of Theoretical Informatics, Karlsruhe Inst. of Technology (KIT)

                                                            25 May 2012
Exascale data analysis



             Health care Finding outbreaks, population epidemiology
       Social networks Advertising, searching, grouping
             Intelligence Decisions at scale, regulating algorithms
      Systems biology Understanding interactions, drug design
              Power grid Disruptions, conservation
              Simulation Discrete events, cracking meshes

          • Graph clustering is common in all application areas.




MTAAP 2012—Scalable Community Detection—Jason Riedy                   2/35
These are not easy graphs.
                              Yifan Hu’s (AT&T) visualization of the in-2004 data set
                             http://www2.research.att.com/~yifanhu/gallery.html




MTAAP 2012—Scalable Community Detection—Jason Riedy                                     3/35
But no shortage of structure...




          Protein interactions, Giot et al., “A Protein
          Interaction Map of Drosophila melanogaster”,
                                                          Jason’s network via LinkedIn Labs
          Science 302, 1722-1736, 2003.



          • Locally, there are clusters or communities.
          • First pass over a massive social graph:
               • Find smaller communities of interest.
               • Analyze / visualize top-ranked communities.
          • Our part: Community detection at massive scale. (Or kinda
              large, given available data.)
MTAAP 2012—Scalable Community Detection—Jason Riedy                                           4/35
Outline

      Motivation

      Defining community detection and metrics

      Shooting for massive graphs

      Our parallel method

      Implementation and platform details

      Performance

      Conclusions and plans



MTAAP 2012—Scalable Community Detection—Jason Riedy   5/35
Community detection

     What do we mean?
         • Partition a graph’s
           vertices into disjoint
           communities.
         • A community locally
           maximizes some metric.
                • Modularity,
                   conductance, ...
         • Trying to capture that
            vertices are more similar
            within one community
            than between
            communities.                              Jason’s network via LinkedIn Labs




MTAAP 2012—Scalable Community Detection—Jason Riedy                                       6/35
Community detection


     Assumptions
         • Disjoint partitioning of
           vertices.
         • There is no one unique
           answer.
                • Many metrics are
                  NP-complete to
                  optimize (Brandes, et
                  al.[1]).
                • Graph is lossy
                  representation.
         • Want an adaptable
            detection method.                         Jason’s network via LinkedIn Labs




MTAAP 2012—Scalable Community Detection—Jason Riedy                                       7/35
Common community metric: Modularity

          • Modularity: Deviation of connectivity in the community
             induced by a vertex set S from some expected background
             model of connectivity.
          • We take Newman [2]’s basic uniform model.
          • Let m count all edges in graph G, mS count of edges with
             both endpoints in S, and xS count the edges with any
             endpoint in S. Modularity QS :

                                        QS = (mS − x2 /4m)/m
                                                    S

          • Total modularity: sum of modularities of disjoint subsets.
          • A sufficiently positive modularity implies some structure.
          • Known issues: Resolution limit, NP-complete opt. prob.



MTAAP 2012—Scalable Community Detection—Jason Riedy                      8/35
Can we tackle massive graphs now?
      Parallel, of course...
          • Massive needs distributed memory, right?
          • Well... Not really. Can buy a 2 TiB Intel-based Dell server
             on-line for around $200k USD, a 1.5 TiB from IBM, etc.




                                                Image: dell.com.

                          Not an endorsement, just evidence!
          • Publicly available “real-world” data fits...
          • Start with shared memory to see what needs done.
          • Specialized architectures provide larger shared-memory views
             over distributed implementations (e.g. Cray XMT).
MTAAP 2012—Scalable Community Detection—Jason Riedy                        9/35
Multi-threaded algorithm design points




      A scalable multi-threaded graph analysis algorithm
          • ... avoids global locks and frequent global synchronization.
          • ... distributes computation over edges rather than only vertices.
          • ... works with data as local to an edge as possible.
          • ... uses compact data structures that agglomerate memory
             references.




MTAAP 2012—Scalable Community Detection—Jason Riedy                             10/35
Sequential agglomerative method


                                              • A common method (e.g. Clauset, et
                                                  al. [3]) agglomerates vertices into
       A                C                         communities.
                                              • Each vertex begins in its own
                    B                             community.
       D                                      • An edge is chosen to contract.
                                E                 • Merging maximally increases
                                                    modularity.
           G                                      • Priority queue.
                        F
                                              • Known often to fall into an O(n2 )
                                                  performance trap with modularity
                                                  (Wakita & Tsurumi [4]).



MTAAP 2012—Scalable Community Detection—Jason Riedy                                     11/35
Sequential agglomerative method


                                              • A common method (e.g. Clauset, et
                                                  al. [3]) agglomerates vertices into
       A                C                         communities.
                                              • Each vertex begins in its own
                    B                             community.
       D                                      • An edge is chosen to contract.
                                E                 • Merging maximally increases
                                                    modularity.
           G                                      • Priority queue.
                        F
                                              • Known often to fall into an O(n2 )
                                                  performance trap with modularity
                                                  (Wakita & Tsurumi [4]).



MTAAP 2012—Scalable Community Detection—Jason Riedy                                     12/35
Sequential agglomerative method


                                              • A common method (e.g. Clauset, et
                                                  al. [3]) agglomerates vertices into
       A                C                         communities.
                                              • Each vertex begins in its own
                    B                             community.
       D                                      • An edge is chosen to contract.
                                E                 • Merging maximally increases
                                                    modularity.
           G                                      • Priority queue.
                        F
                                              • Known often to fall into an O(n2 )
                                                  performance trap with modularity
                                                  (Wakita & Tsurumi [4]).



MTAAP 2012—Scalable Community Detection—Jason Riedy                                     13/35
Sequential agglomerative method


                                              • A common method (e.g. Clauset, et
                                                  al. [3]) agglomerates vertices into
       A                C                         communities.
                                              • Each vertex begins in its own
                    B                             community.
       D                                      • An edge is chosen to contract.
                                E                 • Merging maximally increases
                                                    modularity.
           G                                      • Priority queue.
                        F
                                              • Known often to fall into an O(n2 )
                                                  performance trap with modularity
                                                  (Wakita & Tsurumi [4]).



MTAAP 2012—Scalable Community Detection—Jason Riedy                                     14/35
Parallel agglomerative method

                                              • We use a matching to avoid the queue.
                                              • Compute a heavy weight, large
                                                  matching.
                                                      • Simple greedy algorithm.
       A                C                             • Maximal matching.
                                                      • Within factor of 2 in weight.
                    B                         • Merge all matched communities at
       D                                          once.
                                E             • Maintains some balance.
           G                                  • Produces different results.
                        F
                                              • Agnostic to weighting, matching...
                                                  • Can maximize modularity, minimize
                                                    conductance.
                                                  • Modifying matching permits easy
                                                    exploration.

MTAAP 2012—Scalable Community Detection—Jason Riedy                                     15/35
Parallel agglomerative method

                                              • We use a matching to avoid the queue.
                                              • Compute a heavy weight, large
                                                  matching.
                                                      • Simple greedy algorithm.
       A                C                             • Maximal matching.
                                                      • Within factor of 2 in weight.
                    B                         • Merge all matched communities at
       D                                          once.
                                E             • Maintains some balance.
           G                                  • Produces different results.
                        F
                                              • Agnostic to weighting, matching...
                                                  • Can maximize modularity, minimize
                                                    conductance.
                                                  • Modifying matching permits easy
                                                    exploration.

MTAAP 2012—Scalable Community Detection—Jason Riedy                                     16/35
Parallel agglomerative method

                                              • We use a matching to avoid the queue.
                                              • Compute a heavy weight, large
                                                  matching.
                                                      • Simple greedy algorithm.
       A                C                             • Maximal matching.
                                                      • Within factor of 2 in weight.
                    B                         • Merge all matched communities at
       D                                          once.
                                E             • Maintains some balance.
           G                                  • Produces different results.
                        F
                                              • Agnostic to weighting, matching...
                                                  • Can maximize modularity, minimize
                                                    conductance.
                                                  • Modifying matching permits easy
                                                    exploration.

MTAAP 2012—Scalable Community Detection—Jason Riedy                                     17/35
Platform: Cray XMT2
      Tolerates latency by massive multithreading.
          • Hardware: 128 threads per processor
               • Context switch on every cycle (500 MHz)
               • Many outstanding memory requests (180/proc)
               • “No” caches...
          • Flexibly supports dynamic load balancing
               • Globally hashed address space, no data cache
          • Support for fine-grained, word-level synchronization
               • Full/empty bit on with every memory word


            • 64 processor XMT2 at CSCS,
               the Swiss National
               Supercomputer Centre.
            • 500 MHz processors, 8192
               threads, 2 TiB of shared
               memory                                 Image: cray.com

MTAAP 2012—Scalable Community Detection—Jason Riedy                     18/35
Platform: Intel R E7-8870-based server
      Tolerates some latency by hyperthreading.
          • “Westmere:” 2 threads / core, 10 cores / socket, four sockets.
              • Fast cores (2.4 GHz), fast memory (1 066 MHz).
              • Not so many outstanding memory requests (60/socket), but
                large caches (30 MiB L3 per socket).
          • Good system support
              • Transparent hugepages reduces TLB costs.
              • Fast, user-level locking. (HLE would be better...)
              • OpenMP, although I didn’t tune it...


            • mirasol, #17 on Graph500
               (thanks to UCB)
            • Four processors (80 threads),
               256 GiB memory
            • gcc 4.6.1, Linux kernel
                                                      Image: Intel R press kit
               3.2.0-rc5
MTAAP 2012—Scalable Community Detection—Jason Riedy                              19/35
Platform: Other Intel R -based servers

      Different design points
          • “Nehalem” X5570: 2.93 GHz, 2 threads/core, 4 cores/socket,
             2 sockets, 8 MiB cache/socket
          • “Westmere” X5650: 2.66 GHz, 2 threads/core, 6 cores/socket,
             2 sockets, 12 MiB cache/socket
          • All with 1 066 MHz memory.
          • Does the Westmere E7-8870’s scale affect performance?


            • Nodes in Georgia Tech CSE
               cluster jinx
            • 24-48 GiB memory, small
               tests
                                                      Image: Intel R press kit




MTAAP 2012—Scalable Community Detection—Jason Riedy                              20/35
Implementation: Data structures

      Extremely basic for graph G = (V, E)
          • An array of (i, j; w) weighted edge pairs, each i, j stored only
             once and packed, uses 3|E| space
          • An array to store self-edges, d(i) = w, |V |
          • A temporary floating-point array for scores, |E|
          • A additional temporary arrays using 4|V | + 2|E| to store
             degrees, matching choices, offsets...

          • Weights count number of agglomerated vertices or edges.
          • Scoring methods (modularity, conductance) need only
             vertex-local counts.
          • Storing an undirected graph in a symmetric manner reduces
             memory usage drastically and works with our simple matcher.


MTAAP 2012—Scalable Community Detection—Jason Riedy                            21/35
Implementation: Data structures

      Extremely basic for graph G = (V, E)
          • An array of (i, j; w) weighted edge pairs, each i, j stored only
             once and packed, uses 3|E| space
          • An array to store self-edges, d(i) = w, |V |
          • A temporary floating-point array for scores, |E|
          • A additional temporary arrays using 4|V | + 2|E| to store
             degrees, matching choices, offsets...

          • Original ignored order in edge array, killed OpenMP.
          • New: Roughly bucket edge array by first stored index.
             Non-adjacent CSR-like structure.
          • New: Hash i, j to determine order. Scatter among buckets.



MTAAP 2012—Scalable Community Detection—Jason Riedy                            22/35
Implementation: Routines

      Three primitives: Scoring, matching, contracting
              Scoring Trivial.
           Matching Repeat until no ready, unmatched vertex:
                             1   For each unmatched vertex in parallel, find the
                                 best unmatched neighbor in its bucket.
                             2   Try to point remote match at that edge (lock,
                                 check if best, unlock).
                             3   If pointing succeeded, try to point self-match at
                                 that edge.
                             4   If both succeeded, yeah! If not and there was
                                 some eligible neighbor, re-add self to ready,
                                 unmatched list.
      (Possibly too simple, but...)


MTAAP 2012—Scalable Community Detection—Jason Riedy                                  23/35
Implementation: Routines


      Contracting
          1   Map each i, j to new vertices, re-order by hashing.
          2   Accumulate counts for new i bins, prefix-sum for offset.
          3   Copy into new bins.

          • Only synchronizing in the prefix-sum. That could be removed if
              I don’t re-order the i , j pair; haven’t timed the difference.
          • Actually, the current code copies twice... On short list for
              fixing.
          • Binning as opposed to original list-chasing enabled
              Intel/OpenMP support with reasonable performance.




MTAAP 2012—Scalable Community Detection—Jason Riedy                           24/35
Performance summary
      Two moderate-sized graphs, one large
                    Graph                     |V |              |E|        Reference
               rmat-24-16                15 580 378          262 482 711     [5, 6]
            soc-LiveJournal1              4 847 571           68 993 773       [7]
               uk-2007-05               105 896 555        3 301 876 564       [8]

      Peak processing rates in edges/second
               Platform        rmat-24-16            soc-LiveJournal1   uk-2007-05
                X5570           1.83 ×    106             3.89 × 106
                X5650           2.54 × 106                4.98 × 106
               E7-8870          5.86 × 106                6.90 × 106    6.54 × 106
                XMT             1.20 × 106                0.41 × 106
                XMT2            2.11 × 106                1.73 × 106    3.11 × 106

MTAAP 2012—Scalable Community Detection—Jason Riedy                                    25/35
Performance: Time to solution

                                                   rmat−24−16                                       soc−LiveJournal1

                             3162
                             1000   q
                                    q
                                        q
                                        q
                                              q
                              316                   q        q
                                                             q




                                                                                                                                  Intel
                                                             q
                              100                                                       q
                                                                                            q
                                                                                            q
                                                                                                q
                                                                                                q
                               32                                                                      q
                                                                                                       q
                                                                                                             q
                                                                                                             q
                               10
                                3
                  Time (s)




                             3162
                             1000
                              316




                                                                                                                                  Cray XMT
                              100
                               32
                               10
                                3


                                    1   2     4     8    16      32    64    128        1   2   4      8    16    32   64   128


                                                             Threads (OpenMP) / Processors (XMT)
                                                                             Platform
                                        q   X5570 (4−core)       X5650 (6−core)    E7−8870 (10−core)       XMT     XMT2




MTAAP 2012—Scalable Community Detection—Jason Riedy                                                                                          26/35
Performance: Rate (edges/second)

                                                            rmat−24−16                                         soc−LiveJournal1


                                     107.5

                                      107
                                                                                                                        q
                                     106.5




                                                                                                                                                Intel
                                                                                                                  q
                                                                                                                  q
                                                                      q
                                                                      q                                    q
                                                                                                           q
                                                             q
                                                             q        q
                                      106              q
                                                       q                                       q
                                                                                                     q
                                                                                                     q

                                                 q
                                                 q
                  Edges per second




                                     105.5   q
                                             q


                                         5
                                      10



                                     107.5

                                      107




                                                                                                                                                Cray XMT
                                     106.5

                                      106

                                     105.5

                                      105

                                             1   2     4     8        16    32    64    128    1     2     4      8     16    32     64   128


                                                                      Threads (OpenMP) / Processors (XMT)
                                                                                       Platform
                                                 q   X5570 (4−core)        X5650 (6−core)     E7−8870 (10−core)       XMT     XMT2




MTAAP 2012—Scalable Community Detection—Jason Riedy                                                                                                        27/35
Performance: Modularity
                                    0.8




                                    0.6
                                                      q
                                                                                 q
                       Modularity                                                q

                                    0.4




                                    0.2




                                    0.0

                                          0                            10                           20                   30
                                                                                            step


                                                                      Graph                                      Termination metric
                                              q   coAuthorsCiteseer    q    eu−2005   q   uk−2002        q   Coverage     Max         Average




          • Timing results: Stop when coverage ≥ 0.5 (communities cover 1/2 edges).
          • More work ⇒ higher modularity. Choice up to application.


MTAAP 2012—Scalable Community Detection—Jason Riedy                                                                                             28/35
Performance: Small-scale speedup

                                                                                                rmat−24−16                                      soc−LiveJournal1
                  Speed−up over one thread (OpenMP) or processor (XMT)
                                                                         32

                                                                         16

                                                                          8




                                                                                                                                                                                 Intel
                                                                                                      q
                                                                                                      q                                                  q
                                                                                                                                                         q
                                                                          4                     q     q
                                                                                                q
                                                                                                                                                    q
                                                                                                                                                    q
                                                                                                                                                    q
                                                                                          q
                                                                                          q                                                 q
                                                                                          q
                                                                          2                                                                 q
                                                                                                                                            q
                                                                                  q                                                   q
                                                                                  q
                                                                                                                                      q
                                                                          1


                                                                         32

                                                                         16




                                                                                                                                                                                 Cray XMT
                                                                          8

                                                                          4

                                                                          2

                                                                          1


                                                                              1   2       4     8     16     32    64   128     1     2     4       8    16   32      64   128


                                                                                                           Threads (OpenMP) / Processors (XMT)
                                                                                                                          Platform
                                                                                      q   X5570 (4−core)      X5650 (6−core)    E7−8870 (10−core)       XMT        XMT2




MTAAP 2012—Scalable Community Detection—Jason Riedy                                                                                                                                         29/35
Performance: Large-scale time


                                       31568s


                             214




                                   q
                                       6917s
                  Time (s)




                              12                q
                             2

                                                       q
                                                       q



                                                                       q
                                                                       q



                                                                                   q
                                                                                                           1063s
                             210                                                   q
                                                                                          q   q
                                                                                              q
                                                                                                   q
                                                                                                   q   q
                                                                                                               q
                                                                                                       q   q
                                                                                                           q
                                                                                                               q
                                                                                                                504.9s
                                                                                                                 q
                                                                                                                 q q
                                                                                                                    q




                                   1            2      4               8           16         32               64

                                                    Threads (OpenMP) / Processors (XMT)
                                                                       Platform
                                                           q   E7−8870 (10−core)   XMT2




MTAAP 2012—Scalable Community Detection—Jason Riedy                                                                      30/35
Performance: Large-scale speedup


                  Speed up over single processor/thread                                                                   29.6x



                                                          24                                                                      13.7x
                                                                                                                                           q
                                                                                                                                  q    q   q
                                                                                                                              q
                                                                                                                                       q
                                                                                                                          q   q
                                                                                                                      q           q
                                                                                                                          q
                                                                                                                      q
                                                          23                                                     q
                                                                                                                 q
                                                                                                             q
                                                                                                       q
                                                                                                       q




                                                           2                              q
                                                          2                               q



                                                                         q
                                                                         q


                                                          21
                                                                   q




                                                               1   2     4                8            16        32               64

                                                                       Threads (OpenMP) / Processors (XMT)
                                                                                         Platform
                                                                             q   E7−8870 (10−core)   XMT2




MTAAP 2012—Scalable Community Detection—Jason Riedy                                                                                            31/35
Conclusions and plans

          • Code:
            http://www.cc.gatech.edu/~jriedy/community-detection/
          • Some low-hanging fruit remains:
              • Eliminate one unnecessary copy during contraction.
              • Deal with stars.
          • Then... Practical experiments.
              • How volatile are modularity and conductance to perturbations?
              • What matching schemes work well?
              • How do different metrics compare in applications?
          • Extending to streaming graph data!
              • Includes developing parallel refinement...
              • And possibly de-clustering or manipulating the dendogram....
              • Very much WIP, more tricky than anticipated.




MTAAP 2012—Scalable Community Detection—Jason Riedy                             32/35
Acknowledgment of support




MTAAP 2012—Scalable Community Detection—Jason Riedy   33/35
Bibliography I

            U. Brandes, D. Delling, M. Gaertler, R. G¨rke, M. Hoefer,
                                                     o
            Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE
            Trans. Knowledge and Data Engineering, vol. 20, no. 2, pp.
            172–188, 2008.
            M. Newman, “Modularity and community structure in
            networks,” Proc. of the National Academy of Sciences, vol. 103,
            no. 23, pp. 8577–8582, 2006.
            A. Clauset, M. Newman, and C. Moore, “Finding community
            structure in very large networks,” Physical Review E, vol. 70,
            no. 6, p. 66111, 2004.
            K. Wakita and T. Tsurumi, “Finding community structure in
            mega-scale social networks,” CoRR, vol. abs/cs/0702048, 2007.


MTAAP 2012—Scalable Community Detection—Jason Riedy                           34/35
Bibliography II


            D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A
            recursive model for graph mining,” in Proc. 4th SIAM Intl.
            Conf. on Data Mining (SDM). Orlando, FL: SIAM, Apr. 2004.
            D. Bader, J. Gilbert, J. Kepner, D. Koester, E. Loh,
            K. Madduri, W. Mann, and T. Meuse, HPCS SSCA#2 Graph
            Analysis Benchmark Specifications v1.1, Jul. 2005.
            J. Leskovec, “Stanford large network dataset collection,” At
            http://snap.stanford.edu/data/, Oct. 2011.
            P. Boldi, B. Codenotti, M. Santini, and S. Vigna, “Ubicrawler:
            A scalable fully distributed web crawler,” Software: Practice &
            Experience, vol. 34, no. 8, pp. 711–726, 2004.



MTAAP 2012—Scalable Community Detection—Jason Riedy                           35/35

Weitere ähnliche Inhalte

Andere mochten auch

Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detectionDavid Gleich
 
Community detection
Community detectionCommunity detection
Community detectionScott Pauls
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithmsAlireza Andalib
 
Human mobility,urban structure analysis,and spatial community detection from ...
Human mobility,urban structure analysis,and spatial community detection from ...Human mobility,urban structure analysis,and spatial community detection from ...
Human mobility,urban structure analysis,and spatial community detection from ...Song Gao
 
Μελέτη της οντολογίας FOAF (Friend of a Friend)
Μελέτη της οντολογίας FOAF (Friend of a Friend)Μελέτη της οντολογίας FOAF (Friend of a Friend)
Μελέτη της οντολογίας FOAF (Friend of a Friend)Giorgos Bamparopoulos
 
Community Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewCommunity Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewSatyaki Sikdar
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Paragon_Science_Inc
 
Community detection in graphs with NetworKit
Community detection in graphs with NetworKitCommunity detection in graphs with NetworKit
Community detection in graphs with NetworKitBenj Pettit
 
Community Detection with Networkx
Community Detection with NetworkxCommunity Detection with Networkx
Community Detection with NetworkxErika Fille Legara
 
Community detection from a computational social science perspective
Community detection from a computational social science perspectiveCommunity detection from a computational social science perspective
Community detection from a computational social science perspectiveDavide Bennato
 
Entropy based algorithm for community detection in augmented networks
Entropy based algorithm for community detection in augmented networksEntropy based algorithm for community detection in augmented networks
Entropy based algorithm for community detection in augmented networksJuan David Cruz-Gómez
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionSymeon Papadopoulos
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphsNicola Barbieri
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Spark Summit
 
Advanced Methods in Network Science: Community Detection Algorithms
Advanced Methods in Network Science: Community Detection Algorithms Advanced Methods in Network Science: Community Detection Algorithms
Advanced Methods in Network Science: Community Detection Algorithms Daniel Katz
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social MediaSymeon Papadopoulos
 

Andere mochten auch (18)

Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 
Community detection
Community detectionCommunity detection
Community detection
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithms
 
Human mobility,urban structure analysis,and spatial community detection from ...
Human mobility,urban structure analysis,and spatial community detection from ...Human mobility,urban structure analysis,and spatial community detection from ...
Human mobility,urban structure analysis,and spatial community detection from ...
 
Μελέτη της οντολογίας FOAF (Friend of a Friend)
Μελέτη της οντολογίας FOAF (Friend of a Friend)Μελέτη της οντολογίας FOAF (Friend of a Friend)
Μελέτη της οντολογίας FOAF (Friend of a Friend)
 
Community Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewCommunity Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief Overview
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
 
Community detection in graphs with NetworKit
Community detection in graphs with NetworKitCommunity detection in graphs with NetworKit
Community detection in graphs with NetworKit
 
Community Detection with Networkx
Community Detection with NetworkxCommunity Detection with Networkx
Community Detection with Networkx
 
Community detection from a computational social science perspective
Community detection from a computational social science perspectiveCommunity detection from a computational social science perspective
Community detection from a computational social science perspective
 
Entropy based algorithm for community detection in augmented networks
Entropy based algorithm for community detection in augmented networksEntropy based algorithm for community detection in augmented networks
Entropy based algorithm for community detection in augmented networks
 
Entropy based measures for graphs
Entropy based measures for graphsEntropy based measures for graphs
Entropy based measures for graphs
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Community detection in graphs
Community detection in graphsCommunity detection in graphs
Community detection in graphs
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
 
Advanced Methods in Network Science: Community Detection Algorithms
Advanced Methods in Network Science: Community Detection Algorithms Advanced Methods in Network Science: Community Detection Algorithms
Advanced Methods in Network Science: Community Detection Algorithms
 
Community Detection in Social Media
Community Detection in Social MediaCommunity Detection in Social Media
Community Detection in Social Media
 

Ähnlich wie Scalable Multi-threaded Community Detection in Massive Social Networks

Parallel Community Detection for Massive Graphs
Parallel Community Detection for Massive GraphsParallel Community Detection for Massive Graphs
Parallel Community Detection for Massive GraphsJason Riedy
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network AnalysisMarc Smith
 
20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...Marc Smith
 
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...Marc Smith
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction CS, NcState
 
20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...Marc Smith
 
Jürgens diata12-communities
Jürgens diata12-communitiesJürgens diata12-communities
Jürgens diata12-communitiesPascal Juergens
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 
LSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLocal Social Summit
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smithMarc Smith
 
2010 june - personal democracy forum - marc smith - mapping political socia...
2010   june - personal democracy forum - marc smith - mapping political socia...2010   june - personal democracy forum - marc smith - mapping political socia...
2010 june - personal democracy forum - marc smith - mapping political socia...Marc Smith
 
Network sampling, community detection
Network sampling, community detectionNetwork sampling, community detection
Network sampling, community detectionroberval mariano
 
20111123 mwa2011-marc smith
20111123 mwa2011-marc smith20111123 mwa2011-marc smith
20111123 mwa2011-marc smithMarc Smith
 
Community detection in social networks[1]
Community detection in social networks[1]Community detection in social networks[1]
Community detection in social networks[1]sdnumaygmailcom
 
Presentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands GraphesPresentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands GraphesJuan David Cruz-Gómez
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsJason Riedy
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social NetworksKent State University
 
MobiCom CHANTS
MobiCom CHANTSMobiCom CHANTS
MobiCom CHANTSgsthakur
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors ijbbjournal
 

Ähnlich wie Scalable Multi-threaded Community Detection in Massive Social Networks (20)

Parallel Community Detection for Massive Graphs
Parallel Community Detection for Massive GraphsParallel Community Detection for Massive Graphs
Parallel Community Detection for Massive Graphs
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis
 
20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...20121010 marc smith - mapping collections of connections in social media with...
20121010 marc smith - mapping collections of connections in social media with...
 
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
20121001 pawcon 2012-marc smith - mapping collections of connections in socia...
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction
 
20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...
 
Jürgens diata12-communities
Jürgens diata12-communitiesJürgens diata12-communities
Jürgens diata12-communities
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
LSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social Media
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smith
 
2010 june - personal democracy forum - marc smith - mapping political socia...
2010   june - personal democracy forum - marc smith - mapping political socia...2010   june - personal democracy forum - marc smith - mapping political socia...
2010 june - personal democracy forum - marc smith - mapping political socia...
 
Presentation OOC2013
Presentation OOC2013Presentation OOC2013
Presentation OOC2013
 
Network sampling, community detection
Network sampling, community detectionNetwork sampling, community detection
Network sampling, community detection
 
20111123 mwa2011-marc smith
20111123 mwa2011-marc smith20111123 mwa2011-marc smith
20111123 mwa2011-marc smith
 
Community detection in social networks[1]
Community detection in social networks[1]Community detection in social networks[1]
Community detection in social networks[1]
 
Presentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands GraphesPresentation Journée Thématique - Fouille de Grands Graphes
Presentation Journée Thématique - Fouille de Grands Graphes
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social Networks
 
MobiCom CHANTS
MobiCom CHANTSMobiCom CHANTS
MobiCom CHANTS
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors
 

Mehr von Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and EmusJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsJason Riedy
 

Mehr von Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
 

Kürzlich hochgeladen

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Kürzlich hochgeladen (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Scalable Multi-threaded Community Detection in Massive Social Networks

  • 1. Scalable Multi-threaded Community Detection in Social Networks E. Jason Riedy1 , David A. Bader1 , and Henning Meyerhenke2 1 School of Comp. Science and Engineering, Georgia Inst. of Technology 2 Inst. of Theoretical Informatics, Karlsruhe Inst. of Technology (KIT) 25 May 2012
  • 2. Exascale data analysis Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating algorithms Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes • Graph clustering is common in all application areas. MTAAP 2012—Scalable Community Detection—Jason Riedy 2/35
  • 3. These are not easy graphs. Yifan Hu’s (AT&T) visualization of the in-2004 data set http://www2.research.att.com/~yifanhu/gallery.html MTAAP 2012—Scalable Community Detection—Jason Riedy 3/35
  • 4. But no shortage of structure... Protein interactions, Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Jason’s network via LinkedIn Labs Science 302, 1722-1736, 2003. • Locally, there are clusters or communities. • First pass over a massive social graph: • Find smaller communities of interest. • Analyze / visualize top-ranked communities. • Our part: Community detection at massive scale. (Or kinda large, given available data.) MTAAP 2012—Scalable Community Detection—Jason Riedy 4/35
  • 5. Outline Motivation Defining community detection and metrics Shooting for massive graphs Our parallel method Implementation and platform details Performance Conclusions and plans MTAAP 2012—Scalable Community Detection—Jason Riedy 5/35
  • 6. Community detection What do we mean? • Partition a graph’s vertices into disjoint communities. • A community locally maximizes some metric. • Modularity, conductance, ... • Trying to capture that vertices are more similar within one community than between communities. Jason’s network via LinkedIn Labs MTAAP 2012—Scalable Community Detection—Jason Riedy 6/35
  • 7. Community detection Assumptions • Disjoint partitioning of vertices. • There is no one unique answer. • Many metrics are NP-complete to optimize (Brandes, et al.[1]). • Graph is lossy representation. • Want an adaptable detection method. Jason’s network via LinkedIn Labs MTAAP 2012—Scalable Community Detection—Jason Riedy 7/35
  • 8. Common community metric: Modularity • Modularity: Deviation of connectivity in the community induced by a vertex set S from some expected background model of connectivity. • We take Newman [2]’s basic uniform model. • Let m count all edges in graph G, mS count of edges with both endpoints in S, and xS count the edges with any endpoint in S. Modularity QS : QS = (mS − x2 /4m)/m S • Total modularity: sum of modularities of disjoint subsets. • A sufficiently positive modularity implies some structure. • Known issues: Resolution limit, NP-complete opt. prob. MTAAP 2012—Scalable Community Detection—Jason Riedy 8/35
  • 9. Can we tackle massive graphs now? Parallel, of course... • Massive needs distributed memory, right? • Well... Not really. Can buy a 2 TiB Intel-based Dell server on-line for around $200k USD, a 1.5 TiB from IBM, etc. Image: dell.com. Not an endorsement, just evidence! • Publicly available “real-world” data fits... • Start with shared memory to see what needs done. • Specialized architectures provide larger shared-memory views over distributed implementations (e.g. Cray XMT). MTAAP 2012—Scalable Community Detection—Jason Riedy 9/35
  • 10. Multi-threaded algorithm design points A scalable multi-threaded graph analysis algorithm • ... avoids global locks and frequent global synchronization. • ... distributes computation over edges rather than only vertices. • ... works with data as local to an edge as possible. • ... uses compact data structures that agglomerate memory references. MTAAP 2012—Scalable Community Detection—Jason Riedy 10/35
  • 11. Sequential agglomerative method • A common method (e.g. Clauset, et al. [3]) agglomerates vertices into A C communities. • Each vertex begins in its own B community. D • An edge is chosen to contract. E • Merging maximally increases modularity. G • Priority queue. F • Known often to fall into an O(n2 ) performance trap with modularity (Wakita & Tsurumi [4]). MTAAP 2012—Scalable Community Detection—Jason Riedy 11/35
  • 12. Sequential agglomerative method • A common method (e.g. Clauset, et al. [3]) agglomerates vertices into A C communities. • Each vertex begins in its own B community. D • An edge is chosen to contract. E • Merging maximally increases modularity. G • Priority queue. F • Known often to fall into an O(n2 ) performance trap with modularity (Wakita & Tsurumi [4]). MTAAP 2012—Scalable Community Detection—Jason Riedy 12/35
  • 13. Sequential agglomerative method • A common method (e.g. Clauset, et al. [3]) agglomerates vertices into A C communities. • Each vertex begins in its own B community. D • An edge is chosen to contract. E • Merging maximally increases modularity. G • Priority queue. F • Known often to fall into an O(n2 ) performance trap with modularity (Wakita & Tsurumi [4]). MTAAP 2012—Scalable Community Detection—Jason Riedy 13/35
  • 14. Sequential agglomerative method • A common method (e.g. Clauset, et al. [3]) agglomerates vertices into A C communities. • Each vertex begins in its own B community. D • An edge is chosen to contract. E • Merging maximally increases modularity. G • Priority queue. F • Known often to fall into an O(n2 ) performance trap with modularity (Wakita & Tsurumi [4]). MTAAP 2012—Scalable Community Detection—Jason Riedy 14/35
  • 15. Parallel agglomerative method • We use a matching to avoid the queue. • Compute a heavy weight, large matching. • Simple greedy algorithm. A C • Maximal matching. • Within factor of 2 in weight. B • Merge all matched communities at D once. E • Maintains some balance. G • Produces different results. F • Agnostic to weighting, matching... • Can maximize modularity, minimize conductance. • Modifying matching permits easy exploration. MTAAP 2012—Scalable Community Detection—Jason Riedy 15/35
  • 16. Parallel agglomerative method • We use a matching to avoid the queue. • Compute a heavy weight, large matching. • Simple greedy algorithm. A C • Maximal matching. • Within factor of 2 in weight. B • Merge all matched communities at D once. E • Maintains some balance. G • Produces different results. F • Agnostic to weighting, matching... • Can maximize modularity, minimize conductance. • Modifying matching permits easy exploration. MTAAP 2012—Scalable Community Detection—Jason Riedy 16/35
  • 17. Parallel agglomerative method • We use a matching to avoid the queue. • Compute a heavy weight, large matching. • Simple greedy algorithm. A C • Maximal matching. • Within factor of 2 in weight. B • Merge all matched communities at D once. E • Maintains some balance. G • Produces different results. F • Agnostic to weighting, matching... • Can maximize modularity, minimize conductance. • Modifying matching permits easy exploration. MTAAP 2012—Scalable Community Detection—Jason Riedy 17/35
  • 18. Platform: Cray XMT2 Tolerates latency by massive multithreading. • Hardware: 128 threads per processor • Context switch on every cycle (500 MHz) • Many outstanding memory requests (180/proc) • “No” caches... • Flexibly supports dynamic load balancing • Globally hashed address space, no data cache • Support for fine-grained, word-level synchronization • Full/empty bit on with every memory word • 64 processor XMT2 at CSCS, the Swiss National Supercomputer Centre. • 500 MHz processors, 8192 threads, 2 TiB of shared memory Image: cray.com MTAAP 2012—Scalable Community Detection—Jason Riedy 18/35
  • 19. Platform: Intel R E7-8870-based server Tolerates some latency by hyperthreading. • “Westmere:” 2 threads / core, 10 cores / socket, four sockets. • Fast cores (2.4 GHz), fast memory (1 066 MHz). • Not so many outstanding memory requests (60/socket), but large caches (30 MiB L3 per socket). • Good system support • Transparent hugepages reduces TLB costs. • Fast, user-level locking. (HLE would be better...) • OpenMP, although I didn’t tune it... • mirasol, #17 on Graph500 (thanks to UCB) • Four processors (80 threads), 256 GiB memory • gcc 4.6.1, Linux kernel Image: Intel R press kit 3.2.0-rc5 MTAAP 2012—Scalable Community Detection—Jason Riedy 19/35
  • 20. Platform: Other Intel R -based servers Different design points • “Nehalem” X5570: 2.93 GHz, 2 threads/core, 4 cores/socket, 2 sockets, 8 MiB cache/socket • “Westmere” X5650: 2.66 GHz, 2 threads/core, 6 cores/socket, 2 sockets, 12 MiB cache/socket • All with 1 066 MHz memory. • Does the Westmere E7-8870’s scale affect performance? • Nodes in Georgia Tech CSE cluster jinx • 24-48 GiB memory, small tests Image: Intel R press kit MTAAP 2012—Scalable Community Detection—Jason Riedy 20/35
  • 21. Implementation: Data structures Extremely basic for graph G = (V, E) • An array of (i, j; w) weighted edge pairs, each i, j stored only once and packed, uses 3|E| space • An array to store self-edges, d(i) = w, |V | • A temporary floating-point array for scores, |E| • A additional temporary arrays using 4|V | + 2|E| to store degrees, matching choices, offsets... • Weights count number of agglomerated vertices or edges. • Scoring methods (modularity, conductance) need only vertex-local counts. • Storing an undirected graph in a symmetric manner reduces memory usage drastically and works with our simple matcher. MTAAP 2012—Scalable Community Detection—Jason Riedy 21/35
  • 22. Implementation: Data structures Extremely basic for graph G = (V, E) • An array of (i, j; w) weighted edge pairs, each i, j stored only once and packed, uses 3|E| space • An array to store self-edges, d(i) = w, |V | • A temporary floating-point array for scores, |E| • A additional temporary arrays using 4|V | + 2|E| to store degrees, matching choices, offsets... • Original ignored order in edge array, killed OpenMP. • New: Roughly bucket edge array by first stored index. Non-adjacent CSR-like structure. • New: Hash i, j to determine order. Scatter among buckets. MTAAP 2012—Scalable Community Detection—Jason Riedy 22/35
  • 23. Implementation: Routines Three primitives: Scoring, matching, contracting Scoring Trivial. Matching Repeat until no ready, unmatched vertex: 1 For each unmatched vertex in parallel, find the best unmatched neighbor in its bucket. 2 Try to point remote match at that edge (lock, check if best, unlock). 3 If pointing succeeded, try to point self-match at that edge. 4 If both succeeded, yeah! If not and there was some eligible neighbor, re-add self to ready, unmatched list. (Possibly too simple, but...) MTAAP 2012—Scalable Community Detection—Jason Riedy 23/35
  • 24. Implementation: Routines Contracting 1 Map each i, j to new vertices, re-order by hashing. 2 Accumulate counts for new i bins, prefix-sum for offset. 3 Copy into new bins. • Only synchronizing in the prefix-sum. That could be removed if I don’t re-order the i , j pair; haven’t timed the difference. • Actually, the current code copies twice... On short list for fixing. • Binning as opposed to original list-chasing enabled Intel/OpenMP support with reasonable performance. MTAAP 2012—Scalable Community Detection—Jason Riedy 24/35
  • 25. Performance summary Two moderate-sized graphs, one large Graph |V | |E| Reference rmat-24-16 15 580 378 262 482 711 [5, 6] soc-LiveJournal1 4 847 571 68 993 773 [7] uk-2007-05 105 896 555 3 301 876 564 [8] Peak processing rates in edges/second Platform rmat-24-16 soc-LiveJournal1 uk-2007-05 X5570 1.83 × 106 3.89 × 106 X5650 2.54 × 106 4.98 × 106 E7-8870 5.86 × 106 6.90 × 106 6.54 × 106 XMT 1.20 × 106 0.41 × 106 XMT2 2.11 × 106 1.73 × 106 3.11 × 106 MTAAP 2012—Scalable Community Detection—Jason Riedy 25/35
  • 26. Performance: Time to solution rmat−24−16 soc−LiveJournal1 3162 1000 q q q q q 316 q q q Intel q 100 q q q q q 32 q q q q 10 3 Time (s) 3162 1000 316 Cray XMT 100 32 10 3 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 Threads (OpenMP) / Processors (XMT) Platform q X5570 (4−core) X5650 (6−core) E7−8870 (10−core) XMT XMT2 MTAAP 2012—Scalable Community Detection—Jason Riedy 26/35
  • 27. Performance: Rate (edges/second) rmat−24−16 soc−LiveJournal1 107.5 107 q 106.5 Intel q q q q q q q q q 106 q q q q q q q Edges per second 105.5 q q 5 10 107.5 107 Cray XMT 106.5 106 105.5 105 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 Threads (OpenMP) / Processors (XMT) Platform q X5570 (4−core) X5650 (6−core) E7−8870 (10−core) XMT XMT2 MTAAP 2012—Scalable Community Detection—Jason Riedy 27/35
  • 28. Performance: Modularity 0.8 0.6 q q Modularity q 0.4 0.2 0.0 0 10 20 30 step Graph Termination metric q coAuthorsCiteseer q eu−2005 q uk−2002 q Coverage Max Average • Timing results: Stop when coverage ≥ 0.5 (communities cover 1/2 edges). • More work ⇒ higher modularity. Choice up to application. MTAAP 2012—Scalable Community Detection—Jason Riedy 28/35
  • 29. Performance: Small-scale speedup rmat−24−16 soc−LiveJournal1 Speed−up over one thread (OpenMP) or processor (XMT) 32 16 8 Intel q q q q 4 q q q q q q q q q q 2 q q q q q q 1 32 16 Cray XMT 8 4 2 1 1 2 4 8 16 32 64 128 1 2 4 8 16 32 64 128 Threads (OpenMP) / Processors (XMT) Platform q X5570 (4−core) X5650 (6−core) E7−8870 (10−core) XMT XMT2 MTAAP 2012—Scalable Community Detection—Jason Riedy 29/35
  • 30. Performance: Large-scale time 31568s 214 q 6917s Time (s) 12 q 2 q q q q q 1063s 210 q q q q q q q q q q q q 504.9s q q q q 1 2 4 8 16 32 64 Threads (OpenMP) / Processors (XMT) Platform q E7−8870 (10−core) XMT2 MTAAP 2012—Scalable Community Detection—Jason Riedy 30/35
  • 31. Performance: Large-scale speedup Speed up over single processor/thread 29.6x 24 13.7x q q q q q q q q q q q q 23 q q q q q 2 q 2 q q q 21 q 1 2 4 8 16 32 64 Threads (OpenMP) / Processors (XMT) Platform q E7−8870 (10−core) XMT2 MTAAP 2012—Scalable Community Detection—Jason Riedy 31/35
  • 32. Conclusions and plans • Code: http://www.cc.gatech.edu/~jriedy/community-detection/ • Some low-hanging fruit remains: • Eliminate one unnecessary copy during contraction. • Deal with stars. • Then... Practical experiments. • How volatile are modularity and conductance to perturbations? • What matching schemes work well? • How do different metrics compare in applications? • Extending to streaming graph data! • Includes developing parallel refinement... • And possibly de-clustering or manipulating the dendogram.... • Very much WIP, more tricky than anticipated. MTAAP 2012—Scalable Community Detection—Jason Riedy 32/35
  • 33. Acknowledgment of support MTAAP 2012—Scalable Community Detection—Jason Riedy 33/35
  • 34. Bibliography I U. Brandes, D. Delling, M. Gaertler, R. G¨rke, M. Hoefer, o Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Trans. Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188, 2008. M. Newman, “Modularity and community structure in networks,” Proc. of the National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006. A. Clauset, M. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 66111, 2004. K. Wakita and T. Tsurumi, “Finding community structure in mega-scale social networks,” CoRR, vol. abs/cs/0702048, 2007. MTAAP 2012—Scalable Community Detection—Jason Riedy 34/35
  • 35. Bibliography II D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A recursive model for graph mining,” in Proc. 4th SIAM Intl. Conf. on Data Mining (SDM). Orlando, FL: SIAM, Apr. 2004. D. Bader, J. Gilbert, J. Kepner, D. Koester, E. Loh, K. Madduri, W. Mann, and T. Meuse, HPCS SSCA#2 Graph Analysis Benchmark Specifications v1.1, Jul. 2005. J. Leskovec, “Stanford large network dataset collection,” At http://snap.stanford.edu/data/, Oct. 2011. P. Boldi, B. Codenotti, M. Santini, and S. Vigna, “Ubicrawler: A scalable fully distributed web crawler,” Software: Practice & Experience, vol. 34, no. 8, pp. 711–726, 2004. MTAAP 2012—Scalable Community Detection—Jason Riedy 35/35