Community Detection
Ilio Catallo, catallo@elet.polimi.it
Politecnico di Milano
Outline
¡  Communities and Partitions
  ¡  What is a community?
  ¡  What is a partition?

¡  Partitioning algorithms
  ¡  Kernighan and Lin, 1970
  ¡  Newman and Girvan, 2004
  ¡  Bagrow and Bollt, 2008

¡  Assess the quality of good partitions
  ¡  The impossibility theorem
  ¡  Quality functions
Communities and Partitions

What is a community?
Intuition
¡  Community: a set of tightly
    connected nodes

¡  Examples:
  ¡  People with common interests
  ¡  Papers on the same topic
  ¡  Scholars working in the same field

What is a community?
Local definitions (1/3)
¡  Clique: a complete subgraph (every pair of its vertices is adjacent)
  ¡  Too strict a definition (what to
      do if just one link is missing?)
  ¡  Cliques are hard to find
      (exponential complexity in
      the graph size)

   What is a community?
   Local definitions (2/3)
   Strong community: a subgraph V ⊆ G such that each vertex
   has more connections within the community than with the
   rest of the graph

   $$ k_i^{\mathrm{in}}(V) > k_i^{\mathrm{out}}(V) \quad \forall i \in V $$

   ¡  k_i^in(V): the number of edges connecting node i to
       other nodes belonging to V
   ¡  k_i^out(V): the number of edges connecting node i to
       nodes in the rest of the graph

    What is a community?
    Local definitions (3/3)
    ¡  The strong community definition is too strict
      ¡  Unrealistic in many real cases

    ¡  Weak community: a subgraph V ⊆ G such that
        the sum of all degrees within V is greater than
        the sum of all degrees toward the rest of the
        network
      ¡  A strong community is also weak, while the converse
          is not generally true

    $$ \sum_{i \in V} k_i^{\mathrm{in}}(V) > \sum_{i \in V} k_i^{\mathrm{out}}(V) $$

    ¡  Left-hand side: total internal degree of V (each edge
        inside V is counted twice, once per endpoint)
    ¡  Right-hand side: number of edges connecting nodes in V
        to nodes in the rest of the graph
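
The strong and weak definitions can be checked mechanically. The following Python sketch assumes the graph is given as a dictionary `adj` mapping each vertex to the set of its neighbors (undirected, unweighted); the function names are illustrative and not part of the original slides.

```python
def in_out_degrees(adj, community):
    """Return {v: (k_in, k_out)} for every vertex v of the candidate community."""
    members = set(community)
    degrees = {}
    for v in members:
        k_in = sum(1 for u in adj[v] if u in members)  # links staying inside V
        k_out = len(adj[v]) - k_in                     # links leaving V
        degrees[v] = (k_in, k_out)
    return degrees


def is_strong_community(adj, community):
    """Every vertex must have strictly more internal than external links."""
    return all(k_in > k_out
               for k_in, k_out in in_out_degrees(adj, community).values())


def is_weak_community(adj, community):
    """Only the sums over the whole community are compared."""
    degrees = in_out_degrees(adj, community).values()
    return sum(k_in for k_in, _ in degrees) > sum(k_out for _, k_out in degrees)


# Hypothetical example: a triangle {0, 1, 2} whose vertex 2 also links to 3 and 4
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(is_strong_community(adj, {0, 1, 2}))  # False: vertex 2 has 2 links inside, 2 outside
print(is_weak_community(adj, {0, 1, 2}))    # True: internal degrees sum to 6, external to 2
```

The example set is weak but not strong, which illustrates why the converse of "strong implies weak" fails.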

What is a community?
Global definitions (1/2)
¡  Idea: the graph has a community structure if it is
    different from a random graph

¡  Random graph: a graph such that each pair of
    vertices is connected with equal probability p,
    independently of the other pairs
  ¡  Any two vertices have the same probability to be
      adjacent
  ¡  No preferential linking

What is a community?
Global definitions (2/2)
¡  The graph of interest is compared with the null
    model

¡  Null model: a graph which matches the original
    in some of its structural features, but which is
    otherwise a random graph
 ¡  Used as a term of comparison to verify whether the
     graph of interest shows community structure

What is a community?
Vertex-based definitions
¡  Idea: communities are subgraphs of vertices similar
    to each other
  ¡  A measure of similarity needs to be defined

¡  If it is possible to embed the vertices in an
    N-dimensional Euclidean space, possible (dis)similarity
    measures are:
  ¡  Euclidean distance: $$ d_{A,B} = \sqrt{\sum_{k=1}^{N} (a_k - b_k)^2} $$
  ¡  Manhattan distance: $$ d_{A,B} = \sum_{k=1}^{N} |a_k - b_k| $$
  ¡  Cosine similarity: $$ d_{A,B} = \frac{A \cdot B}{\|A\| \, \|B\|} $$

¡  With A = (a1, a2, …, aN) and B = (b1, b2, …, bN) vertex
    feature vectors
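
A minimal Python sketch of the three measures, assuming the feature vectors are plain sequences of numbers of equal length (the function names are illustrative):

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the norms (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


A = (1, 2, 3)
B = (4, 6, 3)
print(euclidean_distance(A, B))  # 5.0
print(manhattan_distance(A, B))  # 7
```

Note that the first two are dissimilarities (0 means identical), while the cosine is a similarity (larger means more alike).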

What is a community?
Vertex-based definitions
¡  If it is not possible to embed the vertices in a
    Euclidean space, the similarity must be inferred
    from the adjacency relationships
¡  Dissimilarity measure based on structural
    equivalence:

$$ d_{ij} = \sqrt{\sum_{k \neq i,j} (A_{ik} - A_{jk})^2} $$

¡  Structural equivalence: two vertices are structurally
    equivalent if they have the same neighbors,
    even if they are not adjacent themselves
  ¡  If i and j are structurally equivalent then dij = 0
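
As an illustration, the structural-equivalence distance can be computed directly from the adjacency matrix. This sketch assumes `A` is a square list of lists with 0/1 entries; the example graph is hypothetical.

```python
import math


def structural_distance(A, i, j):
    """d_ij: compare rows i and j of the adjacency matrix, skipping columns i and j."""
    return math.sqrt(sum((A[i][k] - A[j][k]) ** 2
                         for k in range(len(A)) if k not in (i, j)))


# Vertices 0 and 1 are not adjacent but share the same neighbors (2 and 3)
A = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
print(structural_distance(A, 0, 1))  # 0.0: structurally equivalent
print(structural_distance(A, 0, 2))  # sqrt(2): the two rows differ in columns 1 and 3
```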



What is a partition?
¡  Partition: a division of a
    graph into clusters, such that
    each vertex belongs to exactly
    one cluster

¡  If the vertices can be
    shared among different
    communities the division is
    called a cover

How many partitions can
a graph have?
¡  Stirling number of the second kind: the number of
    possible partitions into k clusters of a graph with n
    vertices

$$ S(n, k) = \begin{cases} 1 & k = n \text{ or } k = 1 \\ k\,S(n-1, k) + S(n-1, k-1) & \text{otherwise} \end{cases} $$

¡  nth Bell number: the total number of possible
    partitions

$$ B_n = \sum_{k=1}^{n} S(n, k) $$

¡  The nth Bell number is huge, even for relatively
    small graphs
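
To get a feel for the growth, the recurrence can be evaluated directly. This is a small Python sketch with memoization; returning 0 for k outside 1..n is a convention added here for convenience and is not on the slide.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of partitions of n vertices into exactly k non-empty clusters."""
    if k < 1 or k > n:
        return 0  # convention added for out-of-range arguments
    if k == 1 or k == n:
        return 1
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Total number of partitions of n vertices."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


for n in (5, 10, 20):
    print(n, bell(n))
```

Already for n = 10 there are 115975 partitions, and for n = 20 the count exceeds 10^13, so exhaustive enumeration is out of the question.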
Partitioning algorithms

Kernighan and Lin, 1970:
Basic concepts (1/2)
¡  Given:
  ¡  A graph G = (N, A) of n vertices with weights wi > 0
  ¡  p, a positive number such that wi ≤ p for every vertex i
  ¡  C = (cij), the weighted adjacency matrix (cost matrix)

¡  A k-way partition 𝚪 of G is a set of non-empty,
    pairwise disjoint sets 𝜐1, …, 𝜐k such that:

$$ \bigcup_{i=1}^{k} \upsilon_i = G $$

¡  A partition is admissible if the sum of the weights of the
    vertices in each 𝜐i is less than or equal to p:

$$ \sum_{j \in \upsilon_i} w_j \le p \quad \forall i = 1, \dots, k $$

Kernighan and Lin, 1970:
Basic concepts (2/2)
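¡  The cost T of a partition 𝚪 is the sum of cij over all pairs i, j
    such that i and j are in different clusters

[Figure: a graph split into two clusters, {a, b, c, e, f} and {1, 2, 3, 4, 5}; the only edges crossing between them are (b, 2) with cost c_b2 and (f, 4) with cost c_f4]

$$ T(\Gamma) = c_{b2} + c_{f4} $$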
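
The cost of a partition is straightforward to compute. In this sketch the edge costs are stored in a dictionary keyed by vertex pairs (each undirected edge listed once) and `cluster_of` maps each vertex to its cluster label; both representations, and the numeric costs, are assumptions made for the example.

```python
def partition_cost(cost, cluster_of):
    """Sum the costs of all edges whose endpoints lie in different clusters."""
    return sum(c for (i, j), c in cost.items()
               if cluster_of[i] != cluster_of[j])


# Hypothetical costs for a few edges of a two-cluster graph
cost = {("a", "b"): 1, ("e", "f"): 1, ("1", "2"): 1,
        ("b", "2"): 3, ("f", "4"): 2}
cluster_of = {v: 0 for v in "abcef"}
cluster_of.update({v: 1 for v in "12345"})

print(partition_cost(cost, cluster_of))  # 5 = c_b2 + c_f4
```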

Kernighan and Lin, 1970:
2-way uniform partitioning prob.
¡  2-way uniform partitioning problem: finding a minimal cost
    partition of a given graph of 2n vertices (of equal weights) into
    two subsets of n vertices

[Figure: example graph split into two clusters of five vertices each, with crossing edges of cost c_b2 and c_f4]

¡  The Kernighan and Lin algorithm is a heuristic for solving the
    2-way uniform partitioning problem

Kernighan and Lin, 1970:
Basic principle (1/2)
¡  Basic principle: starting with an arbitrary
    partition 𝛤 = {A, B} of N, try to decrease the initial
    cost T by a series of interchanges of elements of
    A and B

¡  When no further improvement is possible, the
    resulting partition 𝛤’ is a local minimum with
    respect to the algorithm

Kernighan and Lin, 1970:
Basic principle (2/2)
¡  Given:
  ¡  𝛤* = {A*, B*}, a minimum-cost 2-way uniform
      partition
  ¡  𝛤 = {A, B}, an arbitrary 2-way uniform partition

¡  There are subsets X⊂A, Y⊂B with |X| = |Y| such
    that interchanging X and Y produces A* and B*

$$ A^* = A - X + Y, \qquad B^* = B - Y + X $$

[Figure: the subsets X ⊂ A and Y ⊂ B are exchanged, turning {A, B} into {A*, B*}]

Kernighan and Lin, 1970:
Internal and external cost
¡  Let’s define for each a∈A:
  ¡  External cost: $$ E_a = \sum_{y \in B} c_{ay} $$
  ¡  Internal cost: $$ I_a = \sum_{x \in A} c_{ax} $$
  ¡  Cost difference: $$ D_a = E_a - I_a $$

¡  Similarly, define Eb, Ib, Db for each b∈B

Kernighan and Lin, 1970:
Cost reduction
¡  Lemma 1: Consider any a∈A, b∈B. If a and b
    are interchanged, the reduction in cost (i.e., the
    gain) is

$$ g = T - T' = D_a + D_b - 2c_{ab} $$

¡  Lemma 2: Consider any a∈A, b∈B. If a and b
    are interchanged, the cost differences of
    all the other nodes become

$$ D'_x = D_x + 2c_{xa} - 2c_{xb}, \quad x \in A \setminus \{a\} $$
$$ D'_y = D_y + 2c_{yb} - 2c_{ya}, \quad y \in B \setminus \{b\} $$
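
Lemma 1 can be checked numerically on a small instance. The sketch below assumes a symmetric cost matrix `c` with zero diagonal, given as a list of lists, and vertex sets represented as Python sets; the matrix values are made up for illustration.

```python
def cut_cost(c, A, B):
    """Cost T of the 2-way partition {A, B}: total cost of the edges between A and B."""
    return sum(c[a][b] for a in A for b in B)


def d_value(c, v, own, other):
    """Cost difference D_v = E_v - I_v for a vertex v that belongs to the set `own`."""
    external = sum(c[v][u] for u in other)
    internal = sum(c[v][u] for u in own if u != v)
    return external - internal


def swap_gain(c, a, b, A, B):
    """Gain predicted by Lemma 1 for interchanging a (in A) and b (in B)."""
    return d_value(c, a, A, B) + d_value(c, b, B, A) - 2 * c[a][b]


# Hypothetical symmetric cost matrix on four vertices
c = [[0, 1, 3, 0],
     [1, 0, 0, 2],
     [3, 0, 0, 1],
     [0, 2, 1, 0]]
A, B = {0, 1}, {2, 3}
a, b = 0, 3
A_new, B_new = (A - {a}) | {b}, (B - {b}) | {a}

print(cut_cost(c, A, B) - cut_cost(c, A_new, B_new))  # actual reduction T - T': 3
print(swap_gain(c, a, b, A, B))                        # gain predicted by Lemma 1: 3
```

The term 2c_ab corrects for the edge between a and b itself, which is counted as external in both D_a and D_b but still crosses the cut after the swap.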

Kernighan and Lin, 1970:
The algorithm
1. Compute the D values for all elements of N
2. A_1 ← A, B_1 ← B; X_1 ← ∅, Y_1 ← ∅; i ← 1
3. While i ≤ n
   (a) Choose a_i ∈ A_i, b_i ∈ B_i that maximize g_i = D_{a_i} + D_{b_i} − 2c_{a_i b_i} (Lemma 1)
   (b) X_{i+1} ← X_i ∪ {a_i}, Y_{i+1} ← Y_i ∪ {b_i}
   (c) A_{i+1} ← A_i − {a_i}, B_{i+1} ← B_i − {b_i}
   (d) Recalculate the D values for the elements of A_{i+1}, B_{i+1} (Lemma 2)
   (e) i ← i + 1
4. Choose k to maximize G = Σ_{i=1}^{k} g_i, k = 1, …, n
5. If G > 0 then swap X_{k+1} = {a_1, …, a_k} and Y_{k+1} = {b_1, …, b_k} and go back to 1; if G = 0 exit
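
The steps above can be sketched in Python as follows. This is a compact illustration rather than an optimized implementation: it assumes a symmetric cost matrix `c` (list of lists, zero diagonal) and two equally sized vertex sets, and it searches all remaining pairs at every step. The function names are illustrative.

```python
def kl_pass(c, A, B):
    """Steps 1-4: return (G, X, Y), the best total gain and the subsets to swap."""
    A, B = set(A), set(B)
    # Step 1: D = external cost - internal cost for every vertex
    D = {v: sum(c[v][u] for u in B) - sum(c[v][u] for u in A if u != v) for v in A}
    D.update({v: sum(c[v][u] for u in A) - sum(c[v][u] for u in B if u != v)
              for v in B})
    gains, chosen = [], []
    while A and B:
        # Step 3(a): pick the pair with the largest gain (Lemma 1)
        g, a, b = max(((D[a] + D[b] - 2 * c[a][b], a, b) for a in A for b in B),
                      key=lambda t: t[0])
        gains.append(g)
        chosen.append((a, b))
        # Steps 3(b)-(c): the chosen pair leaves the candidate sets
        A.remove(a)
        B.remove(b)
        # Step 3(d): update the remaining D values (Lemma 2)
        for x in A:
            D[x] += 2 * c[x][a] - 2 * c[x][b]
        for y in B:
            D[y] += 2 * c[y][b] - 2 * c[y][a]
    # Step 4: choose the prefix length k with the largest cumulative gain
    best_k, best_G, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_G:
            best_k, best_G = k, running
    X = {a for a, _ in chosen[:best_k]}
    Y = {b for _, b in chosen[:best_k]}
    return best_G, X, Y


def kernighan_lin(c, A, B):
    """Step 5: repeat passes while a pass yields a positive total gain."""
    A, B = set(A), set(B)
    while True:
        G, X, Y = kl_pass(c, A, B)
        if G <= 0:
            return A, B
        A, B = (A - X) | Y, (B - Y) | X


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3), unit costs
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
c = [[0] * 6 for _ in range(6)]
for u, v in edges:
    c[u][v] = c[v][u] = 1

print(kernighan_lin(c, {0, 1, 3}, {2, 4, 5}))
```

Starting from a partition that cuts five edges, the first pass swaps vertices 3 and 2 (gain 4) and the second pass finds no positive gain, so the procedure stops with the two triangles separated.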

Newman and Girvan, 2004:
Betweenness (1/2)
¡  All paths between two
    vertices in different
    communities pass along the
    few inter-community edges

¡  Betweenness: a measure
    that favors edges that lie
    between communities and
    disfavors those that lie inside
    communities

[Figure: an edge (i, j) bridging two communities has high betweenness, Bij ≫ 0]

Newman and Girvan, 2004:
Betweenness (2/2)
¡  Different definitions of betweenness:
 ¡  Shortest-path betweenness: find the shortest paths
     between all pairs of vertices and count how many
     run along each edge
 ¡  Random-walk betweenness: the expected number of
     times that a random walk between a particular pair
     of vertices will pass down a particular edge, summed
     over all vertex pairs
 ¡  Current-flow betweenness: the absolute value of the current
     along the edge, summed over all source/sink pairs

Newman and Girvan, 2004:
Basic principle
¡  Algorithm based on a divisive approach

¡  Basic principle: repeatedly remove the links with
    the highest betweenness

Newman and Girvan, 2004:
Algorithm
1.  Calculate betweenness scores for all edges in
    the network

2.  Find the edge with the highest score and
    remove it from the network

3.  Recalculate betweenness for all remaining
    edges

4.  Repeat from step 2
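
A minimal Python sketch of the loop above, using shortest-path betweenness. It assumes an undirected, unweighted simple graph stored as a dictionary of neighbor sets, and it stops as soon as the graph splits into more components (one level of the hierarchy) instead of removing every edge. The betweenness routine follows the usual breadth-first accumulation scheme; all names are illustrative.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge of an unweighted graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest vertices back to s
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was counted once from each endpoint
    return {e: b / 2.0 for e, b in score.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, result = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        result.append(comp)
    return result


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the number of components grows."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        scores = edge_betweenness(adj)            # steps 1 and 3
        u, v = max(scores, key=scores.get)        # step 2
        adj[u].remove(v)
        adj[v].remove(u)
    return components(adj)


# Two triangles joined by the bridge (2, 3)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(adj))
```

In the example the bridge carries all 9 shortest paths between the two triangles, so it is removed first and the graph splits into the two triangles. Continuing the removals inside each component would yield the full hierarchy.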

Newman and Girvan, 2004:
Dendrogram
¡  The output of the algorithm
    is called a dendrogram

¡  Cutting the diagram
    horizontally at some height
    displays a possible partition
    of the graph

[Figure: FIG. 2: A hierarchical tree or dendrogram illustrating the type of output generated by the algorithms described here. The circles at the bottom of the figure represent the individual vertices of the network. As we move up the tree the vertices join together to form larger and larger communities, as indicated by the lines, until we reach the top, where all are joined together in a single community.]

Bagrow and Bollt, 2008:
L-shell
¡  L-shell: given a starting
    vertex i, the l-shell is the set
    of all vertices within
    a shortest-path distance
    d ≤ l of i

¡  Example: 1-shell from
    starting vertex i
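
An l-shell can be collected with a depth-limited breadth-first search. The sketch assumes a dictionary of neighbor sets and, as a convention chosen here, includes the starting vertex itself (distance 0) in the shell.

```python
from collections import deque


def shell_distances(adj, start, max_depth):
    """BFS distances from `start`, exploring no farther than `max_depth` steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_depth:
            continue  # do not expand beyond the requested depth
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def l_shell(adj, start, l):
    """All vertices within shortest-path distance l of `start`."""
    return set(shell_distances(adj, start, l))


# Path graph 0 - 1 - 2 - 3
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(l_shell(adj, 0, 1))  # {0, 1}
print(l_shell(adj, 0, 2))  # {0, 1, 2}
```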

Bagrow and Bollt, 2008:
Emerging degree (1/2)
¡  Emerging degree k_j(i) of a vertex j in the l-shell
    started at i: the number of edges that connect j to
    vertices external to the l-shell

¡  Total emerging degree K_i^l:
    the total number of
    edges emerging from the l-shell
    started at i

¡  Leading edge S_i^l: the set of
    all vertices exactly l steps
    away from vertex i

[Figure: the 1-shell around starting vertex 0, with leading-edge vertices 1, 2, 3, 4; k_1(0) = 1, k_2(0) = 2, k_3(0) = 1, k_4(0) = 2, so K_0^1 = 6]

Bagrow and Bollt, 2008:
Emerging degree (2/2)
¡  Change in the total
    emerging degree for a shell
    at depth l starting from
    vertex i:

$$ \Delta K_i^l = \frac{K_i^l}{K_i^{l-1}} $$

[Figure: the 1-shell example around starting vertex 0, with k_1(0) = 1, k_2(0) = 2, k_3(0) = 1, k_4(0) = 2 and K_0^1 = 6]

Bagrow and Bollt, 2008:
Basic principle
¡  Basic principle: expand an l-shell outward from
    some starting vertex i and compare the change in
    total emerging degree to some threshold α; the shell
    stops growing as soon as

$$ \Delta K_i^l < \alpha $$

¡  There are many interconnections within a
    community
 ¡  The total emerging degree tends to increase

¡  The edges connecting the community to the rest of
    the graph are fewer in number
 ¡  The total emerging degree tends to decrease sharply

Bagrow and Bollt, 2008:
Algorithm
1. Select a starting vertex i; l ← 0
2. CM ← ∅
3. Compute K_i^0
4. Repeat
    (a) l ← l + 1
    (b) Compute S_i^l; CM ← CM ∪ S_i^l
    (c) Compute K_i^l and ΔK_i^l
   until ΔK_i^l < α
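
The procedure can be sketched in Python as below. Several details are assumptions of this sketch rather than statements from the slides: the graph is a dictionary of neighbor sets, the community is initialized with the starting vertex itself, K_i^0 is taken to be the degree of the starting vertex, and growth stops once the ratio K_i^l / K_i^(l-1) drops below α.

```python
def local_community(adj, start, alpha):
    """Grow shells around `start` until the total emerging degree ratio drops below alpha."""
    members = {start}
    frontier = {start}
    k_prev = len(adj[start])  # K^0: edges emerging from the starting vertex
    while k_prev > 0:
        # Leading edge S^l: vertices first reached at depth l
        frontier = {w for v in frontier for w in adj[v]} - members
        if not frontier:
            break
        members |= frontier
        # K^l: edges from the leading edge to vertices outside the shell
        k_curr = sum(1 for v in frontier for w in adj[v] if w not in members)
        if k_curr / k_prev < alpha:
            break
        k_prev = k_curr
    return members


# Two 4-cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by the edge (3, 4)
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(local_community(adj, 0, 0.5))  # the first clique only
print(local_community(adj, 0, 0.1))  # the shell spreads over the whole graph
```

Starting from vertex 0, the first shell has one emerging edge against three before it, a ratio of 1/3: with α = 0.5 growth stops at the clique, while with α = 0.1 it continues across the bridge.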

Bagrow and Bollt, 2008:
α as “Social acceptance”
¡  The performance of the algorithm depends strongly
    on the value of α

¡  α can be thought of as a measure of social
    acceptance
  ¡  α ≪ 1 indicates people who are more welcoming of
      their neighbors (the l-shell will spread to much of the
      network)
  ¡  α ≫ 1 indicates hermit-like people who are unwilling
      to accept even their immediate neighbors into their
      communities (the l-shell will stop growing
      immediately)
Assess the quality of good partitions

Expected properties of a
good partition (1/3)
¡  Problem: how can we tell whether the partition an
    algorithm found is good?

¡  Given:
  ¡  A set N of n ≥ 2 points
  ¡  A distance function d: N × N → ℝ
  ¡  A partitioning function f that takes a distance
      function d on N and returns a partition 𝚪 of N

Expected properties of a
good partition (2/3)
¡  A partitioning function f is “good” if it satisfies a set of basic
    properties:
  ¡  Scale invariance: for any distance function d and
      any α > 0, we have f(d) = f(α⋅d)
  ¡  Richness: every partition of N must be a possible
      output of f, i.e., for every partition 𝚪 there is a
      distance function d such that f(d) = 𝚪
  ¡  Consistency: if we produce d’ by reducing the
      distances within the clusters and enlarging the distances
      between the clusters, the same partition 𝚪
      should arise from d’

Expected properties of a
good partition (3/3)
¡  The impossibility theorem: for each n ≥ 2, there is
    no partitioning function f that satisfies scale
    invariance, richness and consistency at the
    same time



Quality functions
¡  Problem: in practical situations, the communities
    are not known ahead of time.
 ¡  How to assess the quality of the partition the
     algorithm found?

¡  It may be convenient to have a quantitative
    criterion to assess the goodness of a graph
    partition

¡  Quality function: a function that assigns a number
    to each partition of a graph
 ¡  Partitions can then be ranked by their score

Modularity:
Trace as a metric (1/2)
¡  Given a partition 𝛤 of G =
    (V,E), the fraction of edges
    that fall within the same
    community is

$$ \frac{\sum_{ij} A_{ij}\,\delta(c_i, c_j)}{\sum_{ij} A_{ij}} = \frac{1}{2m} \sum_{ij} A_{ij}\,\delta(c_i, c_j) $$

¡  Where:
  ¡  A is the adjacency matrix
  ¡  m is the total number of edges
  ¡  𝛿(ci, cj) equals 1 iff ci = cj,
      0 otherwise

Matrix e (all entries multiplied by 1/27):

            red   green   blue
    red      5      0       2
    green    0      9       2
    blue     2      2      11

Modularity:
Trace as a metric (2/2)
¡  The trace Tr(e) gives the fraction of edges in the
    network that connect vertices in the same
    community

¡  A good division into communities should have a
    high value of the trace

¡  Problem: the trace on its own is not a good
    indicator of the quality of the division
  ¡  Example: placing all vertices in a single community
      would give maximal Tr(e) = 1
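
The weakness of the trace is easy to reproduce. The sketch below builds the matrix e from an adjacency matrix and a list of community labels (both representations are assumptions of the example) and compares a sensible division with the trivial one.

```python
def community_matrix(A, labels):
    """e[r][s]: fraction of adjacency entries joining community r to community s."""
    groups = sorted(set(labels))
    index = {g: r for r, g in enumerate(groups)}
    total = sum(sum(row) for row in A)  # equals 2m for an undirected graph
    e = [[0.0] * len(groups) for _ in groups]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            e[index[labels[i]]][index[labels[j]]] += a / total
    return e


def trace(e):
    return sum(e[r][r] for r in range(len(e)))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

print(trace(community_matrix(A, ["a", "a", "a", "b", "b", "b"])))  # about 6/7
print(trace(community_matrix(A, ["a"] * 6)))                       # about 1
```

The uninformative single-community division scores higher than the natural two-triangle division, which is why the trace needs a correction term.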

Modularity:
Founding principle
¡  Solution: a random graph is not expected to have a
    cluster structure

¡  The possible existence of clusters is revealed by
    the comparison between:
  ¡  The actual density of edges in a subgraph
  ¡  The density one would expect in the subgraph if the
      vertices of the graph were attached randomly (null
      model)

Quality functions:
Modularity function
¡  The modularity is the fraction of edges falling
    within groups minus the expected value of the
    same quantity in the case of a randomized
    network

$$ Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - P_{ij} \right) \delta(c_i, c_j) $$

¡  Pij is the expected number of edges between
    vertices i and j in the null model

Quality functions:
Modularity’s null model (1/2)
 ¡  Modularity’s null model: the random graph has to
     keep the same degree distribution as the original
     graph
   ¡  A vertex can be attached to any other vertex
   ¡  It’s simple to compute Pij

Quality functions:
Modularity’s null model (2/2)
¡  What is the expected
    number of edges between i
    and j in the null model?

¡  Given:
  ¡  Total number of edges m
  ¡  Degree of i: ki
  ¡  Degree of j: kj
  ¡  There are 2m edge ends in total; each of the ki
      edge ends at i attaches to one of the kj edge
      ends at j with probability kj / 2m

¡  Expected number:

$$ P_{ij} = \frac{k_i k_j}{2m} $$

¡  Resulting modularity:

$$ Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) $$
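
A direct, unoptimized Python transcription of the modularity formula, assuming an undirected graph given as a symmetric 0/1 adjacency matrix (list of lists) and one community label per vertex:

```python
def modularity(A, labels):
    """Q = (1/2m) * sum over same-community pairs of (A_ij - k_i * k_j / 2m)."""
    n = len(A)
    degree = [sum(row) for row in A]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(c_i, c_j)
                q += A[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

print(modularity(A, [0, 0, 0, 1, 1, 1]))  # two triangles: 5/14, about 0.357
print(modularity(A, [0, 0, 0, 0, 0, 0]))  # single community: 0 up to rounding
print(modularity(A, [0, 1, 2, 3, 4, 5]))  # one vertex per community: negative
```

Unlike the trace alone, the score rewards the two-triangle division over the trivial single-community one.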

Quality functions:
Modularity function
¡  Properties of modularity:
  ¡  It can be negative
  ¡  It equals 0 if there’s no community division (i.e.,
      the whole graph is a single cluster)
  ¡  It is size-dependent: the modularity values of graphs
      of different sizes cannot be compared



Bibliography
¡  F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi - Defining and
    identifying communities in networks, Proc. Natl. Acad. Sci. USA, 2004

¡  P. Erdős, A. Rényi, On the evolution of random graphs, Publications of
    the Mathematical Institute of the Hungarian Academy of Sciences,
    1960

¡  R.S. Burt, Positions in networks, Social Forces, 1976

¡  Wikipedia contributors, Stirling numbers of the second kind, Wikipedia,
    The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 1 Aug.
    2012. Web. 19 Sep. 201

¡  B.W. Kernighan, S. Lin, An Efficient Heuristic Procedure for Partitioning
    Graphs, Bell System Tech Journal No. 49, 1970

¡  M.E. Newman, M. Girvan, Finding and evaluating community structure
    in networks, Physical Review E, Vol. 69, No. 2, 11 Aug 2003



¡  J.P. Bagrow, E.M. Bollt, Local method for detecting communities,
    Physical Review E, 2005

¡  J. Kleinberg. An Impossibility Theorem for Clustering. Advances in
    Neural Information Processing Systems (NIPS) 15, 2002

Gestione della memoria in C++
 
Array in C++
Array in C++Array in C++
Array in C++
 
Puntatori e Riferimenti
Puntatori e RiferimentiPuntatori e Riferimenti
Puntatori e Riferimenti
 
Java Persistence API
Java Persistence APIJava Persistence API
Java Persistence API
 
JSP Standard Tag Library
JSP Standard Tag LibraryJSP Standard Tag Library
JSP Standard Tag Library
 
Internationalization in Jakarta Struts 1.3
Internationalization in Jakarta Struts 1.3Internationalization in Jakarta Struts 1.3
Internationalization in Jakarta Struts 1.3
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Community Detection

  • 1. Community Detection Ilio Catallo, catallo@elet.polimi.it Politecnico di Milano
  • 2. Outline
      ◦ Communities and Partitions: What is a community? What is a partition?
      ◦ Partitioning algorithms: Kernighan and Lin, 1970; Newman and Girvan, 2004; Bagrow and Bollt, 2008
      ◦ Assess the quality of good partitions: The impossibility theorem; Quality functions
  • 3. Communities and Partitions
  • 4. What is a community? Intuition
      ◦ Community: a set of tightly connected nodes
      ◦ Examples: people with common interests; papers on the same topic; scholars working in the same field
  • 5. What is a community? Local definitions (1/3)
      ◦ Clique (complete subgraph)
      ◦ Too strict a definition (what to do if just one link is missing?)
      ◦ Cliques are hard to find (exponential complexity in the graph size)
  • 6. What is a community? Local definitions (2/3)
      ◦ Strong community: a subgraph V ⊆ G such that each vertex has more connections within the community than with the rest of the graph: $k_i^{in}(V) > k_i^{out}(V) \quad \forall i \in V$
      ◦ $k_i^{in}(V)$: the number of edges connecting node i to other nodes belonging to V
      ◦ $k_i^{out}(V)$: the number of connections toward nodes in the rest of the graph
  • 7. What is a community? Local definitions (3/3)
      ◦ The strong community definition is too strict: it is unrealistic in many real cases
      ◦ Weak community: a subgraph V ⊆ G such that the sum of all degrees within V is greater than the sum of all degrees toward the rest of the network: $\sum_{i \in V} k_i^{in}(V) > \sum_{i \in V} k_i^{out}(V)$
      ◦ The left-hand sum counts the edges connecting nodes in V to other nodes belonging to V (each internal edge is counted once per endpoint); the right-hand sum counts the edges connecting nodes in V toward nodes in the rest of the graph
      ◦ A strong community is also weak, while the converse is not generally true
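The strong and weak definitions can be checked directly from an adjacency structure. The following is a minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; the function names and the example graph are illustrative and do not come from the slides.

```python
def in_out_degrees(adj, community):
    """Return {i: (k_in, k_out)} for every vertex i of the community."""
    community = set(community)
    return {
        i: (len(adj[i] & community), len(adj[i] - community))
        for i in community
    }

def is_strong(adj, community):
    # Every vertex must have more neighbors inside than outside.
    return all(k_in > k_out
               for k_in, k_out in in_out_degrees(adj, community).values())

def is_weak(adj, community):
    # Only the sums over the whole community are compared.
    degrees = list(in_out_degrees(adj, community).values())
    return sum(k_in for k_in, _ in degrees) > sum(k_out for _, k_out in degrees)

# Hypothetical example: a triangle {0, 1, 2} whose vertex 2 also links to 3 and 4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(is_strong(adj, {0, 1, 2}), is_weak(adj, {0, 1, 2}))
```

In this example vertex 2 has as many links outside the triangle as inside it, so the triangle fails the strong test but still passes the weak one.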
  • 8. What is a community? Global definitions (1/2)
      ◦ Idea: the graph has a community structure if it is different from a random graph
      ◦ Random graph: a graph such that each pair of vertices is connected with equal probability p, independently of the other pairs
      ◦ Any two vertices have the same probability of being adjacent
      ◦ No preferential linking
  • 9. What is a community? Global definitions (2/2)
      ◦ The graph of interest is compared with the null model
      ◦ Null model: a graph which matches the original in some of its structural features, but which is otherwise a random graph
      ◦ Used as a term of comparison to verify whether the graph of interest shows community structure
  • 10. What is a community? Vertex-based definitions
      ◦ Idea: communities are subgraphs of vertices similar to each other
      ◦ A measure of similarity needs to be defined
      ◦ If it is possible to embed the vertices in an N-dimensional Euclidean space, possible (dis)similarity measures are:
          ◦ Euclidean distance: $d_{A,B} = \sqrt{\sum_{k=1}^{N} (a_k - b_k)^2}$
          ◦ Manhattan distance: $d_{A,B} = \sum_{k=1}^{N} |a_k - b_k|$
          ◦ Cosine similarity: $d_{A,B} = \frac{A \cdot B}{\|A\| \|B\|}$
      ◦ With $A = (a_1, a_2, \ldots, a_N)$ and $B = (b_1, b_2, \ldots, b_N)$ vertex feature vectors
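The three measures can be written in a few lines each. This is a small sketch using only the standard library; the function names are illustrative, and feature vectors are assumed to be equal-length sequences of numbers.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by the product of the two norms
    # (undefined when either vector is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

A = (1, 2, 3)
B = (4, 6, 3)
print(euclidean(A, B), manhattan(A, B), cosine_similarity(A, B))
```

Note that the first two are distances (smaller means more similar), while the cosine is a similarity (larger means more similar).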
  • 11. What is a community? Vertex-based definitions
      ◦ If it is not possible to embed the vertices in a Euclidean space, the similarity must be inferred from the adjacency relationships
      ◦ Dissimilarity measure based on structural equivalence: $d_{ij} = \sqrt{\sum_{k \neq i,j} (A_{ik} - A_{jk})^2}$
      ◦ Structural equivalence: two vertices are structurally equivalent if they have the same neighbors, even if they are not adjacent themselves
      ◦ If i and j are structurally equivalent, then $d_{ij} = 0$
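As an illustration of the structural-equivalence measure, here is a minimal sketch that assumes the graph is given as a 0/1 adjacency matrix stored as a list of lists; the function name and the example matrix are hypothetical.

```python
import math

def structural_dissimilarity(A, i, j):
    """d_ij: compare rows i and j of the adjacency matrix, skipping k = i, j."""
    return math.sqrt(sum(
        (A[i][k] - A[j][k]) ** 2
        for k in range(len(A))
        if k != i and k != j
    ))

# Vertices 0 and 1 are not adjacent but share the neighbors 2 and 3.
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(structural_dissimilarity(A, 0, 1))
print(structural_dissimilarity(A, 0, 2))
```

Vertices 0 and 1 are structurally equivalent, so their dissimilarity is zero, while vertices 0 and 2 disagree on both remaining columns.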
  • 12. What is a partition?
      ◦ Partition: a division of a graph into clusters, such that each vertex belongs to exactly one cluster
      ◦ If the vertices can be shared among different communities, the division is called a cover
  • 13. How many partitions may a graph have?
      ◦ Stirling number of the second kind: the number of possible partitions into k clusters of a graph with n vertices: $S(n, k) = 1$ if $k = n$ or $k = 1$, and $S(n, k) = k\,S(n-1, k) + S(n-1, k-1)$ otherwise
      ◦ nth Bell number: the total number of possible partitions: $B_n = \sum_{k=1}^{n} S(n, k)$
      ◦ The nth Bell number is huge, even for relatively small graphs
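To get a feeling for how fast the number of partitions grows, the recurrence can be evaluated directly. This is a small sketch with memoization; the function names are illustrative, and the recurrence is assumed to be called with 1 ≤ k ≤ n.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of partitions of n vertices into k clusters (1 <= k <= n)."""
    if k == n or k == 1:
        return 1
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n):
    """Total number of partitions of n vertices."""
    return sum(stirling2(n, k) for k in range(1, n + 1))

for n in (5, 10, 20):
    print(n, bell(n))
```

Already for 20 vertices the count exceeds fifty trillion, which is why exhaustive search over partitions is not an option.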
  • 15. Kernighan and Lin, 1970: Basic concepts (1/2)
      ◦ Given:
          ◦ A graph $G = (N, A)$ of n vertices of weights $w_i > 0$
          ◦ p a positive number such that $w_i \leq p$
          ◦ $C = (c_{ij})$ the weighted adjacency matrix (cost matrix)
      ◦ A k-way partition $\Gamma$ of G is a set of non-empty, pairwise disjoint sets $\upsilon_1, \ldots, \upsilon_k$ such that $\bigcup_{i=1}^{k} \upsilon_i = G$
      ◦ A partition is admissible if the sum of the weights of the vertices in each $\upsilon_i$ is less than or equal to p: $\sum_{j \in \upsilon_i} w_j \leq p \quad \forall i = 1, \ldots, k$
  • 16. Kernighan and Lin, 1970: Basic concepts (2/2)
      ◦ The cost T of a partition $\Gamma$ is the summation of $c_{ij}$ over all i and j such that i and j are in different clusters
      ◦ [Figure: example graph split into two clusters, with the edges (b, 2) and (f, 4) crossing between them] $T(\Gamma) = c_{b2} + c_{f4}$
  • 17. Kernighan and Lin, 1970: 2-way uniform partitioning problem
      ◦ 2-way uniform partitioning problem: finding a minimal cost partition of a given graph of 2n vertices (of equal weights) into two subsets of n vertices
      ◦ [Figure: the example graph with crossing edges (b, 2) and (f, 4)]
      ◦ The Kernighan and Lin algorithm is a heuristic for solving the 2-way uniform partitioning problem
  • 18. Kernighan and Lin, 1970: Basic principle (1/2)
      ◦ Basic principle: starting with an arbitrary partition $\Gamma = \{A, B\}$ of N, try to decrease the initial cost T by a series of interchanges of elements of A and B
      ◦ When no further improvement is possible, the resulting partition $\Gamma'$ is locally minimal with respect to the algorithm
  • 19. Kernighan and Lin, 1970: Basic principle (2/2)
      ◦ Given:
          ◦ $\Gamma^* = \{A^*, B^*\}$ is a minimum cost 2-way uniform partition
          ◦ $\Gamma = \{A, B\}$ is an arbitrary 2-way uniform partition
      ◦ There are subsets X ⊂ A, Y ⊂ B with |X| = |Y| such that interchanging X and Y produces $A^*$ and $B^*$: $A^* = A - X + Y$, $B^* = B - Y + X$
      ◦ [Figure: subsets X of A and Y of B before the interchange, and their positions in $A^*$ and $B^*$ after it]
  • 20. Kernighan and Lin, 1970: Internal and external cost
      ◦ Let's define for each a ∈ A:
          ◦ External cost: $E_a = \sum_{y \in B} c_{ay}$
          ◦ Internal cost: $I_a = \sum_{x \in A} c_{ax}$
          ◦ Cost difference: $D_a = E_a - I_a$
      ◦ Similarly, define $E_b$, $I_b$, $D_b$ for each b ∈ B
  • 21. Kernighan and Lin, 1970: Cost reduction
      ◦ Lemma 1: Consider any a ∈ A, b ∈ B. If a and b are interchanged, the reduction in cost (i.e., the gain) is $g = T - T' = D_a + D_b - 2c_{ab}$
      ◦ Lemma 2: Consider any a ∈ A, b ∈ B. If a and b are interchanged, the variations in the cost difference for all the other nodes are:
          ◦ $D'_x = D_x + 2c_{xa} - 2c_{xb}, \quad x \in A - \{a\}$
          ◦ $D'_y = D_y + 2c_{yb} - 2c_{ya}, \quad y \in B - \{b\}$
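  • 22. Kernighan and Lin, 1970: The algorithm
      1. Compute the D values for all elements of N
      2. $A_1 \leftarrow A$, $B_1 \leftarrow B$; $X_1 = \emptyset$, $Y_1 = \emptyset$; $i \leftarrow 1$
      3. While $i < n$:
          (a) $(a_i, b_i) \leftarrow \arg\max_{a_i \in A, b_i \in B} g_i$, with $g_i = D_{a_i} + D_{b_i} - 2c_{a_i b_i}$ (Lemma 1)
          (b) $X_{i+1} \leftarrow X_i \cup \{a_i\}$, $Y_{i+1} \leftarrow Y_i \cup \{b_i\}$
          (c) $A_{i+1} \leftarrow A_i - \{a_i\}$, $B_{i+1} \leftarrow B_i - \{b_i\}$
          (d) Recalculate the D values for the elements of $A_{i+1}$, $B_{i+1}$ (Lemma 2)
          (e) $i \leftarrow i + 1$
      4. Choose k to maximize $G = \sum_{i=1}^{k} g_i$, $k = 1, \ldots, n$
      5. If $G > 0$, then swap $X_k$, $Y_k$ and go back to 1; if $G = 0$, exit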
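The steps above can be sketched in plain Python. This is an illustrative sketch, not the authors' code: it assumes a symmetric cost matrix stored as a list of lists, it selects pairs until both sides are exhausted, and it swaps the first k selected pairs, which is one reading of how the step indices line up. All function and variable names are hypothetical.

```python
def cut_cost(c, A, B):
    """Cost T of the partition {A, B}: total weight of the crossing edges."""
    return sum(c[a][b] for a in A for b in B)

def kl_pass(c, A, B):
    """One pass of steps 1-5; returns the new (A, B) and the gain G."""
    A, B = set(A), set(B)
    # Step 1: D = external cost - internal cost for every vertex.
    D = {}
    for v in A | B:
        own, other = (A, B) if v in A else (B, A)
        D[v] = sum(c[v][u] for u in other) - sum(c[v][u] for u in own if u != v)
    free_a, free_b = set(A), set(B)
    pairs, gains = [], []
    while free_a and free_b:
        # Step 3(a): pick the pair with the largest gain (Lemma 1).
        g, a, b = max((D[a] + D[b] - 2 * c[a][b], a, b)
                      for a in free_a for b in free_b)
        pairs.append((a, b))
        gains.append(g)
        free_a.remove(a)
        free_b.remove(b)
        # Step 3(d): update D for the remaining vertices (Lemma 2).
        for x in free_a:
            D[x] += 2 * c[x][a] - 2 * c[x][b]
        for y in free_b:
            D[y] += 2 * c[y][b] - 2 * c[y][a]
    # Step 4: choose k maximizing the partial sum of gains.
    best_k, best_gain, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_k, best_gain = k, running
    # Step 5: swap the first k selected pairs.
    for a, b in pairs[:best_k]:
        A.remove(a); B.add(a)
        B.remove(b); A.add(b)
    return A, B, best_gain

def kernighan_lin(c, A, B):
    """Repeat passes until a pass yields no positive gain."""
    while True:
        A, B, gain = kl_pass(c, A, B)
        if gain <= 0:
            return A, B

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
c = [[0] * 6 for _ in range(6)]
for u, v in edges:
    c[u][v] = c[v][u] = 1
A, B = kernighan_lin(c, {0, 1, 3}, {2, 4, 5})
print(A, B, cut_cost(c, A, B))
```

Starting from a deliberately bad split, a single swap of vertices 3 and 2 recovers the two triangles; the second pass finds no positive gain and the loop stops.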
  • 23. Newman and Girvan, 2004: Betweenness (1/2)
      ◦ All paths between any two vertices in different communities pass along the few inter-community edges
      ◦ Betweenness: a measure that favors edges that lie between communities and disfavors those that lie inside communities
      ◦ [Figure: an inter-community edge (i, j) with $B_{ij} \gg 0$]
  • 24. Newman and Girvan, 2004: Betweenness (2/2)
      ◦ Different implementations of betweenness:
          ◦ Shortest-path betweenness: find the shortest paths between all pairs of vertices and count how many run along each edge
          ◦ Random-walk betweenness: the expected number of times that a random walk between a particular pair of vertices will pass down a particular edge, summed over all vertex pairs
          ◦ Current-flow betweenness: the absolute value of the current along the edge, summed over all source/sink pairs
  • 25. Newman and Girvan, 2004: Basic principle
      ◦ Algorithm based on a divisive approach
      ◦ Basic principle: iteratively remove the links with the highest betweenness
  • 26. Newman and Girvan, 2004: Algorithm
      1. Calculate betweenness scores for all edges in the network
      2. Find the edge with the highest score and remove it from the network
      3. Recalculate betweenness for all remaining edges
      4. Repeat from step 2 until no edges remain
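The four steps can be sketched with shortest-path betweenness only. This is a minimal sketch under stated assumptions: an undirected, unweighted graph without self-loops, stored as a dict of neighbor sets; betweenness computed with a breadth-first accumulation in the style of Brandes' algorithm (a swapped-in technique, not necessarily what the slides had in mind); all names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (each pair counted in both directions)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):          # accumulate path shares back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Return the successive partitions (levels of the dendrogram, top down)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    levels = [components(adj)]
    while any(adj.values()):
        score = edge_betweenness(adj)      # steps 1 and 3
        u, v = max(score, key=score.get)   # step 2
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):   # a community has split
            levels.append(comps)
    return levels

# Hypothetical example: two triangles joined by the bridge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
levels = girvan_newman(adj)
print(levels[1])
```

In the example the bridge carries every shortest path between the two triangles, so it is removed first and the first split separates them.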
  • 27. Newman and Girvan, 2004: Dendrogram
      ◦ The output of the algorithm is called a dendrogram
      ◦ Cutting the diagram horizontally at some height displays a possible partition of the graph
      ◦ [Figure: a hierarchical tree or dendrogram illustrating the type of output generated by the algorithms described here. The circles at the bottom of the figure represent the individual vertices of the network. As we move up the tree the vertices join together to form larger and larger communities, as indicated by the lines, until we reach the top, where all are joined together in a single community.]
  • 28. Bagrow and Bollt, 2008: L-shell
      ◦ L-shell: given a starting vertex i, the l-shell is the set of all vertices within a shortest-path distance d ≤ l of i
      ◦ Example: [Figure: the 1-shell from starting vertex i]
  • 29. Bagrow and Bollt, 2008: Emerging degree (1/2)
      ◦ Emerging degree $k_j(i)$ of an internal vertex j: the number of edges that connect j to vertices external to the l-shell
      ◦ Total emerging degree $K_i^l$: the total number of emerging edges from that l-shell
      ◦ Leading edge $S_i^l$: the set of all vertices exactly l steps away from vertex i
      ◦ Example [Figure: 1-shell around vertex 0 with internal vertices 1 to 4]: $k_1(0) = 1$, $k_2(0) = 2$, $k_3(0) = 1$, $k_4(0) = 2$, hence $K_0^1 = 6$
  • 30. Bagrow and Bollt, 2008: Emerging degree (2/2)
      ◦ Change in the total emerging degree: for a shell at depth l starting from vertex i it is $\Delta K_i^l = \frac{K_i^l}{K_i^{l-1}}$
      ◦ Example [Figure: 1-shell around vertex 0 with internal vertices 1 to 4]: $k_1(0) = 1$, $k_2(0) = 2$, $k_3(0) = 1$, $k_4(0) = 2$, hence $K_0^1 = 6$
  • 31. Bagrow and Bollt, 2008: Basic principle
      ◦ Basic principle: expanding an l-shell outward from some starting vertex i and comparing the change in total emerging degree to some threshold α: $\Delta K_i^l < \alpha$
      ◦ There are many interconnections within a community, so the total emerging degree tends to increase
      ◦ The edges connecting the community to the rest of the graph are fewer in number, so the total emerging degree tends to decrease sharply
  • 32. Bagrow and Bollt, 2008: Algorithm
      1. Select a starting vertex i; $l \leftarrow 0$
      2. $C_M = \emptyset$
      3. Compute $K_i^0$
      4. Repeat until $\Delta K_i^l < \alpha$:
          (a) $l \leftarrow l + 1$
          (b) Compute $S_i^l$; $C_M \leftarrow C_M \cup S_i^l$
          (c) Compute $K_i^l$ and $\Delta K_i^l$
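The expansion can be sketched as follows. This is a minimal sketch under several assumptions that are not stated on the slide: the graph is an undirected dict of neighbor sets, the starting vertex is itself counted as a member, the total emerging degree counts only edges from the leading edge to vertices outside the shell, and the expansion stops as soon as the ratio of consecutive total emerging degrees drops below α (or nothing is left to expand). The function name is hypothetical.

```python
def l_shell_community(adj, start, alpha):
    """Grow an l-shell from `start` until the change in emerging degree drops below alpha."""
    members = {start}
    leading = {start}               # leading edge S_i^0
    k_prev = len(adj[start])        # K_i^0: edges leaving the 0-shell
    while k_prev > 0:
        # Leading edge S_i^l: unvisited neighbors of the previous leading edge.
        leading = set().union(*(adj[v] for v in leading)) - members
        members |= leading
        # K_i^l: edges from the leading edge to vertices outside the shell.
        k_curr = sum(len(adj[v] - members) for v in leading)
        if k_curr / k_prev < alpha:  # Delta K_i^l < alpha: community found
            break
        k_prev = k_curr
    return members

# Hypothetical example: two 4-cliques {0..3} and {4..7} joined by the edge (3, 4).
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
print(l_shell_community(adj, 0, 0.5))
print(l_shell_community(adj, 0, 0.1))
```

With α = 0.5 the shell stops at the first clique, because only one edge emerges from it; with the much smaller α = 0.1 the shell keeps spreading over the whole graph.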
  • 33. Bagrow and Bollt, 2008: α as "social acceptance"
      ◦ The performance of the algorithm is strictly dependent on the value of α
      ◦ α can be thought of as a measure of social acceptance
      ◦ α ≪ 1 indicates people who are more welcoming of their neighbors (the l-shell will spread to much of the network)
      ◦ α ≫ 1 indicates hermit-like people who are unwilling to accept even their immediate neighbors into their communities (the l-shell will stop growing immediately)
  • 34. Assess the quality of good partitions
  • 35. Expected properties of a good partition (1/3)
      ◦ Problem: how can we say that the partition an algorithm found is good?
      ◦ Given:
          ◦ A set N of n ≥ 2 points
          ◦ A distance function d: N × N → ℝ
          ◦ A partitioning function f that takes a distance function d on N and returns a partition $\Gamma$ of N
  • 36. Expected properties of a good partition (2/3)
      ◦ A partition is "good" if it satisfies a set of basic properties:
          ◦ Scale invariance: for any distance function d and any α > 0, we have f(d) = f(α⋅d)
          ◦ Richness: every partition of N must be a possible output of f(d)
          ◦ Consistency: if we produce a d' by reducing distances within the clusters and enlarging distances between the clusters, the same partition $\Gamma$ should arise from d'
  • 37. Expected properties of a good partition (3/3)
      ◦ The impossibility theorem: for each n ≥ 2, there is no partitioning function f that satisfies Scale-Invariance, Richness and Consistency at the same time
  • 38. Quality functions
      ◦ Problem: in practical situations, the communities are not known ahead of time
      ◦ How to assess the quality of the partition the algorithm found?
      ◦ It may be convenient to have a quantitative criterion to assess the goodness of a graph partition
      ◦ Quality function: a function that assigns a number to each partition of a graph
      ◦ Partitions can then be ranked
  • 39. Modularity: Trace as a metric (1/2)
      ◦ Given a partition $\Gamma$ of G = (V, E), the fraction of edges that fall within the same community is $\frac{\sum_{ij} A_{ij}\,\delta(c_i, c_j)}{\sum_{ij} A_{ij}} = \frac{1}{2m} \sum_{ij} A_{ij}\,\delta(c_i, c_j)$
      ◦ Where:
          ◦ A is the adjacency matrix
          ◦ $\delta(c_i, c_j)$ equals 1 iff $c_i = c_j$, 0 otherwise
      ◦ Example matrix e (rows and columns are communities; every entry is multiplied by 1/27):

                    red   green   blue
            red      5      0      2
            green    0      9      2
            blue     2      2     11
  • 40. Modularity: Trace as a metric (2/2)
      ◦ The trace Tr(e) gives the fraction of edges in the network that connect vertices in the same community
      ◦ A good division into communities should have a high value of the trace
      ◦ Problem: the trace on its own is not a good indicator of the quality of the division
      ◦ Example: placing all vertices in a single community would give the maximal value Tr(e) = 1
  • 41. Modularity: Founding principle
      ◦ Solution: a random graph is not expected to have a cluster structure
      ◦ The possible existence of clusters is revealed by the comparison between:
          ◦ The actual density of edges in a subgraph
          ◦ The density one would expect in the subgraph if the vertices of the graph were attached randomly (null model)
  • 42. Quality functions: Modularity function
      ◦ The modularity is the fraction of edges falling within groups minus the expected value of the same quantity in the case of a randomized network: $Q = \frac{1}{2m} \sum_{ij} (A_{ij} - P_{ij})\,\delta(c_i, c_j)$
      ◦ $P_{ij}$ is the expected number of edges between vertices i and j in the null model
  • 43. Quality functions: Modularity's null model (1/2)
      ◦ Modularity's null model: the random graph has to keep the same degree distribution as the original graph
      ◦ A vertex can be attached to any other vertex
      ◦ It is simple to compute $P_{ij}$
  • 44. Quality functions: Modularity's null model (2/2)
      ◦ What is the expected number of edges between i and j in the null model?
      ◦ Given:
          ◦ Total number of edges: m
          ◦ Degree of i: $k_i$
          ◦ Degree of j: $k_j$
      ◦ The number of possible edges: $k_i k_j$ out of $2m$
      ◦ Expected number: $P_{ij} = \frac{k_i k_j}{2m}$, so that $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$
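The modularity formula with this null model translates almost literally into code. This is a minimal sketch assuming an undirected graph given as a symmetric 0/1 adjacency matrix (list of lists) and a list of community labels, one per vertex; the function name and the example graph are illustrative.

```python
def modularity(A, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    n = len(A)
    k = [sum(row) for row in A]   # vertex degrees
    two_m = sum(k)                # 2m: every edge contributes to two degrees
    return sum(
        A[i][j] - k[i] * k[j] / two_m
        for i in range(n)
        for j in range(n)
        if labels[i] == labels[j]
    ) / two_m

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

print(modularity(A, [0, 0, 0, 1, 1, 1]))  # the two triangles as communities
print(modularity(A, [0, 0, 0, 0, 0, 0]))  # the whole graph as one cluster
```

For the two-triangle split the value works out to 5/14, while putting every vertex in a single cluster gives exactly zero, since the observed and expected terms cancel.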
  • 45. Quality functions: Modularity function
      ◦ Modularity:
          ◦ It can be negative
          ◦ It equals 0 if there is no community division (i.e., the whole graph is a single cluster)
          ◦ It is size-dependent: graphs of different sizes cannot be compared
  • 46. Bibliography
      ◦ F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, Defining and identifying communities in networks, Proc. Natl. Acad. Sci. USA, 2004
      ◦ P. Erdős, A. Rényi, On the evolution of random graphs, Publication of the Mathematical Institute of the Hungarian Academy of Sciences, 1960
      ◦ R.S. Burt, Positions in networks, Social Forces, 1976
      ◦ Wikipedia contributors, Stirling numbers of the second kind, Wikipedia, The Free Encyclopedia, 1 Aug. 2012. Web. 19 Sep. 201
      ◦ B.W. Kernighan, S. Lin, An Efficient Heuristic Procedure for Partitioning Graphs, Bell System Tech Journal No. 49, 1970
      ◦ M.E. Newman, M. Girvan, Finding and evaluating community structure in networks, Physical Review E, Vol. 69, No. 2, 11 Aug 2003
  • 47. Bibliography
      ◦ J.P. Bagrow, E.M. Bollt, Local method for detecting communities, Physical Review E, 2005
      ◦ J. Kleinberg, An Impossibility Theorem for Clustering, Advances in Neural Information Processing Systems (NIPS) 15, 2002