The document proposes an entropy-based algorithm to detect communities in augmented social networks. It begins with an introduction that motivates using both the graph structure and node attributes to find communities. It then outlines the clustering algorithm, which first uses modularity optimization on the graph to generate an initial partition, and then performs entropy optimization on that partition using the node attributes. Experimental results on student networks show that using attributes leads to different community configurations than using the graph alone, and that the algorithm's running time and memory usage grow linearly with the number of attribute features.
Entropy-based algorithm for community detection in augmented networks
1. Entropy Based Community Detection in Augmented Social Networks
J. Cruz¹, C. Bothorel¹, F. Poulet²
¹ LUSSI Department, Telecom – Bretagne, France
² IRISA, Rennes 1 University, France
2. Outline
1 Introduction
  Motivation
  Related Work
  Augmented Networks
2 Clustering Algorithm
3 Experiments and Results
4 Conclusions
page 2 CRUZ, BOTHOREL, POULET Entropy Based Community Detection
3. Motivation
A social network is composed of actors, persons or organizations, and the links between them.
Social networks have been "simplified" to fit into graph structures, leaving behind any additional information. That information corresponds to the semantic, and yet social, aspects of the network.
An augmented network: a graph enriched with node attributes.
The question is: how can we use both the graph and the social information to detect communities?
4. Related Work
Data Clustering: unsupervised clustering algorithms using some (dis)similarity measure between points in an n-dimensional space.
- Hierarchical clustering [1]
- k-means, fuzzy c-means [1]
- Self-organizing maps [2]
Community Detection: algorithms designed to find community structures in graphs using information from edges.
- Modularity optimization: Newman [3], Blondel [4], ...
- Overlapping communities using GAs: Pizzuti [5], ...
- Community detection using attributes and structural information [6]
5. Quality Measures / Data Types
Data: reduce the distance between the members of the same group while the distance between groups is increased. Examples: Manhattan L1, Euclidean L2, Chebyshev L∞, Entropy H.
Graphs: increase the number of edges within each community while the number of edges between communities is reduced. Examples: Coverage γ, Conductance ϕ, Performance perf, Modularity Q.

The selected measures:
- Entropy measures the disorder of each group: the more similar the objects, the more ordered the group (i.e., the lower its entropy).
- Modularity measures the fraction of edges falling within the groups minus the expected fraction of such edges [7].
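The modularity measure selected above can be sketched in a few lines. This is a minimal pure-Python computation of Newman–Girvan modularity Q for an undirected graph given as an edge list, not the authors' implementation; the function name and representation are illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity: Q = sum over communities c of
    (e_c / m) - (d_c / 2m)^2, where e_c is the number of edges
    inside c, d_c the summed degree of its nodes, and m the
    total number of edges."""
    m = len(edges)
    inside = defaultdict(int)   # e_c: intra-community edge counts
    degree = defaultdict(int)   # d_c: summed degrees per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2
               for c in degree)
```

For two triangles joined by a single bridge edge, assigning each triangle to its own community gives Q = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0.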
6. Semantic Information
Given an augmented network G(V, E, F_V):
- Given a subset of features of the nodes, F_V* ∈ P(F_V),
- each node is associated with a vector ξ of f attributes, ξ ∈ R^f.
- The union of all the vectors ξ_{F_V*} is the vectorial representation of the node set: AS_{F_V*}.

The attribute set AS is the matrix representation of the augmented information from the network:

AS_{F_V*}:
Node  Attr 1  Attr 2  ...  Attr f
1     ξ11     ξ12     ...  ξ1f
2     ξ21     ξ22     ...  ξ2f
...   ...     ...     ...  ...
n     ξn1     ξn2     ...  ξnf
7. Data Entropy
Given a group C of N = |C| elements, the entropy H(C) of the group is given by:

H(C) = -\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \big[ s_{ij} \ln s_{ij} + (1 - s_{ij}) \ln(1 - s_{ij}) \big]

where s_ij is a similarity measure of nodes i and j.

Similarity measures? Entropy measures the (dis)order of a partition; however, it is necessary to calculate the distance between the nodes. This is done using metrics such as the cosine distance and the Jaccard distance, among others.
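The entropy above can be computed directly from the pairwise similarities. A minimal sketch, assuming cosine similarity as the pairwise measure (any of the metrics named above could be substituted); pairs with s = 0 or s = 1 contribute nothing, using the limit x ln x → 0.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def group_entropy(vectors, sim=cosine_similarity):
    """H(C) = -sum over pairs i<j of [s ln s + (1-s) ln(1-s)]."""
    h = 0.0
    n = len(vectors)
    for i in range(n - 1):
        for j in range(i + 1, n):
            s = sim(vectors[i], vectors[j])
            if 0.0 < s < 1.0:
                h -= s * math.log(s) + (1 - s) * math.log(1 - s)
    return h
```

A group of identical vectors has entropy 0 (perfect order); any partial similarity yields a positive entropy.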
8. General Architecture
This is the general architecture of the algorithm, which finds communities using structural and semantic criteria at the same time, extracted from the augmented network G(V, E, F_V).
9. General Architecture
Modularity Optimization (First Step): using the social graph G(V, E), the algorithm finds a first partition C0 with optimal modularity.
10. General Architecture
Starting from the structure of the social network, in which each node is its own community, the algorithm takes a random node and puts it into a random community; if the movement increases the modularity, the node is assigned to that community, and it is returned otherwise. The result is the partition C0.
11. General Architecture
The entropy optimization algorithm uses the partition C0 as its initial configuration and the point of view PoV_{F_V*} (the attribute set AS_{F_V*}) from the augmented network to move nodes across the groups, producing the entropy partition CH.
12. General Architecture
Given an initial partition C0 from the first step of the modularity optimization, take a random point and insert it into a random group. If the entropy is reduced, leave the point in its new group; otherwise, take the point back to its original group. The result is the partition CH.
13. General Architecture
The partition CH has the same number of groups as C0 but with a different configuration. The modularity optimization algorithm will continue with CH. Entropy optimization is followed by community aggregation, and the process yields the final partition Ck.
14. General Architecture
[Figure: the full pipeline alternates Entropy Optimization, Community Detection, and Community Aggregation. Adapted from [4].]
15. Experimental Setup
Data used: each graph in this data set contains a set of semantic information for each node: student faculty, gender, major, second major/minor, house, year, and high school.
The graph contains 6386 nodes and 435324 edges, and has an initial modularity of −2.8629 × 10⁻⁴. In each case, the initial entropy has been calculated using different criteria:

AS  Feature  H0      Classes
1   Gender   0.2286  3
2   Major    0.2318  77

30 executions of the experiments were performed for each point of view.
16. Results
There is a compromise between the entropy and the modularity. There are 7 communities for each attribute set AS:
- from 3 classes in AS1
- from 77 classes in AS2

Results – Measures
AS   Exp.     Average Q           Average Entropy
AS1  CFU      0.4180 (±0)         0.2286 (±0)
AS1  CFU+Ent  0.2565 (±0.006065)  0.1381 (±0.0025741)
AS2  CFU      0.4180 (±0)         0.2318 (±0)
AS2  CFU+Ent  0.2440 (±0.004242)  0.1356 (±0.001493)
17. Results
[Figure: community structure per attribute – Gender, Major, Faculty, House, Minor, Year, H. S.]

Results – Rand Index
Pair                   Rand Index
AS∅ – AS_Gender        0.4232
AS∅ – AS_Major         0.3070
AS_Gender – AS_Major   0.3919

Each partition configuration is different for each attribute set. Non-topological information changes the result of the clustering process.
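The Rand index reported in the table compares two partitions by counting node pairs on which they agree. A minimal sketch, assuming each partition is given as a node-to-label dictionary (an illustrative representation, not taken from the paper):

```python
from itertools import combinations

def rand_index(part_a, part_b):
    """Rand index between two partitions of the same node set:
    the fraction of node pairs on which the partitions agree,
    i.e. both place the pair together or both place it apart."""
    nodes = list(part_a)
    agree = 0
    pairs = 0
    for u, v in combinations(nodes, 2):
        same_a = part_a[u] == part_a[v]
        same_b = part_b[u] == part_b[v]
        agree += same_a == same_b
        pairs += 1
    return agree / pairs
```

Two identical partitions (up to label renaming) score 1.0; lower values, as in the table, indicate that the two attribute sets lead to genuinely different community configurations.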
18. Algorithm Complexity Considerations
The complexity of the entropy calculation is, in general, O(n² × f) (n points and f features).
Using only the contribution of a point to the group entropy, the complexity is reduced to a near-linear behavior.
Using a fixed number of nodes (6386) and varying only the number of features, this linear behavior is observed.
[Figure: Algorithm Execution Time – execution time (ms) vs. number of features, for the simple matching coefficient and the cosine distance.]
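The point-contribution trick mentioned above works because the group entropy is a sum over pairs: adding or removing one point changes it by exactly the terms of the pairs involving that point. A minimal sketch under that observation; `full_entropy` and `point_contribution` are illustrative names, and the simple matching coefficient stands in for the similarity measure.

```python
import math

def smc(a, b):
    """Simple matching coefficient between two attribute vectors."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pair_term(s):
    """Entropy contribution of one pair with similarity s; 0 at s in {0, 1}."""
    if 0.0 < s < 1.0:
        return -(s * math.log(s) + (1 - s) * math.log(1 - s))
    return 0.0

def full_entropy(group, sim=smc):
    """O(n^2 * f): sums over every pair in the group."""
    return sum(pair_term(sim(a, b))
               for i, a in enumerate(group)
               for b in group[i + 1:])

def point_contribution(point, others, sim=smc):
    """O(n * f): only the pairs that involve the moved point, so a
    candidate move can be evaluated without a full recomputation."""
    return sum(pair_term(sim(point, other)) for other in others)
```

Evaluating a move then costs one `point_contribution` against the source group and one against the destination group, which is what yields the near-linear behavior observed in the experiments.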
19. Algorithm Complexity Considerations
In general, the memory usage is linear; however, the SMC curve is steeper than the cosine distance one. For the SMC, near 40 features the memory used grows, coinciding with the execution time increase. The behavior of the graphs is due to Java's memory management system; in any case, the usage never explodes.
[Figure: Algorithm Memory Usage – memory used (Mb) vs. number of features, for the SMC and the cosine distance, each against a memory baseline.]
20. Conclusions
- Each type of information in the augmented network has different representations and different measures of similarity; those measures behave oppositely.
- An entropy-based algorithm has been proposed to cluster an augmented network.
- Using different points of view, it is possible to obtain different partition configurations from the same social graph.
- The overall complexity of the algorithm is linear in the number of features used to calculate the entropy.
- The memory used increases, although it does not explode, when the number of attributes is increased.
21. Thank you.
Do you have any questions?
22. Bibliography I
[1] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-IEEE Press, 1st ed., Oct. 2002.
[2] T. Kohonen, Self-Organizing Maps. Springer, 1997.
[3] M. E. Newman, "Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality," Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, vol. 64, p. 7, July 2001.
23. Bibliography II
[4] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008 (12pp), 2008.
[5] C. Pizzuti, "Overlapped community detection in complex networks," in GECCO '09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, New York, NY, USA, pp. 859–866, ACM, 2009.
[6] Y. Zhou, H. Cheng, and J. X. Yu, "Graph clustering based on structural/attribute similarities," Proc. VLDB Endow., vol. 2, pp. 718–729, August 2009.
24. Bibliography III
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, vol. 69, p. 026113, Feb. 2004.
[8] T. Li, S. Ma, and M. Ogihara, "Entropy-based criterion in categorical clustering," in Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, New York, NY, USA, pp. 68–, ACM, 2004.
25. Entropy Minimization Algorithm [8]
Given a partition C:
1. Calculate the set's initial entropy.
2. Take a random point from a random group and insert it into another random cluster.
3. Has the entropy improved?
   3.1 Yes: leave the point in its new cluster.
   3.2 No: take the point back to its original cluster.
4. Go to step 2 until no further changes can be made.
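The steps above can be sketched as a randomized local search. This is an illustrative sketch, not the authors' implementation: `minimize_entropy`, the stopping rule (a fixed number of consecutive rejected moves standing in for "no further changes can be made"), and the simple-matching similarity are all assumptions.

```python
import math
import random

def smc(a, b):
    """Simple matching coefficient between two attribute vectors."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pair_term(s):
    """Entropy contribution of one pair with similarity s."""
    if 0.0 < s < 1.0:
        return -(s * math.log(s) + (1 - s) * math.log(1 - s))
    return 0.0

def group_entropy(group, sim=smc):
    """Sum of pair terms over every pair in the group."""
    return sum(pair_term(sim(a, b))
               for i, a in enumerate(group)
               for b in group[i + 1:])

def minimize_entropy(groups, sim=smc, max_failures=200, seed=0):
    """Steps 1-4 above: move a random point to a random other group,
    keep the move only if the entropy decreases, revert it otherwise.
    Requires at least two groups; only the source and destination
    entropies change, so only those are recomputed."""
    rng = random.Random(seed)
    groups = [list(g) for g in groups]   # work on a copy
    failures = 0
    while failures < max_failures:
        src, dst = rng.sample(range(len(groups)), 2)
        if not groups[src]:
            failures += 1
            continue
        before = group_entropy(groups[src], sim) + group_entropy(groups[dst], sim)
        point = groups[src].pop(rng.randrange(len(groups[src])))
        groups[dst].append(point)
        after = group_entropy(groups[src], sim) + group_entropy(groups[dst], sim)
        if after < before:
            failures = 0                 # improvement: keep the move
        else:
            groups[dst].pop()            # no improvement: revert
            groups[src].append(point)
            failures += 1
    return groups
```

Because every move is either entropy-reducing or reverted, the total entropy of the partition never increases, which is the invariant the entropy optimization step relies on.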