Modelling the Clustering Coefficient of a Random graph
1. MODELLING THE CLUSTERING COEFFICIENT OF A RANDOM GRAPH
GRAPH-TA. MARCH, 2016
A. Duarte-López, A. Prat-Pérez, M. Pérez-Casany, J. Larriba-Pey
DAMA - UPC
2. Objectives
To create an algorithm that generates random graphs with:
A specific degree distribution.
A specific average clustering coefficient (ACC) [1].
For a given node i,
$$CC_i = \frac{\#\text{ of closed triangles}}{\#\text{ of triples of node } i}$$
$$ACC = \frac{1}{n} \sum_{i=1}^{n} CC_i$$
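The two definitions above can be checked on a small example. The following is a minimal sketch, assuming the graph is stored as a dictionary of neighbour sets and that a node with fewer than two neighbours gets CC_i = 0 (a convention chosen here, not stated in the slides). The function names are illustrative.

```python
from itertools import combinations


def local_clustering(adj, i):
    """CC_i: closed triangles through i divided by triples centred on i."""
    neighbours = adj[i]
    d = len(neighbours)
    if d < 2:
        return 0.0  # assumed convention: no triples means CC_i = 0
    triples = d * (d - 1) // 2
    # A triple (u, i, v) is closed when u and v are themselves connected.
    closed = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return closed / triples


def average_clustering(adj):
    """ACC: mean of CC_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
cc0 = local_clustering(adj, 0)   # 1 closed triangle out of 3 triples
acc = average_clustering(adj)    # (1/3 + 1 + 1 + 0) / 4
```

Node 0 has three neighbours, hence three triples, of which only the pair (1, 2) is connected.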
3. Motivation
Graphs with realistic properties are needed as datasets:
It is not always feasible to use real graphs (because of privacy
concerns or technical issues).
They are highly important for many research and
benchmarking applications.
Most random graph generators do not attempt to mimic
the characteristics of real graphs.
4. Research steps
1) To focus on a single cluster and to model the CC of the node
with the largest degree.
2) To consider a single cluster and to adjust the ACC.
3) To generalize the theory to multiple clusters.
In all cases different degree distributions will be considered.
5. Step I
Given a degree sequence (d1, d2, ..., dn) from a MOEZipf(α, β)
distribution [2]:
N: Total number of nodes.
n: Total number of nodes in the cluster.
k: Maximum degree in the cluster.
p1: Probability of connecting two nodes that belong to the
same community.
p2: Probability of connecting one node of a community
with one node in the other community.
Goal: After connecting the graph, obtain $E[CC_{i_1}]$ equal to the target
value, where i1 is the node with the largest degree.
6. Algorithm
Given a graphic [3] degree sequence and a target clustering
coefficient, the steps are:
1) To split the graph into two communities (C1 and C2).
2) To connect two nodes in C1 with probability p1.
3) To connect two nodes in different communities with
probability p2 (p1 > p2).
4) To connect two nodes in C2 with probability p1.
Repeat the procedure while it is possible.
Goal: To find the values of p1 and p2 that satisfy:
$E[CC_{i_1}] = \text{targetCC}$.
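The connection steps above can be sketched as follows. This is only an illustration under several assumptions not stated in the slides: the first n nodes form C1, each node stops accepting edges once its prescribed degree is reached, and "while it is possible" is read as "until a full pass adds no edge" (capped by a hypothetical `max_rounds` parameter). The function name and signature are illustrative.

```python
import random
from itertools import combinations, product


def connect_two_communities(degrees, n, p1, p2, seed=None, max_rounds=100):
    """Connect nodes with probability p1 inside a community and p2 across."""
    rng = random.Random(seed)
    total = len(degrees)
    c1, c2 = range(n), range(n, total)   # assumption: first n nodes form C1
    residual = list(degrees)             # degree still to be filled per node
    adj = {v: set() for v in range(total)}

    def try_pairs(pairs, p):
        added = 0
        for u, v in pairs:
            free = residual[u] > 0 and residual[v] > 0 and v not in adj[u]
            if free and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
                residual[u] -= 1
                residual[v] -= 1
                added += 1
        return added

    for _ in range(max_rounds):
        added = try_pairs(combinations(c1, 2), p1)   # step 2: inside C1
        added += try_pairs(product(c1, c2), p2)      # step 3: across C1, C2
        added += try_pairs(combinations(c2, 2), p1)  # step 4: inside C2
        if added == 0:   # assumed stopping rule: a pass that adds nothing
            break
    return adj, residual


# With p1 = 1 and p2 = 0, six nodes of degree 2 form two separate triangles.
degrees = [2] * 6
adj, residual = connect_two_communities(degrees, n=3, p1=1.0, p2=0.0, seed=0)
```

Estimating $E[CC_{i_1}]$ for a given (p1, p2) would then amount to averaging the CC of the largest-degree node over many runs with different seeds.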
8. Extended Hypergeometric Distribution
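Let $X_{i_1}$ and $Y_{i_1}$ be the number of connections of node i1 in the
communities C1 and C2 respectively, with $X_{i_1} \sim \mathrm{Bin}(n, p_1)$ and
$Y_{i_1} \sim \mathrm{Bin}(N - n, p_2)$, where $N \gg n$.
By definition,
$$X_{i_1} \mid X_{i_1} + Y_{i_1} = k \sim \mathrm{ExtHypDist}(N, n, k, \lambda)$$
$$\Pr(X = x) = \frac{\binom{n}{x}\binom{N-n}{m-x}\, e^{x\lambda}}{\sum_{j \in S} \binom{n}{j}\binom{N-n}{m-j}\, e^{j\lambda}},$$
where $\lambda = p_1 / p_2$, $m = k$ is the conditioning total, S is the set of admissible values of x,
and $\max(0, n + m - N) \le x \le \min(m, n)$. [4]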
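The probability mass function above can be evaluated directly from its definition. The following is a minimal sketch that takes λ as an input (named `lam` here) and normalises over the admissible range of x. The function name is illustrative, and the direct use of `exp` assumes moderate parameter sizes.

```python
from math import comb, exp


def ext_hypergeom_pmf(N, n, k, lam):
    """Return {x: Pr(X = x)} for ExtHypDist(N, n, k, lam)."""
    lo, hi = max(0, n + k - N), min(k, n)   # admissible values of x
    weights = {
        x: comb(n, x) * comb(N - n, k - x) * exp(x * lam)
        for x in range(lo, hi + 1)
    }
    total = sum(weights.values())           # the normalising sum over S
    return {x: w / total for x, w in weights.items()}


# With lam = 0 the weights reduce to the ordinary hypergeometric distribution.
pmf0 = ext_hypergeom_pmf(N=10, n=4, k=3, lam=0.0)
pmf1 = ext_hypergeom_pmf(N=10, n=4, k=3, lam=1.0)
mean0 = sum(x * p for x, p in pmf0.items())
mean1 = sum(x * p for x, p in pmf1.items())
```

A positive `lam` shifts probability mass towards larger x, that is, towards more connections inside C1.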
10. Bibliography
[1] Mark Newman. Networks: An Introduction. OUP Oxford, 2010.
[2] Marta Pérez-Casany and Aina Casellas. Marshall-Olkin extended Zipf distribution. arXiv preprint arXiv:1304.4540, 2013.
[3] Gerard Sierksma and Han Hoogeveen. Seven criteria for integer sequences being graphic. Journal of Graph Theory, 15(2):223–231, 1991.
[4] Daniel Zelterman. Models for Discrete Data. Oxford University Press, USA, 1999.