Modelling the Clustering Coefficient of a Random graph


Graph-TA

by Ariel Duarte-López GRAPH-TA 2016


MODELLING THE CLUSTERING COEFFICIENT OF A
RANDOM GRAPH
GRAPH-TA. MARCH, 2016
A. Duarte-López, A. Prat-Pérez,
M. Pérez-Casany, J. Larriba-Pey
DAMA - UPC

Objectives
To create an algorithm that generates random graphs with:
A specific degree distribution.
A specific average clustering coefficient (ACC) [1].
For a given node $i$,
$CC_i = \frac{\#\text{ of closed triangles}}{\#\text{ of triples of a node}}$
$ACC = \frac{1}{n} \sum_{i=1}^{n} CC_i$
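The two definitions above can be illustrated with a minimal sketch. It assumes the graph is stored as a dictionary of neighbour sets and that a node with fewer than two neighbours gets CC = 0; the function names and the toy graph are illustrative, not taken from the slides.

```python
from itertools import combinations

def local_clustering(adj, i):
    """CC_i: closed triangles through i divided by triples centred on i."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no triples means CC_i = 0
    # A pair of neighbours closes a triangle when the two are adjacent.
    closed = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return closed / (k * (k - 1) / 2)

def average_clustering(adj):
    """ACC: mean of CC_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj, 2))   # one closed pair out of three
print(average_clustering(adj))
```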

Motivation
Using graphs with realistic properties as datasets:
It is not always feasible to use real graphs (because of privacy
concerns or technical issues).
Such graphs are highly important for many research and
benchmarking applications.
Most random graph generators do not attempt to mimic the
characteristics of real graphs.

Research steps
1) To focus on a single cluster and to model the CC of the node
with the largest degree.
2) To consider a single cluster and to adjust the ACC.
3) To generalize the theory to multiple clusters.
In all cases different degree distributions will be considered.

Step I
Given a degree sequence $(d_1, d_2, \ldots, d_n)$ drawn from a MOEZipf$(\alpha, \beta)$
distribution [2].
$N$: Total number of nodes.
$n$: Total number of nodes in the cluster.
$k$: Maximum degree in the cluster.
$p_1$: Probability of connecting two nodes that belong to the
same community.
$p_2$: Probability of connecting a node of one community
with a node of the other community.
Goal: After connecting the graph, obtain $E[CC_{i_1}]$ equal to the target
value.
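A degree sequence can only be wired into a simple graph if it is graphic. The following is a minimal sketch of one classical test, the Erdős–Gallai criterion; the slides do not say which test the authors use, and the function name is illustrative.

```python
def is_graphic(degrees):
    """Erdős–Gallai test: True if some simple graph has this degree sequence."""
    d = sorted(degrees, reverse=True)
    # Degrees must be non-negative and their sum even (each edge counts twice).
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    n = len(d)
    for r in range(1, n + 1):
        lhs = sum(d[:r])
        rhs = r * (r - 1) + sum(min(x, r) for x in d[r:])
        if lhs > rhs:
            return False
    return True

print(is_graphic([3, 3, 3, 3]))  # degree sequence of the complete graph K4
print(is_graphic([3, 3, 1, 1]))  # fails the inequality at r = 2
```

This version recomputes the sums for every prefix, which is quadratic in the sequence length; that is fine for a sketch but would need prefix sums for long sequences.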

Algorithm
Given a graphic [3] degree sequence and a target clustering
coefficient, the steps are:
1) To split the graph into two communities ($C_1$ and $C_2$).
2) To connect two nodes in $C_1$ with probability $p_1$.
3) To connect two nodes in different communities with
probability $p_2$ ($p_1 > p_2$).
4) To connect two nodes in $C_2$ with probability $p_1$.
Repeat the procedure for as long as possible.
Goal: To find the values of $p_1$ and $p_2$ that satisfy
$E[CC_{i_1}] = \text{targetCC}$.
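The steps above can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the first `n_c1` nodes form C1, that each node may receive at most as many edges as its target degree, and that the procedure stops after a round that adds no edge (or after `max_rounds`). All names are hypothetical.

```python
import random
from itertools import combinations, product

def two_community_graph(degrees, n_c1, p1, p2, max_rounds=50, seed=None):
    """Return a set of edges (u, v), u < v, respecting the degree budgets."""
    rng = random.Random(seed)
    nodes = range(len(degrees))
    c1 = [v for v in nodes if v < n_c1]    # assumed split: first n_c1 nodes
    c2 = [v for v in nodes if v >= n_c1]
    residual = list(degrees)               # edges each node may still receive
    edges = set()

    def try_pairs(pairs, p):
        pairs = list(pairs)
        rng.shuffle(pairs)                 # avoid favouring low node indices
        added = 0
        for u, v in pairs:
            edge = (min(u, v), max(u, v))
            if residual[u] > 0 and residual[v] > 0 and edge not in edges:
                if rng.random() < p:
                    edges.add(edge)
                    residual[u] -= 1
                    residual[v] -= 1
                    added += 1
        return added

    for _ in range(max_rounds):
        added = try_pairs(combinations(c1, 2), p1)   # step 2: inside C1
        added += try_pairs(product(c1, c2), p2)      # step 3: between C1 and C2
        added += try_pairs(combinations(c2, 2), p1)  # step 4: inside C2
        if added == 0:                               # assumed stopping rule
            break
    return edges

edges = two_community_graph([3, 2, 2, 1, 1, 1], n_c1=3, p1=0.8, p2=0.2, seed=1)
print(sorted(edges))
```

Because edges are drawn at random, some degree budgets may stay unfilled; the sketch only guarantees that no node exceeds its target degree.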

Extended Hypergeometric Distribution
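Let $X_{i_1}$ and $Y_{i_1}$ be the number of connections of node $i_1$ in the
communities $C_1$ and $C_2$, respectively, with $X_{i_1} \sim \mathrm{Bin}(n, p_1)$ and
$Y_{i_1} \sim \mathrm{Bin}(N - n, p_2)$, where $N \gg n$.
By definition,
$X_{i_1} \mid X_{i_1} + Y_{i_1} = k \sim \mathrm{ExtHypDist}(N, n, k, \lambda)$
$\Pr(X = x) = \frac{\binom{n}{x}\binom{N-n}{m-x}\, e^{x\lambda}}{\sum_{j \in S} \binom{n}{j}\binom{N-n}{m-j}\, e^{j\lambda}}$;
where $\lambda = \frac{p_1}{p_2}$, $m = k$ here, and $S$ is the set of integers $x$ with
$\max(0, n + m - N) \le x \le \min(m, n)$. [4]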
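The probability mass function above can be evaluated directly for small parameters. A minimal sketch, taking $\lambda$ as a plain input so that no assumption is made about how it is derived from $p_1$ and $p_2$; the function name is illustrative.

```python
from math import comb, exp

def ext_hypergeom_pmf(N, n, m, lam):
    """Return {x: Pr(X = x)} for the extended hypergeometric distribution."""
    lo, hi = max(0, n + m - N), min(m, n)   # support S
    # Unnormalised weights: C(n, x) * C(N - n, m - x) * exp(x * lam).
    # Direct evaluation is only suitable for small N; large values would
    # need log-space arithmetic to avoid overflow.
    weights = {x: comb(n, x) * comb(N - n, m - x) * exp(x * lam)
               for x in range(lo, hi + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

pmf = ext_hypergeom_pmf(N=10, n=4, m=3, lam=0.0)
print(pmf)  # lam = 0 reduces to the ordinary hypergeometric distribution
```

With `lam > 0` the mass shifts toward larger `x`, that is, toward more connections inside $C_1$.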

Expected clustering coefficient
$E[CC_{i_1}] = \frac{1}{k_{i_1}(k_{i_1} - 1)} \sum_{x \in S} \left[ x(x - 1)p_1 + x(k_{i_1} - x)p_2 + (k_{i_1} - x)(k_{i_1} - x - 1)p_1 \right] P(X_{i_1} = x)$
$E[CC_{i_1}] = \text{target value} \Rightarrow \lambda$
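The expectation can be evaluated numerically once $N$, $n$, $k$, $p_1$, $p_2$ and $\lambda$ are fixed. The sketch below is an illustration under stated assumptions: the helper and parameter names are hypothetical, and `cross_weight` is an added knob. Its default of 1.0 reproduces the cross-community term as printed on the slide, whereas counting unordered neighbour pairs would correspond to 2.0, since $x(x-1) + 2x(k-x) + (k-x)(k-x-1) = k(k-1)$.

```python
from math import comb, exp

def ext_hypergeom_pmf(N, n, m, lam):
    """Extended hypergeometric pmf as a dict {x: probability}."""
    lo, hi = max(0, n + m - N), min(m, n)
    w = {x: comb(n, x) * comb(N - n, m - x) * exp(x * lam)
         for x in range(lo, hi + 1)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def expected_cc(N, n, k, p1, p2, lam, cross_weight=1.0):
    """E[CC] of a degree-k node with X neighbours in C1 and k - X in C2."""
    pmf = ext_hypergeom_pmf(N, n, k, lam)
    total = 0.0
    for x, prob in pmf.items():
        y = k - x                                   # neighbours in C2
        pairs = (x * (x - 1) * p1                   # both neighbours in C1
                 + cross_weight * x * y * p2        # one in each community
                 + y * (y - 1) * p1)                # both neighbours in C2
        total += pairs * prob
    return total / (k * (k - 1))

print(expected_cc(N=50, n=10, k=6, p1=0.6, p2=0.05, lam=2.0))
```

A target value could then be matched by scanning or root-finding over the free parameter; the slides do not specify which numerical method is used.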

Bibliography
[1] Mark Newman. Networks: An Introduction. OUP Oxford, 2010.
[2] Marta Pérez-Casany and Aina Casellas. Marshall-Olkin extended Zipf distribution.
arXiv preprint arXiv:1304.4540, 2013.
[3] Gerard Sierksma and Han Hoogeveen. Seven criteria for integer sequences being
graphic. Journal of Graph Theory, 15(2):223–231, 1991.
[4] Daniel Zelterman. Models for Discrete Data. Oxford University Press, USA, 1999.


