This document summarizes a seminar presentation on compression-based graph mining that exploits basic structural primitives such as triangles and stars. It argues that social media graphs are very sparse: only a small fraction of all possible edges exist. The technique codes a graph by representing recurring substructures such as hubs and meshes compactly. The resulting clustering reveals the graph's transitivity and hubness. Outcomes include seeing which basic structure is most common and obtaining an overall minimum coding of the graph, clustered by dense areas. While the results seem good, the presentation criticizes that no example coding is shown and that the given probabilities are not reproducible.
4. Social Media Data
Is Facebook sparse?
-> 1.4 x 10^9 nodes ¹
-> on average 340 friends² per node
-> 238 x 10^9 edges (1.4 x 10^9 x 340 / 2, each friendship counted once)
-> possible edges: 0.98 x 10^18
=> only about 0.0000243% of all possible edges exist
Yes, Facebook is very sparse
¹https://en.wikipedia.org/wiki/Facebook
²http://www.statista.com/statistics/232499/americans-who-use-social-networking-sites-several-times-per-day/
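The sparsity argument is a ratio of existing to possible edges. A minimal sketch of that arithmetic, assuming an undirected friendship graph in which each friendship is counted once (so the degree sum is halved); the variable names are illustrative and the inputs are the node count and average degree quoted on the slide:

```python
# Density of an undirected graph from node count and average degree.
n = 1.4e9          # nodes (users), as quoted on the slide
avg_degree = 340   # average number of friends per node, as quoted on the slide

edges = n * avg_degree / 2          # each friendship is shared by two nodes
possible_edges = n * (n - 1) / 2    # all unordered node pairs
density = edges / possible_edges    # algebraically equal to avg_degree / (n - 1)

print(f"edges          = {edges:.3g}")
print(f"possible edges = {possible_edges:.3g}")
print(f"density        = {density * 100:.3g} %")
```

Because the density reduces to avg_degree / (n - 1), the graph becomes sparser as the network grows unless the average number of friends grows just as fast.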
9. What is the benefit of knowing the structure of a graph?
- deeper insights into the graph
- lossless compression is possible
- link prediction
- number of clusters
- graph partitioning
12. Characteristics of CXprime
(Compression-based eXploiting Primitives)
Based on the Minimum Description Length (MDL) principle [3]¹
No input parameters (unsupervised)
Clustering is k-means-like
¹https://en.wikipedia.org/wiki/Minimum_description_length
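The MDL idea is that the better a model fits the data, the fewer bits are needed to describe the data. The sketch below is not the CXprime coding scheme; it only illustrates the principle on a graph by coding which node pairs are edges with a single Bernoulli edge probability. The function name `bernoulli_cost_bits` is hypothetical, and the cost of transmitting the probability itself is ignored for brevity.

```python
import math

def bernoulli_cost_bits(num_pairs: int, num_edges: int) -> float:
    """Bits needed to code which of `num_pairs` node pairs are edges,
    using an ideal code for a Bernoulli source with p = num_edges / num_pairs."""
    if num_edges == 0 or num_edges == num_pairs:
        return 0.0  # every pair has the same value, nothing left to code
    p = num_edges / num_pairs
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return num_pairs * entropy

# Toy graph (assumed for illustration): 10 nodes, 5 edges.
pairs = 10 * 9 // 2                        # 45 unordered node pairs
naive = float(pairs)                       # one bit per pair, no model
modelled = bernoulli_cost_bits(pairs, 5)   # cost under the sparse-graph model

print(f"naive: {naive:.1f} bits, modelled: {modelled:.1f} bits")
```

The sparser (or denser) the graph, the further the modelled cost drops below one bit per pair; a structure-aware coding aims to lower it further by exploiting recurring primitives.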
20. Outcomes 1
After coding the graph once with a star coding and
once with a triangle coding, you can see which
coding is smaller, and therefore which basic
structure is more common.
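As a rough proxy for this comparison, one can count the two primitives directly. The sketch below does not implement the star or triangle coding from the paper; it only counts closed triangles and open three-node stars in a toy graph and derives the standard transitivity ratio (closed triples over all connected triples). The helper `count_primitives` and the toy edge list are assumptions made for illustration.

```python
from itertools import combinations

def count_primitives(adj):
    """Return (triangles, open_triples) for an undirected graph given as
    a dict mapping each node to the set of its neighbours."""
    closed = 0   # neighbour pairs of a centre node that are themselves linked
    open_ = 0    # neighbour pairs that are not linked (star-like triple)
    for centre, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                closed += 1
            else:
                open_ += 1
    return closed // 3, open_   # each triangle is seen once from each corner

# Toy graph: triangle a-b-c plus a hub d linked to a, e and f.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e"), ("d", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

triangles, open_triples = count_primitives(adj)
transitivity = 3 * triangles / (3 * triangles + open_triples)
print(triangles, open_triples, transitivity)
```

A transitivity close to 1 suggests a triangle-dominated graph, while a value close to 0 suggests a hub-dominated (star-like) graph.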
25. Outcomes 2
If you always use the smallest of the three
possible codings, you get an overall minimum
coding of the graph. The graph is then clustered
into areas of hubs and triangles.
29. Criticism
- No example of what the coding
actually looks like
- The given probabilities are not
reproducible
30. Summary
The results reported in the paper are very
good. The compression rate is extremely high
compared with other graph compression
algorithms, and the clustering results look
very good as well.
31. Thanks for your attention
[1] Feng Jing, Xiao He, Nina Hubig, Christian Böhm, and Claudia Plant, “Compression-based graph mining exploiting structure primitives,” Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 181–190, IEEE, 2013.
[2] T. Schank and D. Wagner, “Approximating clustering coefficient and transitivity,” J. Graph Algorithms Appl., vol. 9, no. 2, pp. 265–275, 2005.
[3] J. Rissanen, “An introduction to the MDL principle,” Helsinki Institute for Information Technology, Tech. Rep., 2005.
[4] Python, Pyplot, Instagram API