A Graph Summarization: A Survey | Summarizing and understanding large graphs
A Graph Summarization: A Survey
Liu, Y., Dighe, A., Safavi, T., & Koutra, D. (2017)
Summarizing and understanding large graphs
Koutra, D., Kang, U., Vreeken, J., & Faloutsos, C. (2015).
Statistical Analysis and Data Mining: The ASA Data Science Journal, 8(3), 183-202.
Aftab Alam
Department of Computer Engineering, Kyung Hee University
1 - A Graph Summarization: A Survey
2 - Summarizing and understanding large graphs
Contents
1. Introduction (1)
2. Organization (1)
3. Introduction (2)
4. Main Idea (2)
5. VoG Steps (2)
6. Experiments (2)
7. Conclusions (2)
3. Data & Knowledge Engineering Lab, Department of Computer Engineering, Kyung Hee University, Korea.
A Graph Summarization: A Survey
Introduction (1/2)
• Daily activities such as
– social media interaction,
– web browsing, and
– product and service purchases
generate large amounts of data.
• The analysis of such data can impact
– decision-making processes and our lives.
• The volume of data and its velocity call for
– data summarization,
– one of the main data mining tasks.
• Graphs are ubiquitous, representing a broad variety of natural processes, such as
– friendships between people (social networks),
– communication patterns (traffic networks),
– interactions between chemical compounds (e.g., protein-protein interaction networks), and
– interactions between neurons in the brain.
Introduction (2/2)
• As the volume of interconnected data increases, so does the need for summarization methods.
• What is graph summarization?
– Finding a short representation of the input graph,
– in the form of a summary or sparsified graph,
– which
o reveals patterns in the original data and
o preserves specific structural or other properties,
o depending on the application domain.
Benefits:
• Reduction: volume and storage.
• Speedup: graph algorithms & queries.
• Interactive analysis.
• Noise elimination.
Applications:
• Clustering, classification, community detection
• Model order selection in matrix factorization
• Outlier detection, pattern set mining
• Finding sources of infection in large graphs
• Understanding selected nodes in graphs
Definition and Challenges
• A summary is application-dependent and can be defined with respect to various aspects:
– it can preserve specific structural patterns,
– focus on some entities in the network,
– preserve the answers to a specific set of queries,
– or maintain the distributions of some graph properties.
• Challenges
– Volume of data (Volume)
– Complexity of data (Variety)
– Definition of interestingness (what counts as important and interesting information)
– Evaluation
– Change over time (Velocity)
Contributions
1. A taxonomy of graph summarization methods
– for static and dynamic graphs.
2. A review of existing methods, highlighting properties that are
– useful to researchers and practitioners,
o such as their input/output data types and end goal.
3. Connections between graph summarization and related fields that have potential for graph summarization, including:
– compression,
– sparsification, and
– clustering and community detection.
4. Real-world applications.
5. Open problems and opportunities for future research.
Organization
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NETWORKS
• The problem of summarization (or aggregation) of static, plain graphs:
– Given: a static graph G without side information,
– Find: a summary graph to concisely describe the given graph.
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (1/5)
1. Grouping-based methods
• The most popular techniques.
• These methods aggregate nodes into super-nodes and connect them with super-edges, resulting in a super-graph.
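A minimal sketch of the grouping idea, assuming one simple grouping rule chosen only for illustration: nodes with identical neighbor sets are merged into one super-node, and each super-edge records how many original edges it stands for. Published grouping methods use richer criteria; the function name `summarize_by_neighbor_sets` is hypothetical.

```python
from collections import defaultdict


def summarize_by_neighbor_sets(edges):
    """Group nodes with identical neighbor sets into super-nodes.

    Returns (supernode_of, superedges): supernode_of maps each original node
    to a super-node id; superedges maps a pair of super-node ids to the number
    of original edges that the super-edge stands for.
    """
    # Undirected adjacency sets.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Nodes with the same neighbor set share one super-node.
    group_id = {}
    supernode_of = {}
    for node in sorted(neighbors):
        key = frozenset(neighbors[node])
        if key not in group_id:
            group_id[key] = len(group_id)
        supernode_of[node] = group_id[key]

    # Count how many original edges each super-edge aggregates.
    superedges = defaultdict(int)
    for u, v in edges:
        a, b = sorted((supernode_of[u], supernode_of[v]))
        superedges[(a, b)] += 1
    return supernode_of, dict(superedges)


# A star with hub 0: the three leaves collapse into one super-node.
mapping, sedges = summarize_by_neighbor_sets([(0, 1), (0, 2), (0, 3)])
print(mapping)  # {0: 0, 1: 1, 2: 1, 3: 1}
print(sedges)   # {(0, 1): 3}
```

Four nodes and three edges become two super-nodes joined by one super-edge of weight 3.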
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (2/5)
2. Simplification-based methods
• These methods streamline an input graph by removing less “important” nodes or edges, resulting in a sparsified graph.
• The output graph consists of a subset of the original nodes and/or edges.
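A minimal sketch of simplification, assuming node degree as the (illustrative) importance score: nodes below a degree threshold are dropped together with their edges, so the output contains only original nodes and edges. Real methods use other importance measures; `sparsify_by_degree` and `min_degree` are hypothetical names.

```python
from collections import Counter


def sparsify_by_degree(edges, min_degree):
    """Keep only the edges whose endpoints both have degree >= min_degree.

    Degree (computed on the input graph) stands in for node "importance";
    the result is a subgraph made only of original nodes and edges.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    important = {node for node, d in degree.items() if d >= min_degree}
    return [(u, v) for u, v in edges if u in important and v in important]


# A triangle with one pendant node: the pendant edge (2, 3) is dropped.
print(sparsify_by_degree([(0, 1), (1, 2), (0, 2), (2, 3)], min_degree=2))
# [(0, 1), (1, 2), (0, 2)]
```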
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (3/5)
3. Compression-based methods
• The goal is to minimize the number of bits needed to describe the input graph via its summary.
• In MDL terms, the description consists of
– a model for the input graph (the summary), and
– its unmodeled parts.
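A toy illustration of the two-part (MDL) view, under simplifying assumptions made only for this sketch: the unmodeled part is the set of node pairs where model and graph disagree, coded by its size and then by which pairs it contains (log2 of a binomial coefficient), and the model is either empty or a single clique described by its node set. When the clique fits the data, model plus errors costs fewer bits than no model. These cost functions are illustrative, not those of a specific published method.

```python
from itertools import combinations
from math import comb, log2


def error_bits(num_pairs, num_errors):
    """Bits for the unmodeled part: the number of wrong pairs, then which ones."""
    return log2(num_pairs + 1) + log2(comb(num_pairs, num_errors))


def total_bits(n, edges, clique=None):
    """Two-part cost L(M) + L(E) of an undirected graph on nodes 0..n-1.

    The model is either empty (clique=None) or a single clique.
    """
    num_pairs = n * (n - 1) // 2
    edge_set = {frozenset(e) for e in edges}
    if clique is None:
        model_bits, modeled = 0.0, set()
    else:
        # Clique size (one of n values), then which nodes it contains.
        model_bits = log2(n) + log2(comb(n, len(clique)))
        modeled = {frozenset(p) for p in combinations(clique, 2)}
    # Errors: node pairs where the model and the graph disagree.
    errors = edge_set ^ modeled
    return model_bits + error_bits(num_pairs, len(errors))


# Ten nodes; nodes 0..4 form a clique and there are no other edges.
edges = list(combinations(range(5), 2))
print(total_bits(10, edges))                          # no model: all 10 edges are errors
print(total_bits(10, edges, clique=list(range(5))))   # clique model: no errors left
```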
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (4/5)
4. Influence-based methods
• These methods aim to discover a short representation of the influence flow in large-scale graphs.
• Some quantity related to information influence is preserved in the summary.
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (5/5)
5. Pattern-mining-based methods
• These methods aim to summarize an input network via structural patterns, e.g., Virtual Node Mining.
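A minimal sketch of the substitution behind virtual node mining, assuming the dense pattern (every source linking to every target) has already been mined: the |S|·|T| block edges are replaced by one virtual node with |S| + |T| edges, and every source still reaches every target through it. The pattern-mining step is omitted; `add_virtual_node` is a hypothetical name.

```python
def add_virtual_node(edges, sources, targets, virtual):
    """Replace the complete bipartite block sources x targets by a virtual node.

    Every edge (s, t) with s in sources and t in targets must be present;
    the |sources| * |targets| block edges become |sources| + |targets| edges
    routed through `virtual`, so each s still reaches each t.
    """
    block = {(s, t) for s in sources for t in targets}
    if not block <= set(edges):
        raise ValueError("the pattern is not a complete bipartite block")
    kept = [e for e in edges if e not in block]
    kept += [(s, virtual) for s in sources]
    kept += [(virtual, t) for t in targets]
    return kept


# Three pages all linking to the same three pages, plus one unrelated link.
edges = [(s, t) for s in (1, 2, 3) for t in (4, 5, 6)] + [(1, 7)]
compressed = add_virtual_node(edges, (1, 2, 3), (4, 5, 6), "v0")
print(len(edges), "->", len(compressed))  # 10 -> 7
```

The saving grows with the block size: a 100 × 100 block of 10,000 edges shrinks to 200.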
Organization: STATIC GRAPH SUMMARIZATION: LABELED NW
• Given:
– a static graph G, and
– side information, such as node attributes,
• Find:
– a summary graph, or
– a set of labeled structures, or
– a compressed data structure
to concisely describe the given G.
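A minimal sketch of a summary that uses side information, assuming grouping by a single node attribute for brevity: nodes sharing an attribute value form one summary node, and the summary keeps the number of edges between every pair of groups. `summarize_by_attribute` and the example roles are hypothetical.

```python
from collections import Counter


def summarize_by_attribute(edges, attribute):
    """Aggregate an undirected graph by a node attribute.

    attribute maps each node to a label; the summary counts the edges
    between every unordered pair of labels (including within a label).
    """
    summary = Counter()
    for u, v in edges:
        a, b = sorted((attribute[u], attribute[v]))
        summary[(a, b)] += 1
    return dict(summary)


role = {1: "student", 2: "student", 3: "professor"}
print(summarize_by_attribute([(1, 2), (1, 3), (2, 3)], role))
# {('student', 'student'): 1, ('professor', 'student'): 2}
```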
Organization: DYNAMIC GRAPH SUMMARIZATION
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (3/5)
3. Compression-based methods: selected works
• Graph Summarization with Bounded Error
• Summarizing and understanding large graphs
• Scalable Pattern Matching over Compressed G
• Query Preserving Graph Compression
• Neighbor Query Friendly Compression of Social NW
• Community Preserving Lossy Compression Social NW
• A Scalable and General Graph Management System
• Compressing Graphs and Indexes with Recursive Graph B.
• Compression of Graphical Structures
• On Compressing Social Networks
Summarizing and understanding large graphs
Abstract
• Can we describe a million-node graph with a few simple sentences?
• Given a large graph,
– how can we find its most “important” structures,
– so that we can summarize it and easily visualize it?
• How can we measure the “importance” of a set of discovered subgraphs in a large graph?
• Real graphs often consist of:
– stars,
– bipartite cores,
– cliques, and
– chains,
which form a “vocabulary” of graph structures (hence VoG, Vocabulary-based summarization of Graphs).
• Main idea
– Find a concise description of a graph in terms of this “vocabulary”.
Contribution
• Our contributions are threefold:
1. Formulation:
– Provide a principled encoding scheme to identify
o the vocabulary type of a given subgraph for six structure types (full and near cliques, stars, chains, full and near bipartite cores).
2. Algorithm:
– Develop VoG, an efficient method to approximate the MDL-optimal summary of a given graph in terms of local graph structures.
3. Applicability:
– Report an extensive empirical evaluation on multimillion-edge real graphs.
Introduction
• Goal: find short summaries for large graphs
– to gain a better understanding of their characteristics.
• Why not apply community detection, clustering, or graph-cut algorithms
– and summarize the graph in terms of its communities?
• These algorithms do not quite serve our goal:
– they typically detect numerous communities without an explicit ordering, so
– a principled procedure for selecting the most “important” subgraphs is still needed.
• In addition, these methods merely return the discovered communities without characterizing them (e.g., as a clique or star), and thus do not help the user gain further insight into the properties of the graph.
Introduction: Reasons for VoG
• The first insight
– is to describe the structures in a graph using an enriched set of “vocabulary” terms:
o cliques and near-cliques,
o stars,
o chains, and
o (near-) bipartite cores.
• We chose these “vocabulary” terms because
– (i) (near-) cliques are included,
o so our method works well on “cavemen” graphs, and
– (ii) stars [8], chains [9], and bipartite cores [4,10]
o appear very often and have semantic meaning (e.g., factions, bots) in the tens of real networks we have seen in practice (e.g., the IMDB movie-actor graph, co-authorship networks, Netflix movie recommendations, the US Patent dataset, phone call networks).
Introduction: Reasons for VoG
• The second insight
– is to formalize our goal using the minimum description length (MDL) principle [11]
o as a lossless compression problem.
– By MDL, we define the best summary of a graph as the set of subgraphs that describes the graph most succinctly,
o which helps us understand the main graph characteristics in a simple, non-redundant manner.
• The approach is parameter-free,
– as at every stage MDL identifies the best choice:
o the one by which we save the most bits.
• Informally, we tackle the problem given on the next slide:
Introduction: Problem Definition
• Problem 1
– (Graph Summarization—Informal)
Motivation behind VoG: Understanding Large Graphs
• Large graphs are difficult to understand: when visualized, they appear as a clutter of nodes and edges.
• Simple structures are easily understood and often meaningful.
Application: Wikipedia controversy
• Wikipedia controversy graph
– Nodes: Wikipedia editors
– Edges: two editors share an edge if they edited the same part of the article
• Fig (a): without VoG, no clear structures stand out.
• Fig (b): VoG spots stars among Wikipedia editors (admins, bots, heavy users).
o The centers typically correspond to administrators who revert vandalism and make corrections.
• Fig (c) & (d): VoG spots bipartite cores reflecting “edit wars”, i.e., editors reverting others’ edits.
o Manual inspection shows that these correspond to edit wars: two groups of editors reverting each other’s changes.
Roadmap
Main Idea
• Use a graph vocabulary
• Shortest lossless description
– Optimal compression (MDL)
• Best graph summary
– Optimal compression (MDL)
Minimum Description Length Principle
• The MDL principle states that one should prefer the model that yields the shortest description of the data when the complexity of the model itself is also accounted for [28].
• Given a set of models $\mathcal{M}$, the best model $M \in \mathcal{M}$ is the one that minimizes $L(M) + L(D \mid M)$, where $L(M)$ is the length in bits of the description of the model and $L(D \mid M)$ is the length in bits of the data when encoded with $M$.
[28] J. Rissanen. Modeling by shortest data description. Automatica, 14(5):465–471, 1978.
Minimum Description Length Principle
• Formally: the Minimum Graph Description problem
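The formula on this slide did not survive extraction. As stated in the VoG paper (restated here from that paper; check the notation against it), given a graph G with adjacency matrix $A$ and structure vocabulary $\Omega$, the goal is the model $M$, a set of structures drawn from $\Omega$, that minimizes the total encoded length

```latex
L(G, M) = L(M) + L(E), \qquad E = \tilde{M} \oplus A
```

where $\tilde{M}$ is the adjacency matrix implied by the structures in $M$ and $\oplus$ is element-wise exclusive OR, so $E$ marks exactly the node pairs the model gets wrong.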
Roadmap
Step 1: Graph decomposition
• Use any graph decomposition method, e.g., SlashBurn.
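A simplified sketch of SlashBurn-style candidate generation, assuming it captures the gist of this step: each round removes the k highest-degree nodes (hubs), records each hub's egonet and each small connected component ("spoke") as a candidate subgraph, and continues on the giant connected component. The real SlashBurn differs in details (for example, it also produces a node ordering); `slashburn_candidates`, `connected_components`, and `k` are illustrative names.

```python
from collections import defaultdict


def connected_components(nodes, adj):
    """Connected components of the subgraph induced by the set `nodes`."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            component.add(u)
            for w in adj[u]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def slashburn_candidates(edges, k=1):
    """Simplified SlashBurn-style decomposition into candidate subgraphs.

    Each round removes the k highest-degree nodes (hubs), records each hub's
    egonet and every non-giant component ("spoke") as a candidate, and then
    continues on the giant connected component. Singletons are skipped.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    remaining, candidates = set(adj), []
    while remaining:
        # Degrees are measured inside the part of the graph still left.
        degree = {u: len(adj[u] & remaining) for u in remaining}
        hubs = sorted(remaining, key=lambda u: (-degree[u], u))[:k]
        for hub in hubs:
            egonet = {hub} | (adj[hub] & remaining)
            if len(egonet) > 1:
                candidates.append(egonet)
        remaining -= set(hubs)
        if not remaining:
            break
        components = connected_components(remaining, adj)
        giant = max(components, key=len)
        candidates.extend(c for c in components if c is not giant and len(c) > 1)
        remaining = giant
    return candidates


# A 4-leaf star around node 0, attached to a triangle {5, 6, 7}.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (5, 6), (6, 7), (5, 7)]
for candidate in slashburn_candidates(edges):
    print(sorted(candidate))
# [0, 1, 2, 3, 4, 5]
# [5, 6, 7]
# [6, 7]
```

Candidates may overlap (node 5 appears twice); deciding which ones to keep is left to the later steps.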
Step 2: Graph Labeling
• Now, how can we ‘label’ the decomposed subgraphs?
• Each subgraph gets the structure type that encodes it in the fewest bits (the argmin of its encoding cost).
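A simplified sketch of labeling by argmin: each candidate node set is encoded under each structure type, and the cheapest type becomes its label. Only two types (clique and star) are compared here, and the cost (bits to point out the mismatching node pairs) is a deliberately crude stand-in for VoG's actual encoding; `label_candidate` and `mismatch_bits` are hypothetical names.

```python
from itertools import combinations
from math import comb, log2


def mismatch_bits(nodes, edge_set, expected):
    """Crude cost: bits to point out the node pairs where `expected` and the graph differ."""
    pairs = [frozenset(p) for p in combinations(sorted(nodes), 2)]
    errors = sum((p in edge_set) != (p in expected) for p in pairs)
    return log2(len(pairs) + 1) + log2(comb(len(pairs), errors))


def label_candidate(nodes, edges):
    """Return (label, costs) where label is the argmin-cost structure type."""
    edge_set = {frozenset(e) for e in edges}

    def inner_degree(u):
        return sum(frozenset((u, v)) in edge_set for v in nodes if v != u)

    # Clique hypothesis: every pair inside the candidate is connected.
    clique = {frozenset(p) for p in combinations(nodes, 2)}
    # Star hypothesis: the best-connected node is the hub, linked to all others.
    hub = max(sorted(nodes), key=inner_degree)
    star = {frozenset((hub, v)) for v in nodes if v != hub}

    costs = {"clique": mismatch_bits(nodes, edge_set, clique),
             "star": mismatch_bits(nodes, edge_set, star)}
    return min(costs, key=costs.get), costs


print(label_candidate({0, 1, 2, 3}, [(0, 1), (0, 2), (0, 3)])[0])  # star
print(label_candidate({0, 1, 2}, [(0, 1), (1, 2), (0, 2)])[0])     # clique
```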
Step 2: Graph Labeling
Some criterion
Step 3: Summary Assembly
Some criterion
Summary encoding cost
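A sketch of per-structure model costs, following the clique and star formulas of the VoG paper as recalled here (treat the exact terms as assumptions to check against the paper). $L_{\mathbb{N}}$ is Rissanen's universal code for positive integers and n is the number of nodes in the graph: a clique is described by its size and its node set, a star by its number of spokes, its hub, and its spokes. These costs cover only the structures; the full summary cost also includes the structure counts and the error matrix.

```python
from math import comb, log2

C0 = 2.865064  # normalizing constant of Rissanen's universal integer code


def log_star(z):
    """Sum of the positive terms log2(z) + log2(log2(z)) + ..."""
    total, x = 0.0, log2(z)
    while x > 0:
        total += x
        x = log2(x)
    return total


def universal_int_bits(z):
    """L_N(z): bits to encode an integer z >= 1 with no known upper bound."""
    return log_star(z) + log2(C0)


def clique_bits(n, size):
    # Size of the clique, then which `size` of the n nodes belong to it.
    return universal_int_bits(size) + log2(comb(n, size))


def star_bits(n, size):
    # Number of spokes, the hub (one of n nodes), then the spokes among the other n - 1.
    spokes = size - 1
    return universal_int_bits(spokes) + log2(n) + log2(comb(n - 1, spokes))


# Model cost of a 4-node clique vs. a 4-node star, inside a 10-node graph.
print(clique_bits(10, 4), star_bits(10, 4))
```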
Roadmap
Experiments
• Quantitative analysis of VoG with different heuristics:
– PLAIN, TOP10, TOP100, and GREEDY’NFORGET.
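A sketch of the three selection heuristics, assuming the candidates are already sorted from best to worst local quality: PLAIN keeps all of them, TOPk keeps the k best, and GREEDY’NFORGET tentatively adds each structure and keeps it only if the total description length goes down, otherwise forgetting it. The cost function is passed in as a parameter, this sketch uses a strict decrease, and the toy cost in the example is made up purely to exercise the loop.

```python
def plain(candidates):
    """PLAIN: keep every candidate structure."""
    return list(candidates)


def top_k(candidates, k):
    """TOPk: keep the k best candidates (candidates are sorted best first)."""
    return list(candidates[:k])


def greedy_n_forget(candidates, total_cost):
    """GREEDY'NFORGET-style selection.

    candidates: structures sorted from best to worst local quality.
    total_cost: callable mapping a list of structures to the total
    description length in bits of the graph under that model.
    A structure is kept only if it strictly lowers the total cost;
    otherwise it is forgotten.
    """
    model = []
    best = total_cost(model)
    for structure in candidates:
        cost = total_cost(model + [structure])
        if cost < best:
            model.append(structure)
            best = cost
    return model, best


# Toy cost (made up for illustration): 100 bits minus each structure's saving.
savings = {"star_a": 30, "clique_b": 10, "chain_c": -5}


def toy_cost(model):
    return 100 - sum(savings[s] for s in model)


print(greedy_n_forget(["star_a", "clique_b", "chain_c"], toy_cost))
# (['star_a', 'clique_b'], 60)
```

Unlike TOPk, GREEDY’NFORGET needs no k: the cost function alone decides when to stop adding structures.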
Experiments
Experiments
Conclusion
• Problem formulation:
– We proposed an information-theoretic graph summarization technique that uses a carefully chosen vocabulary of graph primitives.
• Effective and scalable algorithm:
– An effective method that is near-linear in the number of edges of the input graph.