A Graph Summarization: A Survey
Liu, Y., Dighe, A., Safavi, T., & Koutra, D. (2017)
Summarizing and understanding large graphs
Koutra, D., Kang, U., Vreeken, J., & Faloutsos, C. (2015).
Statistical Analysis and Data Mining: The ASA Data Science Journal, 8(3), 183-202.
Aftab Alam
Department of Computer Engineering, Kyung Hee University
1 - A Graph Summarization: A Survey
2 - Summarizing and understanding large graphs
Contents
1. Introduction (1)
2. Organization (1)
3. Introduction (2)
4. Main Idea (2)
5. VoG Steps (2)
6. Experiments (2)
7. Conclusions (2)
Data & Knowledge Engineering Lab, Department of Computer Engineering, Kyung Hee University, Korea.
Introduction (1/2)
• Daily activities such as
– social media interaction,
– web browsing,
– product and service purchases, etc.
generate large amounts of data.
• The analysis of such data can impact
– the decision-making process and our lives.
• The volume and velocity of data call for
– data summarization,
– one of the main data mining tasks.
• Graphs are ubiquitous, representing a broad variety of natural processes, such as
– friendships between people (social networks),
– communication patterns (traffic networks),
– interactions between chemical compounds (e.g., protein-protein interaction networks), and
– interactions between neurons in the brain.
Introduction (2/2)
• As the volume of interconnected data increases, summarization methods become necessary.
• What is graph summarization?
– Finding a short representation of the input graph,
– in the form of a summary or sparsified graph,
– which
o reveals patterns in the original data and
o preserves specific structural or other properties,
o depending on the application domain.
Benefits:
• Reduction: volume and storage.
• Speedup: graph algorithms & queries.
• Interactive analysis.
• Noise elimination.
Applications
• Clustering, Classification, Community detection
• model order selection in matrix factorization
• outlier detection, pattern set mining,
• finding sources of infection in large graphs,
• understanding selected nodes in graphs
Definition and Challenges
• A summary is application-dependent and can be defined with respect to various aspects:
– it can preserve specific structural patterns,
– focus on some entities in the network,
– preserve the answers to a specific set of queries,
– or maintain the distributions of some graph properties.
• Challenges
– Volume of data (Volume)
– Complexity of data (Variety)
– Definition of interestingness (which information is important and interesting)
– Evaluation
– Change over time (Velocity)
Contribution
1. A taxonomy covering
– static and dynamic graphs.
2. A review of existing methods, highlighting properties that are
– useful to researchers and practitioners,
o such as their input/output data types and end goal.
3. Connections between graph summarization methods and related fields that have potential for graph summarization, including:
– compression,
– sparsification, and
– clustering and community detection.
4. Real-world applications.
5. Open problems and opportunities for future research.
Organization
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NETWORKS
• The problem of summarization or aggregation of static, plain graphs:
– Find a summary graph that concisely describes the given graph.
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (1/5)
1. Grouping-based methods
• The most popular techniques.
• These methods aggregate nodes into super-nodes and connect them with super-edges, resulting in a super-graph.
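A minimal sketch of the grouping idea, assuming the node-to-group assignment is already known. The function name `build_supergraph` and the toy graph are illustrative only; real grouping-based methods also have to decide how the groups are formed.

```python
from collections import defaultdict

def build_supergraph(edges, group_of):
    """Collapse nodes into super-nodes and count the edges between groups.

    edges    : iterable of (u, v) pairs of an undirected graph
    group_of : dict mapping each node to a (hypothetical) group label
    Returns {(group_a, group_b): number_of_original_edges}.
    """
    super_edges = defaultdict(int)
    for u, v in edges:
        # Sort the two labels so that (A, B) and (B, A) share one super-edge.
        a, b = sorted((group_of[u], group_of[v]))
        super_edges[(a, b)] += 1
    return dict(super_edges)

# Two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
group_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
supergraph = build_supergraph(edges, group_of)
print(supergraph)
```

The seven original edges shrink to two super-nodes with internal edge counts and one super-edge between them.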
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (2/5)
2. Simplification-based methods
• These methods streamline an input graph by removing less “important” nodes or edges, resulting in a sparsified graph.
• The output graph consists of a subset of the original nodes and/or edges.
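A small sketch of simplification, assuming that edge weight stands in for “importance”. That choice, and the name `sparsify_by_weight`, are assumptions for illustration; actual methods define importance in many different ways.

```python
def sparsify_by_weight(weighted_edges, n_keep):
    """Keep the n_keep heaviest edges; nodes left without edges are dropped.

    weighted_edges : list of (u, v, weight) triples
    Returns (surviving_nodes, surviving_edges).
    """
    ranked = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    kept = ranked[:n_keep]
    nodes = {u for u, _, _ in kept} | {v for _, v, _ in kept}
    return nodes, kept

weighted_edges = [
    ("a", "b", 5.0),
    ("b", "c", 0.2),
    ("c", "d", 3.1),
    ("a", "d", 0.1),
    ("d", "e", 0.05),
]
nodes, kept = sparsify_by_weight(weighted_edges, n_keep=2)
print(sorted(nodes), kept)
```

The output is a strict subset of the input: two edges survive and node "e" disappears because its only edge was removed.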
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (3/5)
3. Compression-based methods
• The goal is to minimize the number of bits needed to describe the input graph via its summary.
• Following the minimum description length (MDL) principle, the description consists of:
– a model for the input graph, and
– its unmodeled parts.
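To make the "model plus unmodeled parts" split concrete, here is a toy bit count. The encoding is an assumption chosen for simplicity (a node id costs log2(n) bits, an edge or a correction costs two node ids); it is not the encoding of any particular paper.

```python
import math
from itertools import combinations

def edge_list_bits(n_nodes, edges):
    """Bits to list every edge as a pair of node ids."""
    return len(edges) * 2 * math.log2(n_nodes)

def clique_summary_bits(n_nodes, edges, clique_nodes):
    """Bits for the model 'these nodes form a clique' plus its corrections."""
    model_bits = len(clique_nodes) * math.log2(n_nodes)
    claimed = {frozenset(p) for p in combinations(clique_nodes, 2)}
    actual = {frozenset(e) for e in edges}
    corrections = claimed ^ actual  # edges wrongly claimed or not modeled
    return model_bits + len(corrections) * 2 * math.log2(n_nodes)

n = 8
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
plain = edge_list_bits(n, edges)
summary = clique_summary_bits(n, edges, [0, 1, 2, 3])
print(plain, summary)
```

Under this toy encoding the clique model with one correction is shorter than the plain edge list, which is the sense in which a summary "compresses" the graph.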
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (4/5)
4. Influence-based methods
• Aim to discover a short representation of the influence flow in large-scale graphs.
• The summary maintains some quantity related to information influence.
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (5/5)
5. Pattern-mining-based methods
• Aim to summarize an input network via structural patterns, e.g., Virtual Node Mining.
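One way to picture a virtual-node rewrite is sketched below, under the assumption that the mined pattern is a complete bipartite block of directed edges. The helper `add_virtual_node` is hypothetical and skips the mining step that would discover such blocks.

```python
def add_virtual_node(edges, sources, targets, virtual):
    """Replace the complete bipartite block sources -> targets by a hub node.

    edges   : set of directed (u, v) pairs
    virtual : a fresh node id that does not occur in the graph
    """
    block = {(s, t) for s in sources for t in targets}
    if not block <= edges:
        raise ValueError("sources -> targets is not a complete bipartite block")
    rewired = {(s, virtual) for s in sources} | {(virtual, t) for t in targets}
    return (edges - block) | rewired

# Three sources each link to the same three targets, plus one unrelated edge.
edges = {(s, t) for s in (1, 2, 3) for t in (4, 5, 6)} | {(1, 7)}
compact = add_virtual_node(edges, sources=(1, 2, 3), targets=(4, 5, 6), virtual="v")
print(len(edges), len(compact))
```

The 3 x 3 block of nine edges becomes 3 + 3 edges through the virtual node, so the edge count drops from 10 to 7.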
Organization: STATIC GRAPH SUMMARIZATION: LABELED NW
• Given:
• a static graph G, &
• side information, such as node attributes
• Find:
• a summary graph or
• a set of labeled structures or
• a compressed data structure
• to concisely describe the given G.
Organization: DYNAMIC GRAPH SUMMARIZATION
Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (3/5)
3. Compression-based methods: representative papers
• Graph Summarization with Bounded Error
• Summarizing and Understanding Large Graphs
• Scalable Pattern Matching over Compressed Graphs
• Query Preserving Graph Compression
• Neighbor Query Friendly Compression of Social Networks
• Community Preserving Lossy Compression of Social Networks
• A Scalable and General Graph Management System
• Compressing Graphs and Indexes with Recursive Graph Bisection
• Compression of Graphical Structures
• On Compressing Social Networks
2 - Summarizing and understanding large graphs
Abstract
• Can we describe a million-node graph with a few simple sentences?
• Given a large graph,
– how can we find its most “important” structures,
– so that we can summarize it and easily visualize it?
• How can we measure the “importance” of a set of discovered subgraphs in a large graph?
• Real graphs often consist of:
– Stars
– Bipartite cores
– Cliques
– Chains
These structures form the “vocabulary” used by VoG (Vocabulary-based summarization of Graphs).
• Main idea: find a concise description of a graph in terms of this “vocabulary”.
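As a rough illustration of what the vocabulary terms mean structurally, the sketch below recognizes exact cliques, stars, and chains from edge counts and degree sequences. It assumes a small, connected, simple subgraph, leaves out bipartite cores and all "near" variants, and is not how VoG itself identifies structures.

```python
def degree_map(nodes, edges):
    """Degree of every node in an undirected simple graph."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def exact_structure_type(nodes, edges):
    """Return 'clique', 'star', 'chain', or None (input assumed connected)."""
    n, m = len(nodes), len(edges)
    degs = sorted(degree_map(nodes, edges).values())
    if m == n * (n - 1) // 2:
        return "clique"                      # every pair is connected
    if m == n - 1 and degs == [1] * (n - 1) + [n - 1]:
        return "star"                        # one hub, all others are leaves
    if m == n - 1 and degs == [1, 1] + [2] * (n - 2):
        return "chain"                       # two ends, the rest have degree 2
    return None

clique = exact_structure_type([0, 1, 2, 3],
                              [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
star = exact_structure_type([0, 1, 2, 3, 4], [(0, 1), (0, 2), (0, 3), (0, 4)])
chain = exact_structure_type([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
cycle = exact_structure_type([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
print(clique, star, chain, cycle)
```

Real subgraphs are rarely exact, which is why a cost-based labeling that tolerates errors is needed in practice.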
Contribution
• Our contributions are threefold:
1. Formulation:
– Provide a principled encoding scheme to identify
o the vocabulary type of a given subgraph for six structure types
2. Algorithm:
– Develop VoG, an efficient method to approximate the MDL-optimal summary of a
given graph in terms of local graph structures
3. Applicability:
– Report an extensive empirical evaluation on multimillion-edge real graphs
Introduction
• Finding short summaries for large graphs,
– To gain a better understanding of their characteristics.
• Why not apply community detection, clustering, or graph-cut algorithms
– and summarize the graph in terms of its communities?
• These algorithms do not quite serve our goal:
– typically they detect numerous communities without explicit ordering,
– so a principled procedure for selecting the most “important” subgraphs is still needed.
• In addition, these methods merely return the discovered communities without characterizing them (e.g., clique, star), and thus do not help the user gain further insight into the properties of the graph.
Introduction: Reason of VoG
• The first insight
– describe the structures in a graph using an enriched set of “vocabulary” terms:
o cliques and
o near-cliques,
o stars,
o chains, and
o (near) bipartite cores.
• The reasons we chose these “vocabulary” terms are:
– (i) (near-) cliques are included,
o so our method works fine on “cavemen” graphs;
– (ii) stars [8], chains [9], and bipartite cores [4,10]
o appear very often and have semantic meaning (e.g., factions, bots) in the tens of real networks we have seen in practice (e.g., IMDB movie-actor graph, co-authorship networks, Netflix movie recommendations, US Patent dataset, phone call networks).
Introduction: Reason of VoG
• The second insight
– is to formalize our goal using the minimum description length (MDL) principle [11]
o as a lossless compression problem.
– By MDL, we define the best summary of a graph as the set of subgraphs that describes the graph most succinctly,
o which helps us understand the main graph characteristics in a simple, non-redundant manner.
• The approach is parameter-free,
– as at any stage MDL identifies the best choice:
o the one by which we save the most bits.
• Informally, we tackle the problem given on the next slide.
Introduction: Problem Definition
• Problem 1
– (Graph Summarization—Informal)
Motivation behind VoG: Understanding Large Graphs
• Large graphs are difficult to understand: when visualized, they appear as a clutter of nodes and edges.
• Simple structures are easily understood and often meaningful.
Application: Wikipedia controversy
• Wikipedia Controversy graph
– Nodes: Wikipedia editors
– Edges: editors share an edge if they edited the same part of the article
• Fig. (a): without VoG, no clear structures stand out.
• Fig. (b): VoG spots stars among Wikipedia editors (admins, bots, heavy users).
o Centers typically correspond to administrators who revert vandalism and make corrections.
• Fig. (c) & (d): VoG spots bipartite graphs reflecting “edit wars”.
o Manual inspection shows that these correspond to two groups of editors reverting each other's changes.
Roadmap
Main Idea
• Use a graph vocabulary
• Best graph summary
– Shortest lossless description
– Optimal compression (MDL)
Minimum Description Length Principle
• The MDL principle states that one should prefer the model that yields the shortest description of the data when the complexity of the model itself is also accounted for.
• Given a set of models 𝓜, the best model M ∈ 𝓜 minimizes L(M) + L(D | M), where
– L(M) is the length in bits of the description of M, and
– L(D | M) is the length in bits of the description of the data D when encoded with M.
[28] J. Rissanen. Modeling by shortest data description. Annals Stat., 11(2):416–431, 1983.
Minimum Description Length Principle
• Formally: minimum graph description
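The sketch below illustrates one way a graph description cost can be split into a model part and an error part. It assumes the error is the cell-wise XOR between the adjacency matrix the model claims and the real one, and it prices that error with a toy code (how many cells are wrong, then which ones). The function names and the cost formula are assumptions for illustration, not the exact encoding used by VoG.

```python
import math

def xor_matrix(model, actual):
    """Error matrix E: 1 wherever the model and the real adjacency disagree."""
    return [[m ^ a for m, a in zip(m_row, a_row)]
            for m_row, a_row in zip(model, actual)]

def error_bits(error):
    """Toy cost of E: state the number of wrong cells, then their positions."""
    cells = sum(len(row) for row in error)
    wrong = sum(sum(row) for row in error)
    return math.log2(cells + 1) + math.log2(math.comb(cells, wrong))

# A: real adjacency matrix (triangle 0-1-2 plus the edge 2-3).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
# M: what a model "nodes 0, 1, 2 form a clique" would reconstruct.
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]

E = xor_matrix(M, A)
wrong_cells = sum(map(sum, E))
print(wrong_cells, round(error_bits(E), 2))
```

A model that explains more of the graph leaves a sparser error matrix and therefore a cheaper error term; the total cost adds the bits needed to describe the model itself.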
Step 1: Graph decomposition
• Use any graph decomposition method, e.g., SlashBurn.
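A simplified, single-round sketch in the spirit of hub-removal decomposition: take out the highest-degree nodes and collect the connected components that remain as candidate subgraphs. This is an assumption-laden illustration; the name `slash_once` is hypothetical, and the full SlashBurn procedure repeats the step on the remaining giant component.

```python
from collections import defaultdict, deque

def slash_once(edges, k=1):
    """Remove the k highest-degree nodes; return (hubs, remaining components)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Ties on degree are broken by node id to keep the result deterministic.
    hubs = sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]
    remaining = set(adj) - set(hubs)
    components, seen = [], set()
    for start in sorted(remaining):
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:                      # breadth-first search without hubs
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w in remaining and w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(comp)
    return hubs, components

edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (5, 4)]
hubs, components = slash_once(edges, k=1)
print(hubs, components)
```

Removing hub 0 splits the toy graph into three pieces, each of which could be handed to the labeling step as a candidate structure.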
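Step 2: Graph Labeling
• Now, how can we “label” the candidate subgraphs?
• Each subgraph is labeled with the structure type that has the minimum (argmin) encoding cost.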
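The argmin labeling can be sketched with a stand-in cost. Here the cost of a structure type is simply the number of edges that differ between the subgraph and the ideal structure on the same nodes; VoG measures cost in bits under MDL, so this error count is only an assumed proxy, and the fixed node order (first node as star hub, list order as chain order) is a further simplification.

```python
from itertools import combinations

def ideal_edges(kind, nodes):
    """Edge set of a perfect structure of the given kind on these nodes."""
    nodes = list(nodes)
    if kind == "clique":
        pairs = combinations(nodes, 2)
    elif kind == "star":                  # first node plays the hub
        pairs = ((nodes[0], v) for v in nodes[1:])
    elif kind == "chain":                 # nodes linked in the given order
        pairs = zip(nodes, nodes[1:])
    else:
        raise ValueError(kind)
    return {frozenset(p) for p in pairs}

def label_subgraph(nodes, edges, kinds=("clique", "star", "chain")):
    """Return the structure type with the fewest mismatching edges."""
    actual = {frozenset(e) for e in edges}
    costs = {k: len(ideal_edges(k, nodes) ^ actual) for k in kinds}
    best = min(kinds, key=lambda k: costs[k])
    return best, costs

# A star around node 0 with one extra edge between two leaves.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
best, costs = label_subgraph(nodes, edges)
print(best, costs)
```

The near-star is one edge away from a perfect star but five edges away from a clique or a chain, so it is labeled as a star.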
Step 2: Graph Labeling
Some criterion
Step 3: Summary Assembly
• Select the labeled structures for the summary according to some criterion.
Summary encoding cost
Experiments
• Quantitative analysis of VoG with different heuristics:
– PLAIN, TOP10, TOP100, and GREEDY’NFORGET.
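A sketch of how a greedy selection heuristic of this kind might work, assuming candidates arrive sorted by decreasing quality and a candidate is kept only if it lowers the total description cost. The cost function `toy_cost`, the per-edge price, and the candidate data are all made up for illustration; by contrast, a PLAIN-style summary would keep every candidate and a TOP-k summary the first k.

```python
def greedy_n_forget(candidates, total_cost):
    """Scan candidates in order; keep one only if the total cost decreases."""
    chosen = []
    best = total_cost(chosen)
    for cand in candidates:
        trial = total_cost(chosen + [cand])
        if trial < best:          # the summary got shorter, so keep it
            chosen.append(cand)
            best = trial
        # otherwise the candidate is forgotten and never revisited
    return chosen, best

ALL_EDGES = set(range(10))        # toy graph: ten edges identified by number
BITS_PER_UNEXPLAINED_EDGE = 2     # hypothetical price of an uncovered edge

def toy_cost(structures):
    """Model bits of the chosen structures plus bits for uncovered edges."""
    covered = set().union(*(s["edges"] for s in structures))
    model_bits = sum(s["bits"] for s in structures)
    return model_bits + BITS_PER_UNEXPLAINED_EDGE * len(ALL_EDGES - covered)

candidates = [
    {"name": "star", "edges": {0, 1, 2, 3, 4}, "bits": 6},
    {"name": "star-copy", "edges": {0, 1, 2, 3}, "bits": 6},
    {"name": "clique", "edges": {5, 6, 7}, "bits": 4},
    {"name": "tiny", "edges": {8}, "bits": 5},
]
chosen, final_cost = greedy_n_forget(candidates, toy_cost)
chosen_names = [c["name"] for c in chosen]
print(chosen_names, final_cost)
```

The redundant copy and the structure that costs more than it explains are both dropped, which is the non-redundancy the MDL criterion is meant to enforce.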
Conclusion
• Problem formulation:
– An information-theoretic graph summarization technique that uses a carefully chosen vocabulary of graph primitives.
• Effective and scalable algorithm:
– An effective method that is near-linear in the number of edges of the input graph.
Your Logo
THANK YOU!
?

More Related Content

What's hot

Digital Preservation Standards
Digital Preservation StandardsDigital Preservation Standards
Digital Preservation StandardsBetsy Fanning
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Information Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information CentersInformation Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information CentersEdeama Onwuchekwa
 
The Origins of Information Science and the International Institute of Bibliog...
The Origins of Information Science and the International Institute of Bibliog...The Origins of Information Science and the International Institute of Bibliog...
The Origins of Information Science and the International Institute of Bibliog...Charlley Luz
 
Bibliographic description an overview
Bibliographic description an overviewBibliographic description an overview
Bibliographic description an overviewDr. Utpal Das
 
Pandora FMS: Monitorización de servidores MySQL
Pandora FMS: Monitorización de servidores MySQLPandora FMS: Monitorización de servidores MySQL
Pandora FMS: Monitorización de servidores MySQLPandora FMS
 
A Graduate Critical Appraisal Assignment for Athletic Training
A Graduate Critical Appraisal Assignment for Athletic TrainingA Graduate Critical Appraisal Assignment for Athletic Training
A Graduate Critical Appraisal Assignment for Athletic TrainingJohn Parsons
 
Aula 01 - Recuperação da Informação
Aula 01 - Recuperação da InformaçãoAula 01 - Recuperação da Informação
Aula 01 - Recuperação da InformaçãoNilton Heck
 
Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries) Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries) robin fay
 
INFORMATION RETRIEVAL ‎AND DISSEMINATION
INFORMATION RETRIEVAL ‎AND DISSEMINATIONINFORMATION RETRIEVAL ‎AND DISSEMINATION
INFORMATION RETRIEVAL ‎AND DISSEMINATIONLibcorpio
 
Cloud computing and library services
Cloud computing and library servicesCloud computing and library services
Cloud computing and library servicesErik Mitchell
 
Archival resources in libraries: significance, sources and set-ups
Archival resources in libraries: significance, sources and set-upsArchival resources in libraries: significance, sources and set-ups
Archival resources in libraries: significance, sources and set-upsFe Angela Verzosa
 

What's hot (20)

Sistemas CRIS para el monitoreo de publicaciones e investigadores: El caso de...
Sistemas CRIS para el monitoreo de publicaciones e investigadores: El caso de...Sistemas CRIS para el monitoreo de publicaciones e investigadores: El caso de...
Sistemas CRIS para el monitoreo de publicaciones e investigadores: El caso de...
 
Digital Preservation Standards
Digital Preservation StandardsDigital Preservation Standards
Digital Preservation Standards
 
hypotheses.pptx
hypotheses.pptxhypotheses.pptx
hypotheses.pptx
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Altmetrics
Altmetrics Altmetrics
Altmetrics
 
INFORMATION SEEKING BEHAVIOUR OF ENGINEERING COLLEGE STUDENT IN INDORE CITY
INFORMATION SEEKING BEHAVIOUR OF ENGINEERING COLLEGE STUDENT IN INDORE CITY INFORMATION SEEKING BEHAVIOUR OF ENGINEERING COLLEGE STUDENT IN INDORE CITY
INFORMATION SEEKING BEHAVIOUR OF ENGINEERING COLLEGE STUDENT IN INDORE CITY
 
RDA Background
RDA BackgroundRDA Background
RDA Background
 
Research design
Research designResearch design
Research design
 
Information Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information CentersInformation Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information Centers
 
The Origins of Information Science and the International Institute of Bibliog...
The Origins of Information Science and the International Institute of Bibliog...The Origins of Information Science and the International Institute of Bibliog...
The Origins of Information Science and the International Institute of Bibliog...
 
Bibliographic description an overview
Bibliographic description an overviewBibliographic description an overview
Bibliographic description an overview
 
Pandora FMS: Monitorización de servidores MySQL
Pandora FMS: Monitorización de servidores MySQLPandora FMS: Monitorización de servidores MySQL
Pandora FMS: Monitorización de servidores MySQL
 
Introductionto bibliometrics
Introductionto bibliometricsIntroductionto bibliometrics
Introductionto bibliometrics
 
A Graduate Critical Appraisal Assignment for Athletic Training
A Graduate Critical Appraisal Assignment for Athletic TrainingA Graduate Critical Appraisal Assignment for Athletic Training
A Graduate Critical Appraisal Assignment for Athletic Training
 
Aula 01 - Recuperação da Informação
Aula 01 - Recuperação da InformaçãoAula 01 - Recuperação da Informação
Aula 01 - Recuperação da Informação
 
Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries) Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries)
 
INFORMATION RETRIEVAL ‎AND DISSEMINATION
INFORMATION RETRIEVAL ‎AND DISSEMINATIONINFORMATION RETRIEVAL ‎AND DISSEMINATION
INFORMATION RETRIEVAL ‎AND DISSEMINATION
 
Cloud computing and library services
Cloud computing and library servicesCloud computing and library services
Cloud computing and library services
 
Rda policy statement and guidelines for phil libraries mila ramos
Rda policy statement and guidelines for phil libraries   mila ramosRda policy statement and guidelines for phil libraries   mila ramos
Rda policy statement and guidelines for phil libraries mila ramos
 
Archival resources in libraries: significance, sources and set-ups
Archival resources in libraries: significance, sources and set-upsArchival resources in libraries: significance, sources and set-ups
Archival resources in libraries: significance, sources and set-ups
 

Similar to A Graph Summarization: A Survey | Summarizing and understanding large graphs

Locally densest subgraph discovery
Locally densest subgraph discoveryLocally densest subgraph discovery
Locally densest subgraph discoveryaftab alam
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Editor IJARCET
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Editor IJARCET
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...theijes
 
Visual analysis of large graphs state of the art and future research challenges
Visual analysis of large graphs state of the art and future research challengesVisual analysis of large graphs state of the art and future research challenges
Visual analysis of large graphs state of the art and future research challengesAsliza Hamzah
 
High dimensionality reduction on graphical data
High dimensionality reduction on graphical dataHigh dimensionality reduction on graphical data
High dimensionality reduction on graphical dataeSAT Journals
 
Carved visual hulls for image based modeling
Carved visual hulls for image based modelingCarved visual hulls for image based modeling
Carved visual hulls for image based modelingaftab alam
 
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONCNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONIRJET Journal
 
vnc.pptx
vnc.pptxvnc.pptx
vnc.pptxPigPug1
 
vnc_1660543731.pptx
vnc_1660543731.pptxvnc_1660543731.pptx
vnc_1660543731.pptxPigPug1
 
Intelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptxIntelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptxAnonymous366406
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
Overview1) Overview – The continued discussion of .docx
Overview1)             Overview – The continued discussion of .docxOverview1)             Overview – The continued discussion of .docx
Overview1) Overview – The continued discussion of .docxalfred4lewis58146
 
Pm0015 quantitative methods in project management
Pm0015   quantitative methods in project managementPm0015   quantitative methods in project management
Pm0015 quantitative methods in project managementsmumbahelp
 
Overview1) Overview – The continued discussion of project implem.docx
Overview1) Overview – The continued discussion of project implem.docxOverview1) Overview – The continued discussion of project implem.docx
Overview1) Overview – The continued discussion of project implem.docxalfred4lewis58146
 

Similar to A Graph Summarization: A Survey | Summarizing and understanding large graphs (20)

Locally densest subgraph discovery
Locally densest subgraph discoveryLocally densest subgraph discovery
Locally densest subgraph discovery
 
Chapter 3.pptx
Chapter 3.pptxChapter 3.pptx
Chapter 3.pptx
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204
 
Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204Volume 2-issue-6-2200-2204
Volume 2-issue-6-2200-2204
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
 
Visual analysis of large graphs state of the art and future research challenges
Visual analysis of large graphs state of the art and future research challengesVisual analysis of large graphs state of the art and future research challenges
Visual analysis of large graphs state of the art and future research challenges
 
High dimensionality reduction on graphical data
High dimensionality reduction on graphical dataHigh dimensionality reduction on graphical data
High dimensionality reduction on graphical data
 
Carved visual hulls for image based modeling
Carved visual hulls for image based modelingCarved visual hulls for image based modeling
Carved visual hulls for image based modeling
 
Geometric Deep Learning
Geometric Deep Learning Geometric Deep Learning
Geometric Deep Learning
 
algorithms
algorithmsalgorithms
algorithms
 
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITIONCNN MODEL FOR TRAFFIC SIGN RECOGNITION
CNN MODEL FOR TRAFFIC SIGN RECOGNITION
 
vnc.pptx
vnc.pptxvnc.pptx
vnc.pptx
 
vnc_1660543731.pptx
vnc_1660543731.pptxvnc_1660543731.pptx
vnc_1660543731.pptx
 
Intelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptxIntelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptx
 
FDS_dept_ppt.pptx
FDS_dept_ppt.pptxFDS_dept_ppt.pptx
FDS_dept_ppt.pptx
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
Overview1) Overview – The continued discussion of .docx
Overview1)             Overview – The continued discussion of .docxOverview1)             Overview – The continued discussion of .docx
Overview1) Overview – The continued discussion of .docx
 
Pm0015 quantitative methods in project management
Pm0015   quantitative methods in project managementPm0015   quantitative methods in project management
Pm0015 quantitative methods in project management
 
Overview1) Overview – The continued discussion of project implem.docx
Overview1) Overview – The continued discussion of project implem.docxOverview1) Overview – The continued discussion of project implem.docx
Overview1) Overview – The continued discussion of project implem.docx
 

More from aftab alam

Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sqlaftab alam
 
Distributed graph summarization
Distributed graph summarizationDistributed graph summarization
Distributed graph summarizationaftab alam
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection aftab alam
 
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATIONSCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATIONaftab alam
 
Writing for computer science: Fourteen steps to a clearly written technical p...
Writing for computer science: Fourteen steps to a clearly written technical p...Writing for computer science: Fourteen steps to a clearly written technical p...
Writing for computer science: Fourteen steps to a clearly written technical p...aftab alam
 
Writing for Computer Science: Design an article
Writing for Computer Science: Design an articleWriting for Computer Science: Design an article
Writing for Computer Science: Design an articleaftab alam
 
Efficient aggregation for graph summarization
Efficient aggregation for graph summarizationEfficient aggregation for graph summarization
Efficient aggregation for graph summarizationaftab alam
 

More from aftab alam (7)

Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
Distributed graph summarization
Distributed graph summarizationDistributed graph summarization
Distributed graph summarization
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection
 
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATIONSCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
SCALABLE PATTERN MATCHING OVER COMPRESSED GRAPHS VIA DE-DENSIFICATION
 
Writing for computer science: Fourteen steps to a clearly written technical p...
Writing for computer science: Fourteen steps to a clearly written technical p...Writing for computer science: Fourteen steps to a clearly written technical p...
Writing for computer science: Fourteen steps to a clearly written technical p...
 
Writing for Computer Science: Design an article
Writing for Computer Science: Design an articleWriting for Computer Science: Design an article
Writing for Computer Science: Design an article
 
Efficient aggregation for graph summarization
Efficient aggregation for graph summarizationEfficient aggregation for graph summarization
Efficient aggregation for graph summarization
 



A Graph Summarization: A Survey | Summarizing and understanding large graphs

  • 1. A Graph Summarization: A Survey. Liu, Y., Dighe, A., Safavi, T., & Koutra, D. (2017). Summarizing and understanding large graphs. Koutra, D., Kang, U., Vreeken, J., & Faloutsos, C. (2015). Statistical Analysis and Data Mining: The ASA Data Science Journal, 8(3), 183-202. Aftab Alam, Department of Computer Engineering, Kyung Hee University.
  • 2. Contents. Paper 1 (A Graph Summarization: A Survey): Introduction, Organization. Paper 2 (Summarizing and understanding large graphs): Introduction, Main Idea, VoG Steps, Experiments, Conclusions.
  • 3. Data & Knowledge Engineering Lab, Department of Computer Engineering, Kyung Hee University, Korea. A Graph Summarization: A Survey. Introduction (1/2). Daily activities such as social media interaction, web browsing, and product and service purchases generate large amounts of data. The analysis of such data can affect the decision-making process and our lives. The volume and velocity of data call for data summarization, one of the main data mining tasks. Graphs are ubiquitous, representing a broad variety of natural processes, such as friendships between people (social networks), communication patterns (traffic networks), interactions between chemical compounds, and neurons in the brain (protein-protein interaction networks).
  • 4. Introduction (2/2). The growing volume of interconnected data calls for summarization methods. What is graph summarization? It seeks a short representation of the input graph, in the form of a summary or sparsified graph, which reveals patterns in the original data and preserves specific structural or other properties, depending on the application domain. Benefits: reduction of data volume and storage; speedup of graph algorithms and queries; interactive analysis; noise elimination. Applications: clustering, classification, and community detection; model order selection in matrix factorization; outlier detection and pattern set mining; finding sources of infection in large graphs; understanding selected nodes in graphs.
  • 5. Definition and Challenges. A summary is application-dependent and can be defined with respect to various aspects: it can preserve specific structural patterns, focus on some entities in the network, preserve the answers to a specific set of queries, or maintain the distributions of some graph properties. Challenges: volume of data (Volume); complexity of data (Variety); definition of interestingness (what counts as important and interesting information); evaluation; change over time (Velocity).
  • 6. Contribution. 1. A taxonomy covering static and dynamic graphs. 2. A review of existing methods, highlighting properties that are useful to researchers and practitioners, such as their input/output data types and end goal. 3. Connections between graph summarization methods and related fields that have potential for graph summarization, including compression, sparsification, and clustering and community detection. 4. Real-world applications. 5. Open problems and opportunities for future research.
  • 7. Organization
  • 8. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NETWORKS. The problem of summarization or aggregation of static, plain graphs: find a summary graph that concisely describes the given graph.
  • 9. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (1/5). 1. Grouping-based methods. These are the most popular techniques. They aggregate nodes into super-nodes and connect them with super-edges, resulting in a super-graph.
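The grouping idea on slide 9 can be illustrated with a minimal Python sketch. It assumes the node-to-group assignment is already given (how to choose the groups is the hard part and is not shown); the function name `build_supergraph` and the toy data are hypothetical, not taken from the survey.

```python
from collections import defaultdict


def build_supergraph(edges, group_of):
    """Collapse nodes into super-nodes and count original edges per super-edge.

    edges    -- iterable of (u, v) pairs of an undirected graph
    group_of -- dict mapping each node to its super-node label (assumed given)
    """
    super_edges = defaultdict(int)
    for u, v in edges:
        # Sort the two labels so (A, B) and (B, A) map to the same super-edge.
        key = tuple(sorted((group_of[u], group_of[v])))
        super_edges[key] += 1
    return dict(super_edges)


# Toy graph: a triangle {1, 2, 3} attached to a pair {4, 5}.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
group_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
supergraph = build_supergraph(edges, group_of)
print(supergraph)
```

Here a super-edge whose two labels are equal records the edges that fall inside a single super-node; the stored counts are one possible choice of super-edge weight.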
  • 10. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (2/5). 2. Simplification-based methods. These methods streamline an input graph by removing less “important” nodes or edges, resulting in a sparsified graph. The output graph consists of a subset of the original nodes and/or edges.
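As a rough illustration of simplification, the sketch below keeps only edges whose weight reaches a threshold. Using edge weight as the measure of “importance” is an assumption made here for brevity; the slide does not commit to a specific criterion, and `sparsify` is a hypothetical name.

```python
def sparsify(weighted_edges, min_weight):
    """Keep edges with weight >= min_weight, plus the nodes they touch.

    weighted_edges -- list of (u, v, weight) triples
    min_weight     -- assumed importance threshold
    """
    kept_edges = [(u, v, w) for u, v, w in weighted_edges if w >= min_weight]
    kept_nodes = {node for u, v, _ in kept_edges for node in (u, v)}
    return kept_nodes, kept_edges


weighted_edges = [("a", "b", 5), ("b", "c", 1), ("c", "d", 3)]
nodes, sparse_edges = sparsify(weighted_edges, min_weight=3)
print(sorted(nodes), sparse_edges)
```

The result uses only nodes and edges that already exist in the input, which matches the slide's point that the output is a subset of the original graph.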
  • 11. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (3/5). 3. Compression-based methods. The goal is to minimize the number of bits needed to describe the input graph via its summary. Following the minimum description length (MDL) principle, the description can be seen as a model for the input graph plus its unmodeled parts.
  • 12. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (4/5). 4. Influence-based methods. These methods aim to discover a short representation of the influence flow in large-scale graphs, such that some quantity related to information influence is maintained.
  • 13. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (5/5). 5. Pattern-mining-based summarization. These methods aim to summarize an input network via structural patterns, e.g. Virtual Node Mining.
  • 14. Organization: STATIC GRAPH SUMMARIZATION: LABELED NW. Given: a static graph G and side information, such as node attributes. Find: a summary graph, a set of labeled structures, or a compressed data structure that concisely describes the given G.
  • 15. Organization: DYNAMIC GRAPH SUMMARIZATION
  • 16. Organization: STATIC GRAPH SUMMARIZATION: PLAIN NW (3/5). 3. Compression-based methods. The goal is to minimize the number of bits needed to describe the input graph via its summary, which can be seen (MDL) as a model for the input graph plus its unmodeled parts. Works listed on the slide: Graph Summarization with Bounded Error; Summarizing and understanding large graphs; Scalable Pattern Matching over Compressed G; Query Preserving Graph Compression; Neighbor Query Friendly Compression of Social NW; Community Preserving Lossy Compression Social NW; A Scalable and General Graph Management System; Compressing Graphs and Indexes with Recursive Graph B.; Compression of Graphical Structures; On Compressing Social Networks.
  • 17. Contents. Paper 1 (A Graph Summarization: A Survey): Introduction, Organization. Paper 2 (Summarizing and understanding large graphs): Introduction, Main Idea, VoG Steps, Experiments, Conclusions.
  • 18. Summarizing and understanding large graphs. Abstract. Can we describe a million-node graph with a few simple sentences? Given a large graph, how can we find its most “important” structures, so that we can summarize it and easily visualize it? How can we measure the “importance” of a set of discovered subgraphs in a large graph? Real graphs often consist of stars, bipartite cores, cliques, and chains, which together are called the “Vocabulary of Graphs” (VoG). The main idea is to find a concise description of a graph in terms of this “vocabulary”.
  • 19. Contribution. Our contributions are threefold. 1. Formulation: provide a principled encoding scheme to identify the vocabulary type of a given subgraph, for six structure types. 2. Algorithm: develop VoG, an efficient method to approximate the MDL-optimal summary of a given graph in terms of local graph structures. 3. Applicability: report an extensive empirical evaluation on multimillion-edge real graphs.
  • 20. Introduction. The goal is to find short summaries for large graphs, to gain a better understanding of their characteristics. Why not apply community detection, clustering, or graph-cut algorithms and summarize the graph in terms of its communities? The answer is that these algorithms do not quite serve our goal. Typically they detect numerous communities without explicit ordering, so a principled procedure for selecting the most “important” subgraphs is still needed. In addition, these methods merely return the discovered communities without characterizing them (e.g., clique, star) and thus do not help the user gain further insight into the properties of the graph.
  • 21. Introduction: Reason of VoG. The first insight is to describe the structures in a graph using an enriched set of “vocabulary” terms: cliques and near-cliques, stars, chains, and (near) bipartite cores. The reasons we chose these “vocabulary” terms are: (i) (near-) cliques are included, so our method works fine on “cavemen” graphs; (ii) stars [8], chains [9], and bipartite cores [4,10] appear very often and have semantic meaning (e.g., factions, bots) in the tens of real networks we have seen in practice (e.g., IMDB movie-actor graph, co-authorship networks, Netflix movie recommendations, US Patent dataset, phone call networks).
  • 22. Introduction: Reason of VoG. The second insight is to formalize our goal using the minimum description length (MDL) principle [11] as a lossless compression problem. By MDL, we define the best summary of a graph as the set of subgraphs that describes the graph most succinctly, which helps us understand the main graph characteristics in a simple, non-redundant manner. The approach is parameter-free, as at any stage MDL identifies the best choice: the one by which we save the most bits. Informally, we tackle the problem given on the next slide.
  • 23. Introduction: Problem Definition. Problem 1 (Graph Summarization, Informal).
  • 24. Motivation behind VoG: understanding large graphs. Large graphs are difficult to understand; they appear as a clutter of nodes and edges when visualized. Simple structures are easily understood, and often meaningful.
  • 25. Application: Wikipedia controversy. Wikipedia Controversy graph. Nodes: Wikipedia editors. Edges: editors share an edge if they edited the same part of the article. Fig. (a): without VoG, no clear structures stand out. Fig. (b): VoG spots stars among Wikipedia editors (admins, bots, heavy users); the centers typically correspond to administrators who revert vandalism and make corrections. Fig. (c) and (d): bipartite graphs reflecting “edit wars”; manual inspection shows that these correspond to two groups of editors reverting each other's changes.
  • 26. Roadmap
  • 27. Main Idea. Use a graph vocabulary. Shortest lossless description: optimal compression (MDL). Best graph summary: optimal compression (MDL).
  • 28. Minimum Description Length Principle. Given a set of models M, the MDL principle states that one should prefer the model m in M that yields the shortest description of the data when the complexity of the model itself is also accounted for [28]. [28] J. Rissanen. Modeling by shortest data description. Annals Stat., 11(2):416–431, 1983.
  • 29. Minimum Description Length Principle. Formally: the Minimum Graph Description problem.
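The formulas on the two MDL slides were images and are not present in the transcript. For orientation, the standard two-part MDL criterion, and the way it is commonly instantiated for graph summaries of this kind, can be written as follows. The symbols are assumptions chosen here: D is the data, \mathcal{M} the model family, A the adjacency matrix of the graph G, M the approximation of A induced by the model, and E the error matrix.

```latex
% Two-part MDL: choose the model with the shortest total description.
M^{*} = \arg\min_{M \in \mathcal{M}} \; \bigl[ L(M) + L(D \mid M) \bigr]

% Graph instance: bits for the model plus bits for the error it leaves behind,
% where \oplus denotes element-wise exclusive OR.
L(G, M) = L(M) + L(E), \qquad E = M \oplus A
```

L(M) is the number of bits needed to describe the model, and L(D | M) (respectively L(E)) is the number of bits needed to describe whatever the model does not capture, so that the description remains lossless.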
  • 30. Roadmap
  • 31. Step 1: Graph decomposition. Use any graph decomposition method, e.g. SlashBurn.
  • 32. Step 2: Graph Labeling. Now, how can we “label” them? (argmin)
  • 33. Step 2: Graph Labeling. Some criterion.
  • 34. Step 3: Graph Labeling. Some criterion.
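The labeling slides pick, for each candidate subgraph, the vocabulary type that minimizes some criterion (the “argmin”). The Python sketch below shows that selection pattern only. As a loudly stated simplification, it scores each type by the number of edges that differ from the ideal structure instead of an MDL bit cost, and it covers just two types (clique and star); `label_subgraph` is a hypothetical name.

```python
from itertools import combinations


def label_subgraph(nodes, edges):
    """Return (best_label, errors) for a small undirected subgraph.

    Simplified criterion (assumption): count edges present in exactly one of
    the actual subgraph and the ideal structure, then take the argmin.
    """
    nodes = sorted(nodes)
    actual = {frozenset(e) for e in edges}

    # Ideal clique: every pair of nodes is connected.
    clique = {frozenset(pair) for pair in combinations(nodes, 2)}

    # Ideal star: the highest-degree node is the hub, linked to all others.
    degree = {n: sum(n in e for e in actual) for n in nodes}
    hub = max(nodes, key=lambda n: degree[n])
    star = {frozenset((hub, n)) for n in nodes if n != hub}

    errors = {"clique": len(actual ^ clique), "star": len(actual ^ star)}
    best = min(errors, key=errors.get)  # argmin over the candidate labels
    return best, errors


# Node 1 touches everyone; one extra edge (2, 3) spoils the perfect star.
label, errors = label_subgraph({1, 2, 3, 4}, [(1, 2), (1, 3), (1, 4), (2, 3)])
print(label, errors)
```

In this toy case the star explanation leaves one edge unexplained while the clique explanation leaves two, so the subgraph is labeled as a (near) star.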
  • 35. Summary encoding cost
  • 36. Roadmap
  • 37. Experiments. Quantitative analysis of VoG with different heuristics: PLAIN, TOP10, TOP100, and GREEDY’NFORGET.
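The slide only names the heuristics. As a hedged sketch of how a GREEDY’NFORGET-style selection can work: candidates are visited in descending quality order, and each one is kept only if the total encoded cost does not increase; otherwise it is dropped. The cost function below is a toy stand-in with made-up numbers (each candidate carries a hypothetical “bits saved” and “own cost”), not the encoding used by VoG.

```python
def greedy_n_forget(candidates, total_cost):
    """Greedy selection of structures under a description-length budget.

    candidates -- structures sorted by descending quality (assumed pre-sorted)
    total_cost -- hypothetical callable: model (list) -> encoded length in bits
    """
    model = []
    best_cost = total_cost(model)
    for structure in candidates:
        trial = model + [structure]
        trial_cost = total_cost(trial)
        if trial_cost <= best_cost:
            # The description did not get longer: keep the structure.
            model, best_cost = trial, trial_cost
        # Otherwise the structure is "forgotten" and never revisited.
    return model, best_cost


BASE_BITS = 100  # toy cost of describing the graph with an empty model


def toy_cost(model):
    # Each entry is (name, bits_saved, own_cost); all values are illustrative.
    return BASE_BITS + sum(own - saved for _, saved, own in model)


candidates = [("star", 30, 10), ("chain", 5, 8), ("clique", 20, 12)]
summary, bits = greedy_n_forget(candidates, toy_cost)
print([name for name, _, _ in summary], bits)
```

Under this reading, PLAIN would keep every candidate and TOP10/TOP100 would keep the first 10 or 100 candidates in the same ordering, without re-checking the total cost.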
  • 38. Experiments
  • 39. Experiments
  • 40. Conclusion. Problem formulation: the paper proposes an information-theoretic graph summarization technique that uses a carefully chosen vocabulary of graph primitives. Effective and scalable algorithm: an effective method that is near-linear in the number of edges of the input graph.