5. What is semantic data summarization? Why?
1. Summarizing entity descriptions
(a.k.a. entity summarization)
6. What is semantic data summarization? Why?
2. Summarizing entity associations
[Figure: an example graph linking Alice and Bob through article-A, paper-A, paper-B, paper-C, paper-D, AAAI and IJCAI via the relations firstAuthor, secondAuthor, inProcOf, cites, extends, reviewer and chair]
7. What is semantic data summarization? Why?
3. Summarizing semantic datasets
8. Two types of summaries
• Extractive methods
• summary = a subset of data
• summarization = ranking and selection
• Abstractive methods (a.k.a. non-extractive methods)
• summary = a high-level abstraction of data
• summarization = a more complex process
10. Outline of this talk
• Summarizing entity descriptions
• Summarizing entity associations
• Summarizing semantic datasets
• Summarizing ontologies (if time permits)
11. Outline of this talk
• Summarizing entity descriptions
• Summarizing entity associations
• Summarizing semantic datasets
12. Summarizing entity descriptions
• Extractive methods
(summary = a subset of property-value pairs)
• Metrics for ranking property-value pairs
• Intrinsic metrics
• Extrinsic metrics
• Structures for combining metrics
• Abstractive methods
• Not known yet
Example entity description (property: value)
• name: Leonardo da Vinci
• type: Person
• type: Artist
• dateOfBirth: 1452-04-15
• creates: Mona Lisa
• creates: Lady with an Ermine
• knownFor: Mona Lisa
• influenced: Richard Feynman
• …
14. Intrinsic metrics (1): frequency
• Frequency of property
• Frequency in the dataset
• Frequency among entities of the same type
• Frequency in this entity description
• Frequency in the ontology (i.e., richness of definition)
[Figure: the Leonardo da Vinci description from slide 12, next to another description that uses influenced and a description of another entity of type Artist that uses creates]
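The property-frequency variants above reduce to simple counting. A minimal sketch, assuming each entity description is a list of (property, value) tuples; the function name `property_frequency` and the toy data are hypothetical, not taken from the talk:

```python
from collections import Counter

def property_frequency(descriptions, entity):
    """For each property of `entity`, return (frequency in this entity description,
    number of entities in the dataset whose descriptions use the property)."""
    own = Counter(prop for prop, _ in descriptions[entity])
    # Count each property at most once per entity when scanning the dataset.
    users = Counter(prop for desc in descriptions.values() for prop in {p for p, _ in desc})
    return {prop: (own[prop], users[prop]) for prop in own}

# Hypothetical toy dataset.
descriptions = {
    "Leonardo": [("name", "Leonardo da Vinci"), ("type", "Person"), ("type", "Artist"),
                 ("creates", "Mona Lisa"), ("creates", "Lady with an Ermine")],
    "Feynman": [("name", "Richard Feynman"), ("type", "Person")],
}
freqs = property_frequency(descriptions, "Leonardo")
```

Frequency among entities of the same type is the same dataset count, restricted to the descriptions of those entities.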
15. Intrinsic metrics (1): frequency
• Frequency of property value
• Frequency in the dataset
(note: entities in text)
• Frequency in this entity description
(note: indirect relations)
[Figure: the Leonardo da Vinci description from slide 12, next to other descriptions whose values include Mona Lisa or Lady with an Ermine, and one whose text value mentions Mona Lisa]
Indirect relations may also be counted.
16. Intrinsic metrics (1): frequency
• Frequency of property-value pair
• Frequency among similar entities
• Frequency in the dataset (why not?)
[Figure: the Leonardo da Vinci description from slide 12, next to a similar entity whose description also contains type: Artist and influenced: Richard Feynman]
17. Intrinsic metrics (2): centrality
• Centrality of property value
• Within the dataset: (weighted) PageRank
• On the Web: authority of datasets referencing it
[Figure: the Leonardo da Vinci description from slide 12]
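PageRank within the dataset can be computed by power iteration. The sketch below is plain (unweighted) PageRank on a directed edge list; the damping factor 0.85, the function name `pagerank` and the toy graph are assumptions for illustration. A weighted variant would split each node's rank in proportion to edge weights instead of uniformly.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node without out-links spreads its rank uniformly over all nodes.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical toy graph: a value referenced by several entities gains rank.
edges = [("Leonardo", "Mona Lisa"), ("Louvre", "Mona Lisa"),
         ("Mona Lisa", "Leonardo"), ("Leonardo", "Lady with an Ermine")]
scores = pagerank(edges)
```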
18. Intrinsic metrics (2): centrality
• Centrality of property-value pair
• PageRank, weighted by inverse Google distance [Cheng et al., ISWC’11]
[Figure: the Leonardo da Vinci description from slide 12, with its property-value pairs (name: Leonardo da Vinci, type: Person, creates: Mona Lisa, …) drawn as nodes of a graph]
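The edge weights rely on the Google distance between two property-value pairs. A minimal sketch of the standard Normalized Google Distance (Cilibrasi and Vitányi) from search-engine hit counts; how the distance is inverted into an edge weight (for example 1 / NGD) is an assumption here, as are the hit counts:

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance from positive hit counts:
    fx, fy = hits for each term alone, fxy = hits for both terms, n = number of indexed pages."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical hit counts for two property-value pairs.
distance = normalized_google_distance(fx=1000, fy=100, fxy=10, n=10**6)
```

Smaller distances mean more related pairs, so an inverse of the distance gives a larger edge weight.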
19. Intrinsic metrics (3): informativeness
• Informativeness of property-value pair
• Self-information of property-value pair [Cheng et al., ISWC’11]
• Depth of class
[Figure: the Leonardo da Vinci description from slide 12, next to another entity typed Person and Scientist; a class hierarchy with Artist and Scientist under Person]
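Self-information is −log P(pair): rarer property-value pairs carry more information. One common way to estimate P(pair), assumed here rather than taken from the cited paper, is the fraction of entity descriptions that contain the pair. The function name and toy data are hypothetical:

```python
import math
from collections import Counter

def self_information(descriptions):
    """Self-information -log2 P(pair), where P(pair) is estimated as the fraction
    of entity descriptions that contain the property-value pair."""
    n_entities = len(descriptions)
    # Document frequency: count each pair at most once per entity.
    doc_freq = Counter(pair for desc in descriptions.values() for pair in set(desc))
    return {pair: math.log2(n_entities / count) for pair, count in doc_freq.items()}

# Hypothetical toy dataset: type Person is common, type Scientist is rare.
descriptions = {
    "Leonardo": [("type", "Person"), ("type", "Artist")],
    "Picasso": [("type", "Person"), ("type", "Artist")],
    "Einstein": [("type", "Person"), ("type", "Scientist")],
    "Mona Lisa": [("type", "Painting")],
}
info = self_information(descriptions)
```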
20. Intrinsic metrics (4): diversity
• Diversity of properties
• To avoid common properties
• To avoid properties having similar values
[Figure: the Leonardo da Vinci description from slide 12]
21. Intrinsic metrics (4): diversity
• Diversity of property-value pairs [Cheng et al., JoWS’15, WWW’15]
• Similarity between text: string-based, word-based
• Similarity between numbers
• Semantic similarity: reasoning-based
[Figure: the Leonardo da Vinci description from slide 12 and the class hierarchy with Artist and Scientist under Person]
type:Artist ⇒ type:Person (so the two type pairs are semantically similar)
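The three kinds of similarity can be sketched as below. The concrete choices (Jaccard over lower-cased words, a relative-difference score for numbers, and a transitive subClassOf walk for the reasoning-based case) are illustrative assumptions, not the measures used in the cited papers; all names are hypothetical:

```python
def word_jaccard(a, b):
    """Word-based similarity between two text values (Jaccard over lower-cased words)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0

def numeric_similarity(x, y):
    """Similarity between two numbers: 1 when equal, lower as the relative difference grows."""
    if x == y:
        return 1.0
    return max(0.0, 1 - abs(x - y) / max(abs(x), abs(y)))

def entails_type(cls, other, subclass_of):
    """Reasoning-based check: does type:cls entail type:other through subClassOf?"""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == other:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, []))
    return False

# Hypothetical class hierarchy matching the figure.
subclass_of = {"Artist": ["Person"], "Scientist": ["Person"]}
```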
23. Extrinsic metrics (1): using external knowledge
• Using domain knowledge
• Certain properties are known to be important.
• Using indicators on the Web
• Search engine hits
• Bidirectional links in Wikipedia
• Using user feedback
• User clicks
[Figure: the Leonardo da Vinci description from slide 12]
25. Extrinsic metrics (2): context-based
• Entities in a document
• context = contents of the document
• solution: Class Vector Model [Cheng et al., WWW’15]
[Figure: a document reading "… The Starry Night, from MoMA’s collection, reminds us of some work painted by Leonardo da Vinci. ..."; its class vector is vector(context) = {Painting}, since the description of The Starry Night contains type: Painting. In the Leonardo da Vinci description, the pair creates: Mona Lisa has vector = {Painting}, while other pairs have vectors such as {Artist}]
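The idea of matching property-value pairs to the document context through classes can be illustrated with a cosine similarity between class vectors. This is a simplified sketch under stated assumptions, not the published Class Vector Model: each pair is mapped by hand to the classes of its value, and the class annotations and function names are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def rank_pairs_by_context(pair_classes, context_classes):
    """Rank property-value pairs by how well their class vectors match the context's class vector."""
    context = Counter(context_classes)
    return sorted(pair_classes,
                  key=lambda pair: cosine(Counter(pair_classes[pair]), context),
                  reverse=True)

# Hypothetical class annotations, following the Starry Night example.
pair_classes = {
    ("type", "Artist"): ["Artist"],
    ("creates", "Mona Lisa"): ["Painting"],
    ("influenced", "Richard Feynman"): ["Scientist"],
}
ranking = rank_pairs_by_context(pair_classes, context_classes=["Painting"])
```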
26. Extrinsic metrics (2): context-based
• Co-summarization
• context = other entities
• solution:
• difference from other entities [Cheng et al., WWW’15]
(for entity linking)
• similarity with other entities [Cheng et al., JoWS’15]
(for entity coreference resolution)
27. Structures for combining metrics
1. Result combination
[Figure: the same items ranked by Metric A, Metric B and Metric C, with the three rankings combined into one summary]
28. Structures for combining metrics
1. Result combination (cont.)
• Rank by Metric A; break ties by Metric B
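Both forms of result combination can be sketched in a few lines. Tie-breaking is a lexicographic sort on (Metric A, Metric B). For merging several rankings into one summary, the talk does not specify the method; Borda count is used below only as one simple, assumed choice. Function names and data are hypothetical.

```python
def rank_with_tie_break(items, metric_a, metric_b):
    """Rank by metric A (higher first); among ties on A, rank by metric B (higher first)."""
    return sorted(items, key=lambda item: (-metric_a[item], -metric_b[item]))

def borda_merge(rankings, k):
    """Merge several rankings of the same items by Borda count and keep the top k."""
    scores = {}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + len(ranking) - position
    # Sort by total score, then by item to make the output deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

ranked = rank_with_tie_break(["p", "q", "r"], {"p": 1, "q": 2, "r": 2}, {"p": 0, "q": 1, "r": 5})
summary = borda_merge([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]], k=2)
```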
30. Structures for combining metrics
• e.g., combinatorial optimization
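• Quadratic Knapsack Problem [Cheng et al., JoWS’15]
• Quadratic Multidimensional Knapsack Problem [Cheng et al., WWW’15]
[Figure: the optimization model: a length constraint on the summary; a profit matrix with informativeness on the diagonal and inverse similarity off the diagonal, with one block for one entity and one for the other entity; and cross-entity terms for similarity with and difference from other entities]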
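A Quadratic Knapsack Problem maximizes a sum of pairwise profits over the selected items under a weight budget. In the mapping suggested by the figure, items are property-value pairs, diagonal profits are informativeness, off-diagonal profits are inverse similarity, and the budget is the summary length. The solver below enumerates all subsets, which is exponential and only meant to make the formulation concrete on tiny inputs; it is not the solver used in the cited work, and the numbers are made up.

```python
from itertools import combinations

def solve_qkp(profit, weight, capacity):
    """Exhaustively solve a tiny quadratic knapsack instance:
    maximize sum of profit[i][j] over i, j in S, subject to sum of weight[i] over S <= capacity."""
    n = len(weight)
    best_value, best_set = float("-inf"), ()
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if sum(weight[i] for i in subset) > capacity:
                continue
            value = sum(profit[i][j] for i in subset for j in subset)
            if value > best_value:
                best_value, best_set = value, subset
    return best_set, best_value

# Hypothetical instance: 3 pairs, each of length 1, summary length 2.
profit = [[3, 0, 1],
          [0, 2, 1],
          [1, 1, 2]]
selection, value = solve_qkp(profit, weight=[1, 1, 1], capacity=2)
```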
31. Structures for combining metrics
• e.g., weighted PageRank [Cheng et al., ISWC’11]
[Figure: the Leonardo da Vinci description from slide 12, with its property-value pairs (name: Leonardo da Vinci, type: Person, creates: Mona Lisa, …) as graph nodes; the probability of following edges is based on inverse Google distance, and the probability of jumping is based on informativeness]
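The random surfer described above can be sketched as follows: with probability `follow` it moves along an edge chosen in proportion to relatedness (for example, inverse Google distance), otherwise it jumps to a pair chosen in proportion to informativeness. The parameter value, the symmetric relatedness lookup and the function name are assumptions, not details confirmed by the slide.

```python
def relin_like_rank(relatedness, informativeness, follow=0.85, iterations=100):
    """Random surfer over property-value pairs: follow edges by relatedness, jump by informativeness."""
    nodes = list(informativeness)
    total_info = sum(informativeness.values())
    jump = {n: informativeness[n] / total_info for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {n: (1 - follow) * jump[n] for n in nodes}
        for u in nodes:
            # Relatedness is treated as symmetric: (u, v) or (v, u) may be given.
            weights = {v: relatedness.get((u, v), relatedness.get((v, u), 0.0))
                       for v in nodes if v != u}
            total = sum(weights.values())
            for v in nodes:
                # With no related pair, fall back to the jump distribution.
                p = weights.get(v, 0.0) / total if total else jump[v]
                new_rank[v] += follow * rank[u] * p
        rank = new_rank
    return rank

# Hypothetical relatedness: "b" is related to both "a" and "c".
relatedness = {("a", "b"): 1.0, ("b", "c"): 0.1}
scores = relin_like_rank(relatedness, {"a": 1, "b": 1, "c": 1})
```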
33. Structures for combining metrics
4. Complex combinations
• Result combination + arithmetic combination
• Machine learning + arithmetic combination
35. Outline of this talk
• Summarizing entity descriptions
• Summarizing entity associations
• Summarizing semantic datasets
36. Summarizing entity associations
• Extractive methods
• Finding and ranking associations between two entities
(summary = a subset of paths)
• Path finding and filtering
• Intrinsic and extrinsic metrics for ranking paths
• Structures for combining metrics
• Finding and ranking associations between multiple entities
(summary = a subset of subgraphs)
• Abstractive methods
• Ranking association patterns
• Hierarchically organizing association patterns
[Figure: the example graph from slide 6]
37. Finding associations between two entities
• Path finding
• Dijkstra or A*
• Bidirectional breadth-first search (bi-BFS)
• Schema-based performance optimization
[Figure: the example graph from slide 6, next to its schema graph over the classes Paper, Person and Conference (relations include inProcOf, cites and extends)]
Search space: O(Δ^d) for unidirectional search vs. O(Δ^(d/2)) for bi-BFS (Δ: node degree; d: path length)
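Bidirectional BFS grows one search from each entity and stops when they meet, which is where the O(Δ^(d/2)) bound comes from. A minimal sketch on an undirected adjacency list; the toy graph is hypothetical and only loosely modeled on the example, and the schema-based optimization is not shown.

```python
from collections import deque

def bidirectional_bfs(adj, source, target):
    """Return one shortest path between source and target in an undirected graph, or None."""
    if source == target:
        return [source]
    prev_s, prev_t = {source: None}, {target: None}
    frontier_s, frontier_t = deque([source]), deque([target])

    def expand(frontier, prev, other_prev):
        # Expand exactly one BFS level; return the first node reached by both searches.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in adj.get(node, ()):
                if neighbor not in prev:
                    prev[neighbor] = node
                    if neighbor in other_prev:
                        return neighbor
                    frontier.append(neighbor)
        return None

    while frontier_s and frontier_t:
        # Grow the smaller frontier to keep the search space small.
        if len(frontier_s) <= len(frontier_t):
            meet = expand(frontier_s, prev_s, prev_t)
        else:
            meet = expand(frontier_t, prev_t, prev_s)
        if meet is not None:
            path, node = [], meet
            while node is not None:  # walk back to the source
                path.append(node)
                node = prev_s[node]
            path.reverse()
            node = prev_t[meet]
            while node is not None:  # walk forward to the target
                path.append(node)
                node = prev_t[node]
            return path
    return None

# Hypothetical toy graph.
edges = [("Alice", "paper-A"), ("paper-A", "AAAI"), ("AAAI", "Bob"),
         ("Alice", "paper-B"), ("paper-B", "paper-C"), ("paper-C", "Bob"),
         ("paper-D", "IJCAI")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)
path = bidirectional_bfs(adj, "Alice", "Bob")
```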
38. Finding associations between two entities
• Path filtering
• By length
• By entities, classes, relations
• By keywords
[Figure: the example graph from slide 6]
39. Ranking associations between two entities
• Intrinsic metrics
• Frequency
• Centrality
• Informativeness
• Diversity
• Length
• Conformity
• Extrinsic metrics
• Using external knowledge
• Context-based
• Structures for combining metrics
40. Intrinsic metrics: frequency, centrality, diversity, length
• Property frequency
• Degree centrality
• Diverse relations
• Length
[Figure: the example graph from slide 6]
41. Intrinsic metrics: informativeness
• Informativeness
• Data-based informativeness: inverse relation frequency
• Schema-based informativeness: depth of class/relation
[Figure: the example graph from slide 6]
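Data-based informativeness rewards paths made of rare relations. A minimal sketch, assuming inverse relation frequency is log(total triples / triples using the relation) and that a path is scored by the average over its relations; both the averaging and the toy triples are assumptions.

```python
import math
from collections import Counter

def path_informativeness(path_relations, triples):
    """Average inverse relation frequency, log(|triples| / freq(relation)), over a path's relations."""
    freq = Counter(relation for _, relation, _ in triples)
    total = len(triples)
    return sum(math.log(total / freq[r]) for r in path_relations) / len(path_relations)

# Hypothetical triples: firstAuthor is common, reviewer is rare.
triples = [
    ("Alice", "firstAuthor", "paper-A"), ("Bob", "firstAuthor", "paper-B"),
    ("Carol", "firstAuthor", "paper-C"), ("paper-A", "inProcOf", "AAAI"),
    ("paper-B", "inProcOf", "IJCAI"), ("Bob", "reviewer", "AAAI"),
]
```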
42. Intrinsic metrics: conformity
• Conformity to schema
[Figure: the example graph from slide 6, next to its schema graph over the classes Paper, Person and Conference (relations include inProcOf, cites and extends)]
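Conformity to schema can be measured by checking each edge of a path against the schema graph. A minimal sketch, assuming one class per entity and a schema given as a set of (class, relation, class) triples; the schema, class assignments and function name are hypothetical.

```python
def conformity(path_triples, entity_class, schema):
    """Fraction of a path's edges whose (subject class, relation, object class) appears in the schema graph."""
    conforming = sum((entity_class[s], r, entity_class[o]) in schema for s, r, o in path_triples)
    return conforming / len(path_triples)

# Hypothetical schema graph and class assignments.
schema = {("Person", "firstAuthor", "Paper"), ("Paper", "inProcOf", "Conference"),
          ("Paper", "cites", "Paper"), ("Paper", "extends", "Paper")}
entity_class = {"Alice": "Person", "Bob": "Person", "paper-A": "Paper",
                "paper-B": "Paper", "AAAI": "Conference"}
```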
43. Extrinsic metrics
• Using external knowledge
• Explicit: user-defined weights
• Implicit: user’s Web browsing history
• Context-based
• Query relevance
[Figure: the example graph from slide 6]
44. Finding and ranking associations between multiple entities
• association = a size-constrained connected subgraph
(size = number of other entities)
[Figure: 3 associations via 2 other entities]
45. Finding and ranking associations between multiple entities
• association = a size-constrained connected subgraph
(size = diameter) [Cheng et al., ISWC’16]
[Figure: 3 associations having a diameter of 3]
46. Finding and ranking associations between multiple entities
• Subgraph finding
• n-directional breadth-first search
• Distance-based performance optimization [Cheng et al., ISWC’16]
47. Finding and ranking associations between multiple entities
• Subgraph ranking (based on entity ranking)
• PageRank
• Query relevance
• Number of short paths
• Random walk with restart
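Random walk with restart scores entities by how often a walker that keeps returning to the query entities visits them. A minimal sketch on an undirected adjacency list; the restart probability 0.15 and scoring a subgraph by the sum of its entities' scores are assumptions, and all names are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.15, iterations=100):
    """Scores of all nodes for a random walk that restarts at the query (seed) entities."""
    nodes = list(adj)
    restart_dist = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart_dist)
    for _ in range(iterations):
        new_score = {n: restart * restart_dist[n] for n in nodes}
        for u in nodes:
            # Move to a uniformly chosen neighbour; a node without neighbours returns to the seeds.
            targets = adj[u] or list(seeds)
            share = (1 - restart) * score[u] / len(targets)
            for v in targets:
                new_score[v] += share
        score = new_score
    return score

def subgraph_score(subgraph_nodes, scores):
    """One simple subgraph score (an assumption): the sum of its entities' scores."""
    return sum(scores[n] for n in subgraph_nodes)

# Hypothetical path graph a - b - c - d with query entity a.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(adj, seeds={"a"})
```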
48. Finding and ranking associations between multiple entities
• association = a Steiner tree
(size-unconstrained, weight-minimized)
49. Abstractive methods
• Association pattern [Cheng et al., ISWC’14]
[Figure: associations (secondAuthor, paper-A, inProcOf, conf-A, reviewer) and (firstAuthor, paper-B, inProcOf, conf-B, chair) generalized into the pattern (author, Paper, inProcOf, Conference, role)]
51. Ranking association patterns
• Metrics
• Frequency
• Informativeness
• Diversity
• Structures for combining metrics
[Figure: the pattern (author, Paper, inProcOf, Conference, role)]
52. Metrics: frequency
• frequency = occurrences of canonical code [Cheng et al., ISWC’16]
[Figure: two association patterns tested for isomorphism by comparing their canonical codes]
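A canonical code gives isomorphic patterns identical encodings, so pattern frequency becomes a count of equal codes. The brute-force version below takes the smallest encoding over all node orderings; it is factorial in the number of nodes and is a stand-in for illustration, not the cited algorithm (which orders classes using query entities as proxies). Function names and patterns are hypothetical.

```python
from collections import Counter
from itertools import permutations

def canonical_code(node_labels, edges):
    """Smallest encoding over all node orderings; isomorphic labeled graphs get equal codes."""
    best = None
    for order in permutations(node_labels):
        position = {node: i for i, node in enumerate(order)}
        code = (tuple(node_labels[node] for node in order),
                tuple(sorted((position[s], r, position[o]) for s, r, o in edges)))
        if best is None or code < best:
            best = code
    return best

# Hypothetical patterns: the first two differ only in node names.
p1 = canonical_code({"x": "Paper", "y": "Conference"}, [("x", "inProcOf", "y")])
p2 = canonical_code({"p": "Paper", "q": "Conference"}, [("p", "inProcOf", "q")])
p3 = canonical_code({"x": "Paper", "y": "Conference"}, [("y", "inProcOf", "x")])
frequency = Counter([p1, p2, p3])
```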
53. Metrics: frequency
• frequency = occurrences of canonical code [Cheng et al., ISWC’16]
• Solution: using query entities as proxies for classes to be ordered
54. Hierarchically organizing association patterns
• subClassOf/subPropertyOf ⇒ subPatternOf [Zhang et al., JIST’13]
[Figure: a pattern hierarchy in which (author, Paper, inProcOf, Conference, role) generalizes (author, Demo, inProcOf, Conference, reviewer) and (author, Poster, inProcOf, Conference, chair)]
56. Outline of this talk
• Summarizing entity descriptions
• Summarizing entity associations
• Summarizing semantic datasets
58. Extractive methods
• Triple ranking (based on entity ranking)
• Centrality: degree, PageRank
[Figure: the example graph from slide 6]
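Triple ranking based on entity ranking can be sketched with degree centrality: each triple is scored by the degrees of its subject and object. Summing the two degrees is an assumed combination, and the triples are hypothetical.

```python
from collections import Counter

def rank_triples_by_degree(triples, k):
    """Score each triple by the degrees of its subject and object; return the top k."""
    degree = Counter()
    for subject, _, obj in triples:
        degree[subject] += 1
        degree[obj] += 1
    return sorted(triples, key=lambda t: -(degree[t[0]] + degree[t[2]]))[:k]

# Hypothetical triples: paper-A is the most connected entity.
triples = [("Alice", "firstAuthor", "paper-A"), ("paper-A", "inProcOf", "AAAI"),
           ("Bob", "secondAuthor", "paper-A"), ("paper-B", "cites", "paper-C")]
summary = rank_triples_by_degree(triples, k=3)
```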
59. Abstractive methods (1): inferred schema
• summary = a graph-structured (sub-)schema inferred from data
(grouping entities by classes)
[Figure: the example graph from slide 6, next to its schema graph over the classes Paper, Person and Conference (relations include inProcOf, cites and extends)]
60. Abstractive methods (1): inferred schema
• Metrics for ranking classes and properties
• Frequency
• Centrality
[Figure: the example graph from slide 6, next to its schema graph over the classes Paper, Person and Conference (relations include inProcOf, cites and extends)]
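Inferring a schema by grouping entities by class, and counting how often classes and class-level relations occur, gives the frequency metric for ranking them. A minimal sketch, assuming a single class per entity; names and data are hypothetical.

```python
from collections import Counter

def infer_schema(entity_class, triples):
    """Group entities by class: count entities per class and triples per (class, relation, class)."""
    class_freq = Counter(entity_class.values())
    edge_freq = Counter(
        (entity_class[s], r, entity_class[o])
        for s, r, o in triples
        if s in entity_class and o in entity_class
    )
    return class_freq, edge_freq

# Hypothetical data.
entity_class = {"Alice": "Person", "Bob": "Person", "paper-A": "Paper",
                "paper-B": "Paper", "AAAI": "Conference"}
triples = [("Alice", "firstAuthor", "paper-A"), ("Bob", "firstAuthor", "paper-B"),
           ("paper-A", "inProcOf", "AAAI"), ("paper-B", "cites", "paper-A")]
class_freq, edge_freq = infer_schema(entity_class, triples)
```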
61. Abstractive methods (2): flat partitioning
• summary = entity partitions connected by relations
• partitioning by shared classes (= inferred schema)
• partitioning by shared attributes
• partitioning by shared paths (a.k.a. bisimulation)
[Figure: the example graph from slide 6, next to its schema graph over the classes Paper, Person and Conference (relations include inProcOf, cites and extends)]
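Partitioning by shared paths can be computed by naive partition refinement for forward bisimulation: start with one block and repeatedly split entities whose (relation, target block) sets differ, until no block splits. This is a textbook sketch, not a specific system's implementation, and the data are hypothetical.

```python
def bisimulation_partition(nodes, triples):
    """Naive partition refinement for forward bisimulation: two entities end up in the
    same block iff they have the same outgoing relations to the same blocks, recursively."""
    block = {n: 0 for n in nodes}  # start with a single block
    out = {n: set() for n in nodes}
    for s, r, o in triples:
        out[s].add((r, o))
    while True:
        # Signature = current block plus the set of (relation, target block) pairs.
        signature = {n: (block[n], frozenset((r, block[o]) for r, o in out[n])) for n in nodes}
        ids = {}
        new_block = {n: ids.setdefault(signature[n], len(ids)) for n in nodes}
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block

# Hypothetical data: two authors, two papers, two conferences.
nodes = ["Alice", "Bob", "paper-A", "paper-B", "AAAI", "IJCAI"]
triples = [("Alice", "author", "paper-A"), ("Bob", "author", "paper-B"),
           ("paper-A", "inProcOf", "AAAI"), ("paper-B", "inProcOf", "IJCAI")]
partition = bisimulation_partition(nodes, triples)
```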
62. Abstractive methods (3): hierarchical grouping [Cheng et al., IJCAI’16]
• summary = a hierarchical grouping of entities
• identified by property-value pairs
• connected by relations
[Figure: a hierarchical grouping of entities; relations connecting sibling groups]
63. Abstractive methods (3): hierarchical grouping [Cheng et al., IJCAI’16]
• Metrics for choosing groups (i.e., property-value pairs)
• Coverage of data → large subgroups
• Height of hierarchy → moderate-sized subgroups
• Cohesion within groups → informative property-value pairs
• Overlap between groups → controllable overlap
• Homogeneity of groups → different values of the same property
[Figure: a hierarchical grouping of entities; relations connecting sibling groups]
64. Abstractive methods (3): hierarchical grouping [Cheng et al., IJCAI’16]
• Combining metrics by combinatorial optimization
(formulated as a multidimensional knapsack problem)
• Objective: maximizing moderateness of each subgroup
• Objective: maximizing cohesion within each subgroup
• Constraint: disallowing large overlap between subgroups
• Constraint: selecting ≤k subgroups
• Constraint (optional): disallowing different properties
65. Concluding remarks
• Research
• More application scenarios are to be identified.
• New applications may promote new metrics.
• More benchmarks are needed for evaluation.
• Practice
• Handy tools for semantic data summarization are missing.
The 2016 ENtity Summarization Evaluation Campaign (ENSEC 2016)
http://km.aifb.kit.edu/ws/sumpre2016/challenge.html
66. Papers on summarizing entity descriptions
• Gong Cheng, Danyun Xu, Yuzhong Qu.
Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking.
(WWW’15)
• Gong Cheng, Danyun Xu, Yuzhong Qu.
C3D+P: A Summarization Method for Interactive Entity Resolution.
(JoWS’15)
• Gong Cheng, Thanh Tran, Yuzhong Qu.
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization.
(ISWC’11)
• Gong Cheng, Yuzhong Qu.
Searching Linked Objects with Falcons: Approach, Implementation and Evaluation.
(IJSWIS’09)
67. Papers on summarizing entity associations
• Gong Cheng, Daxin Liu, Yuzhong Qu.
Efficient Algorithms for Association Finding and Frequent Association Pattern Mining.
(ISWC’16)
• Gong Cheng, Yanan Zhang, Yuzhong Qu.
Explass: Exploring Associations between Entities via Top-K Ontological Patterns and Facets.
(ISWC’14)
• Yanan Zhang, Gong Cheng, Yuzhong Qu.
Towards Exploratory Relationship Search: A Clustering-based Approach.
(JIST’13)
68. Papers on summarizing semantic datasets
• Gong Cheng, Cheng Jin, Yuzhong Qu.
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization.
(IJCAI’16)
70. Ontology
• Terms
• Publication
• Paper
• Conference
• title
• inProc
• Term descriptions
• SubClassOf(Paper, Publication)
• SubClassOf(Paper, DataExactCardinality(1, title))
• ObjectPropertyDomain(inProc, Paper)
• ObjectPropertyRange(inProc, Conference)
72. Summarizing ontologies
• Extractive methods
1. Ranking terms
(summary = a subset of terms)
2. Ranking term descriptions
(summary = a subset of term descriptions)
3. Ranking subgraphs
(summary = a subgraph)
• Abstractive methods
• Not known yet
75. Intrinsic metrics (2): centrality
• Middleness in the hierarchy
• Degree
• Betweenness
• PageRank
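[Figure: an ontology graph over Paper, Publication, title, inProc and Conference; a class hierarchy with Paper and Book under Publication, and Article and Poster on the level below]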
76. Intrinsic metrics (3): diversity
• Coverage of hierarchy
[Figure: the class hierarchy with Paper and Book under Publication, and Article and Poster on the level below]
77. Intrinsic metrics (4): simplicity
• Number of words in the name of a term
• e.g., Paper vs. PaperPublishedAtCCKS2016
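The number of words in a CamelCase term name can be counted with a regular expression that splits on case changes and digit runs. The splitting rules (an all-capitals run such as an acronym is one word, a digit run is one word) and the function name are assumptions for illustration.

```python
import re

# A word is: an all-caps run not followed by a lowercase letter (acronym),
# an optional capital followed by lowercase letters, or a run of digits.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def name_word_count(term_name):
    """Number of words in a CamelCase or camelCase term name (simpler names score lower)."""
    return len(WORD.findall(term_name))

simple = name_word_count("Paper")
complex_name = name_word_count("PaperPublishedAtCCKS2016")
```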
78. Extrinsic metrics
• Using external knowledge
• Search engine hits
• Personalization (e.g., spreading activation)
• Context-based
• Query relevance
[Figure: the ontology graph over Paper, Publication, title, inProc and Conference]
79. Extractive methods (2): ranking term descriptions
• Graph representation of term descriptions
1. Description graph
2. Term-description graph
• Ranking term descriptions
• Intrinsic metrics
• Extrinsic metrics