Gleaning Types for Literals in RDF Triples with Application to Entity Summarization
Kalpa Gunaratna 1, Krishnaprasad Thirunarayan 1, Amit Sheth 1, Gong Cheng 2
1 Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA
2 National Key Laboratory for Novel Software Technology, Nanjing University, China
13th Extended Semantic Web Conference (ESWC) 2016
Greece, 05.31.2016
Talk Overview
o Literals and background of entity summarization
o Typing literals in knowledge graphs
o Entity summarization (FACES-E)
o Evaluation
– Typing
– Entity summarization with datatype properties
o Conclusion and future work
Motivating Facts – Literals and Semantics
o A considerable amount of information is captured in datatype properties.
– DBpedia has 1600 datatype properties vs. 1079 object properties.
o Many literals can be “easily typed” for proper interpretation and use.
– Example: in DBpedia, http://dbpedia.org/property/location has ~100,000 unique simple literals that can be directly mapped to entities.
o The added semantics can be used in practical applications such as (i) entity summarization, (ii) property alignment, (iii) data integration, and (iv) dataset profiling.
Let's Focus on Entity Summarization Now
o Datasets and knowledge graphs on the web continue to grow in number and size.
– DBpedia (3.9) has around 200 triples per entity on average.
o All the facts about an entity are difficult to process when browsing.
o Better presentation is required; good-quality summaries can help.
Importance of Entities and Summaries
Google has its own knowledge graph, called the Google Knowledge Graph (GKG), to facilitate search.
Google made summarization its second priority in building the GKG*.
* Singhal, A. 2012. Introducing the Knowledge Graph: things, not strings. Official Google Blog, May.
Diversity-Aware Entity Summaries (FACES approach) - Background
o We introduced the FACES (FACeted Entity Summaries) approach*.
o FACES follows two main steps.
o First, it groups “conceptually” similar features.
– Features in different groups express different facts.
o Second, it picks features (property-value pairs) from these groups for the summaries, which improves diversity.
* Kalpa Gunaratna, Krishnaprasad Thirunarayan, and Amit Sheth. 'FACES: Diversity-Aware Entity Summarization using
Incremental Hierarchical Conceptual Clustering'. 29th AAAI Conference on Artificial Intelligence (AAAI 2015), AAAI, 2015.
Faceted Entity Summary - Example
[Figure: Marie Curie's feature values (Pierre_Curie, Warsaw, Passy,_Haute-Savoie, ESPCI_ParisTech, University_of_Paris, Radioactivity, Chemistry) grouped into facets, with facet labels such as Birth Place and Field]
A concise and comprehensive summary could be {f1, f2, f6}; another summary could be {f4, f6, f7}.
Information Coming from Literals?
o FACES uses the type semantics of object values when grouping features.
o Literals in RDF triples do not have “semantic” types; they only have primary datatypes (e.g., date, integer, string).
o Can we add semantic types to literals? How?
Typing Literals in RDF Triples for Entity Summarization
o FACES can only handle object-property-based features.
o Why?
– The values of datatype property features are not URIs and have no “semantic” types.
– Hence, the adapted grouping algorithm (Cobweb) cannot get types for these values.
– So it cannot create the partitions needed for faceted entity summaries.
o Our contributions:
– First, compute types for the values of datatype-property-based features (data enrichment).
– Then, adapt and improve the ranking algorithms (summarization).
– Together, these form the FACES-E system.
Typing Datatype Property Values - Example
[Figure: typing example. Object values such as dbr:Joe_Biden (the dbp:vicePresident of dbr:Barack_Obama) carry rdf:type dbo:Politician. The literals in dbr:Barack_Obama dbp:shortDescription “44th President of the United States”^^xsd:string and dbr:Calvin_Coolidge dbp:orderInOffice “48th Governor of Massachusetts”^^xsd:string can be typed as dbo:President and dbo:Governor, linked to dbo:Politician via rdfs:subClassOf]
Why is it Hard?
o Unlike a URI, a literal does not have a clear focus.
o It may contain several entities or labels matching ontology classes.
o The literal can be long.
– In this work, we focus on literals that are at most one sentence long.
– For paragraph-like text, finding a single focus is hard and needs different techniques.
[Figure: “44th President of the United States” with three candidate focus terms (options 1, 2, and 3)]
Focus term identification
o We expect the focus of a sentence or phrase to lead to its representative entity or type.
o There is prominent work on identifying the head word of a sentence or phrase.
– Example: in “member of committee”, the head word is “member”.
o We use existing head-word detection algorithms to identify the focus term.
– Collins’ head-word detection algorithm
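To make the notion of a focus term concrete, here is a minimal sketch. It is a hypothetical heuristic (take the last word before the first preposition, skipping ordinals such as "44th"), not Collins' head-word algorithm, which operates on full parse trees; the names `head_word`, `PREPOSITIONS`, and `ORDINAL` are illustrative only.

```python
import re
from typing import Optional

# Hypothetical, heavily simplified head-word heuristic for short noun phrases.
# It is NOT Collins' head-finding rules, which need a full parse tree.
PREPOSITIONS = {"of", "in", "for", "from", "to", "at", "on", "by", "with"}
ORDINAL = re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE)


def head_word(phrase: str) -> Optional[str]:
    """Guess the head as the last word before the first preposition."""
    before_prep = []
    for token in phrase.split():
        if token.lower() in PREPOSITIONS:
            break
        before_prep.append(token.strip(",."))
    # Drop ordinals such as "44th" and tokens that were only punctuation.
    words = [t for t in before_prep if t and not ORDINAL.fullmatch(t)]
    return words[-1] if words else None


for literal in ["44th President of the United States",
                "member of committee",
                "United States Senate",
                "from Virginia"]:
    print(f"{literal!r} -> {head_word(literal)!r}")
```

The heuristic returns "President", "member", and "Senate" for the first three phrases but None for "from Virginia", which illustrates why fallbacks beyond the head word are needed.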
Deriving type (class) from head word
We filter out date and numeric values.
1. Exact matching of the focus term to class labels.
– E.g., “48th Governor of Massachusetts” → Governor (class)
2. Get the n-grams and look for a matching class using n-gram and focus-term overlap (maximal match).
I. Check for a class matching an overlapping n-gram.
II. If no type is found, spot entities in the n-grams and get their types.
• E.g., in “United States Senate”, the 3-gram “United States Senate” matches an entity in DBpedia.
3. Semantic matching of the focus term to class labels.
– We compute the pairwise similarity of the focus term with all the class labels and pick the highest (using the UMBC similarity service).
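The three matching steps above can be sketched as a cascade. Everything here is an assumption for illustration: `derive_types`, the label and entity dictionaries, and the 0.8 threshold are hypothetical, and `difflib` string similarity stands in for the UMBC semantic similarity service, which this sketch does not call.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set

# Literals made only of digits and separators (dates, numbers) are skipped.
NUMERIC_OR_DATE = re.compile(r"[\d\s.,:/-]+")


def string_similarity(a: str, b: str) -> float:
    # Hypothetical stand-in for a semantic similarity service:
    # difflib measures character overlap, not meaning.
    return SequenceMatcher(None, a, b).ratio()


def ngrams(tokens: List[str], n: int) -> List[List[str]]:
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def derive_types(literal: str, focus: str,
                 class_labels: Dict[str, str],
                 entity_types: Dict[str, Set[str]],
                 similarity: Callable[[str, str], float] = string_similarity,
                 threshold: float = 0.8) -> Set[str]:
    if NUMERIC_OR_DATE.fullmatch(literal.strip()):
        return set()
    focus = focus.lower()
    # 1. Exact match of the focus term to a class label.
    if focus in class_labels:
        return {class_labels[focus]}
    tokens = [t.strip(",.").lower() for t in literal.split()]
    tokens = [t for t in tokens if t]
    # 2. N-grams that contain the focus term, longest first (maximal match).
    for n in range(len(tokens), 1, -1):
        for gram in ngrams(tokens, n):
            if focus not in gram:
                continue
            text = " ".join(gram)
            if text in class_labels:   # 2.I: the n-gram names a class
                return {class_labels[text]}
            if text in entity_types:   # 2.II: the n-gram is a known entity
                return set(entity_types[text])
    # 3. Fall back to the most similar class label, if similar enough.
    if not class_labels:
        return set()
    best = max(class_labels, key=lambda label: similarity(focus, label))
    if similarity(focus, best) >= threshold:
        return {class_labels[best]}
    return set()


# Hypothetical lookup tables for illustration only.
LABELS = {"governor": "dbo:Governor", "president": "dbo:President",
          "ambassador": "dbo:Ambassador"}
ENTITIES = {"united states senate": {"dbo:Legislature", "dbo:Organisation"}}

print(derive_types("48th Governor of Massachusetts", "Governor", LABELS, ENTITIES))
print(derive_types("United States Senate", "Senate", LABELS, ENTITIES))
print(derive_types("Ambassadors to Japan", "Ambassadors", LABELS, ENTITIES))
```

Returning at the first successful step keeps the cascade cheap: the expensive similarity comparison against every class label only runs when exact and n-gram matching both fail.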
Process Flow
[Figure: process flow in two stages, Pre-processing and Type processing. Components: Primary Type Filter, Phrase Identifier, N-grams Extractor, Head-word Detector, Entity Spotter, Head-word to Class Label Matcher, N-grams + Head-word to Class Label Matcher, and Head-word Semantic Matcher. Output: types for the literal]
Typing Literals Algorithm Outline
If you really wanted to know how ….
Ranking Datatype Property Features
o The ranking mechanism FACES uses for object values does not work for literals.
– Why? Two literals can be distinct even if their types and main entities are the same.
• Example: “United States President” vs. “President of the United States” (frequency counting is affected).
• Searching with the whole phrase is not desirable.
– Hence, we use entities.
– But a literal can contain several entities. Which one should we choose?
Idea for Ranking
o We observe that humans recognize popular entities.
– Entities can appear in literals with variations.
o For ranking, we use the popular entities in the literals rather than the literals themselves.
o Functions
– ES(v) returns all entities present in the value v.
– max(ES(v)) returns the most popular entity in ES(v).
v = “44th President of the United States”
ES(v) = {dbr:President, dbr:United_States}
max(ES(v)) = dbr:United_States
Note: the ranking objective is separate from the typing mechanism.
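ES(v) and max(ES(v)) can be sketched with a naive dictionary-based spotter. The surface-form dictionary, the popularity counts, and the function names below are hypothetical; a real system would use an entity spotter and a popularity signal derived from the knowledge graph.

```python
import re
from typing import Dict, Optional, Set


def entity_set(value: str, surface_forms: Dict[str, str]) -> Set[str]:
    """ES(v): entities whose surface form occurs in v as whole words."""
    text = value.lower()
    return {uri for surface, uri in surface_forms.items()
            if re.search(r"\b" + re.escape(surface) + r"\b", text)}


def most_popular_entity(entities: Set[str],
                        popularity: Dict[str, int]) -> Optional[str]:
    """max(ES(v)): the entity with the highest popularity count, or None."""
    return max(entities, key=lambda e: popularity.get(e, 0), default=None)


# Hypothetical dictionary and popularity counts, for illustration only.
SURFACE_FORMS = {"president": "dbr:President", "united states": "dbr:United_States"}
POPULARITY = {"dbr:United_States": 500, "dbr:President": 100}

for v in ["44th President of the United States", "United States President"]:
    es = entity_set(v, SURFACE_FORMS)
    print(v, sorted(es), most_popular_entity(es, POPULARITY))
```

Both “44th President of the United States” and “United States President” reduce to dbr:United_States, so counting by entity avoids the split that counting whole literals would cause.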
Modified Ranking Equations
If you really wanted to know …
Informativeness is inversely proportional to the number of entities associated with overlapping values that contain the most popular entity of feature f.
Frequency of the most popular entity in v.
The ranking score is tf-idf based.
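The exact modified equations appear as images on the original slide and are not reproduced here. As a hedged sketch only, the code below assumes a tf-idf-style product: an idf-like informativeness log(N / n), where n counts entities whose values share f's most popular entity, times a tf-like popularity log(1 + frequency). These formulas, the +1 smoothing, and the function names are assumptions.

```python
import math


def informativeness(total_entities: int, sharing_entities: int) -> float:
    # idf-like term (assumed form log(N / n)): the fewer entities whose values
    # share f's most popular entity, the more informative f is.
    if not 0 < sharing_entities <= total_entities:
        raise ValueError("need 0 < sharing_entities <= total_entities")
    return math.log(total_entities / sharing_entities)


def popularity(frequency: int) -> float:
    # tf-like term from the frequency of the most popular entity in v;
    # the +1 smoothing is an assumption, not taken from the slides.
    return math.log(1 + frequency)


def rank_prime(total_entities: int, sharing_entities: int, frequency: int) -> float:
    """Hypothetical Rank'(f) = informativeness x popularity."""
    return informativeness(total_entities, sharing_entities) * popularity(frequency)


print(rank_prime(1000, 10, 99))   # value shared by few entities
print(rank_prime(1000, 500, 99))  # value shared by many entities
```

Under these assumptions, a feature whose popular entity is shared by fewer entities scores higher, while a more popular entity raises the score of any feature that contains it.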
Facet Ranking
o Aggregate the feature ranking scores for each facet.
o Rank facets based on the aggregated scores.
Rank(f) is the original function and Rank'(f) is the modified one for datatype-property-based features.
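Facet ranking can be sketched as aggregating feature scores per facet and sorting. The slide does not say how scores are aggregated; summing is an assumption here, and `facet_ranking` is a hypothetical name.

```python
from typing import Dict, List, Set


def facet_ranking(facets: List[Set[str]],
                  feature_scores: Dict[str, float]) -> List[int]:
    """Return facet indices ordered by summed feature score, highest first."""
    totals = [sum(feature_scores[f] for f in facet) for facet in facets]
    # sorted() is stable, so tied facets keep their original order.
    return sorted(range(len(facets)), key=lambda i: totals[i], reverse=True)


# Hypothetical facets and feature scores.
FACETS = [{"f1", "f4"}, {"f2"}, {"f6", "f7"}]
SCORES = {"f1": 5.0, "f4": 1.0, "f2": 3.0, "f6": 4.0, "f7": 2.5}
print(facet_ranking(FACETS, SCORES))
```

Summation favors facets with many well-ranked features; an average would instead favor small facets with one strong feature, so the aggregation choice matters.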
FACES-E Entity Summary Generation
1. Extract the features of entity e.
2. Enrich each feature and get its word set WS(f).
3. Feed the enriched feature set FS(e) to the partitioning algorithm to get the facet set F(e).
4. Compute the feature ranking scores R(f), then the ranking score of each facet, FacetRank(F(e)).
5. Pick the top-ranked features from the top-ranked facets, in order, to form the faceted entity summary. The constraints in the definition of a faceted entity summary hold.
[Figure: pipeline steps (1) to (5), annotated “Enriching Literals” and “Modified Ranking”]
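Step 5 can be sketched as a round-robin pass over the ranked facets: each pass takes the best remaining feature from each facet in rank order. When every facet is non-empty, this satisfies both cases of the faceted-summary constraint. The round-robin strategy, the summed facet scores, and the name `faceted_summary` are assumptions, not necessarily how FACES-E orders its picks.

```python
from typing import Dict, List, Set


def faceted_summary(facets: List[Set[str]], scores: Dict[str, float],
                    k: int) -> List[str]:
    # Rank facets by summed feature score (assumed aggregation), best first.
    order = sorted(range(len(facets)),
                   key=lambda i: sum(scores[f] for f in facets[i]), reverse=True)
    # Within each facet, best feature first; the name breaks score ties.
    queues = [sorted(facets[i], key=lambda f: (-scores[f], f)) for i in order]
    summary: List[str] = []
    # Round-robin over ranked facets: at most one feature per facet per pass.
    while len(summary) < k and any(queues):
        for queue in queues:
            if queue and len(summary) < k:
                summary.append(queue.pop(0))
    return summary


# Hypothetical facets and feature scores.
FACETS = [{"f1", "f4"}, {"f2"}, {"f6", "f7"}]
SCORES = {"f1": 5.0, "f4": 1.0, "f2": 3.0, "f6": 4.0, "f7": 2.5}
for k in (2, 3, 4):
    print(k, faceted_summary(FACETS, SCORES, k))
```

If k does not exceed the number of facets, the first pass stops early and no facet contributes twice; otherwise the first pass covers every facet before any facet contributes a second feature.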
Type Computation Samples
Literal | Types
United States Ambassador to the United Nations | Agent, Ambassador, Person
Chairman of the Republican National Committee | Agent, Politician, Person, President
United States Navy | Agent, Organisation, Military Unit
Member of the New York State Senate | Agent, OrganisationMember, Person
Senate Minority Leader | Agent, Politician, Person, President
United States Senate | Agent, Organisation, Legislature
from Virginia | Administrative Region, Place, Region, Populated Place
Denison, Texas, U.S. | Administrative Region, Place, Country, Region, Populated Place
(Types include super types, excluding owl:Thing.)
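The type sets in the table include super types. A minimal sketch of that closure walks up a class hierarchy and drops owl:Thing; the hierarchy below is a small hypothetical fragment chosen to mirror two of the rows above, not the full DBpedia ontology.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy fragment of a class hierarchy: class -> direct parents.
TOY_HIERARCHY: Dict[str, Set[str]] = {
    "President": {"Politician"},
    "Politician": {"Person"},
    "Person": {"Agent"},
    "Legislature": {"Organisation"},
    "Organisation": {"Agent"},
    "Agent": {"owl:Thing"},
}


def with_super_types(types: Iterable[str],
                     hierarchy: Dict[str, Set[str]]) -> Set[str]:
    """Return the given types plus all their ancestors, excluding owl:Thing."""
    result: Set[str] = set()
    stack = list(types)
    while stack:
        t = stack.pop()
        if t in result or t == "owl:Thing":
            continue
        result.add(t)
        stack.extend(hierarchy.get(t, ()))
    return result


print(sorted(with_super_types({"President"}, TOY_HIERARCHY)))
print(sorted(with_super_types({"Legislature"}, TOY_HIERARCHY)))
```

The `result` check also guards against revisiting classes reachable through more than one parent.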
Evaluation – Type Generation Metrics
o The type set TS(v) is the generated set of types for the value v.
n is the total number of features.
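The metric equations appear as images on the original slide and are not reproduced here. As a hedged sketch, the code assumes that Mean Precision averages the fraction of correct types in TS(v), that Any Mean Precision credits a feature if at least one type in TS(v) is correct, that both average over the features that received types, and that Coverage is the fraction of the n features with a non-empty TS(v). These definitions and the name `type_metrics` are assumptions.

```python
from typing import List, Set, Tuple


def type_metrics(generated: List[Set[str]],
                 correct: List[Set[str]]) -> Tuple[float, float, float]:
    """Return (MP, AMP, coverage) under the assumed definitions above."""
    n = len(generated)
    typed = [(g, c) for g, c in zip(generated, correct) if g]
    coverage = len(typed) / n if n else 0.0
    if not typed:
        return 0.0, 0.0, coverage
    # MP: fraction of generated types judged correct, per typed feature.
    mp = sum(len(g & c) / len(g) for g, c in typed) / len(typed)
    # AMP: a typed feature counts if any of its generated types is correct.
    amp = sum(1.0 for g, c in typed if g & c) / len(typed)
    return mp, amp, coverage


# Hypothetical judgments for four features (the third received no types).
print(type_metrics([{"A", "B"}, {"C"}, set(), {"D"}],
                   [{"A"}, {"C"}, {"X"}, {"E"}]))
```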
Evaluation – Type Generation
o DBpedia Spotlight is used as the baseline. The evaluation data had 1117 unique property-value pairs (features).
o 118 pairs (labelling properties and noisy features) were removed.
o The results show that special care should be taken in deciding types for literals.
System | Mean Precision (MP) | Any Mean Precision (AMP) | Coverage
Our approach | 0.8290 | 0.8829 | 0.8529
Baseline | 0.4867 | 0.5825 | 0.5533
Evaluation – Summarization Metrics
Agreement: the average pairwise overlap of the ideal summaries.
Quality: the average overlap between the system-generated summary and the ideal summaries.
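Both metrics can be computed from set overlaps. This sketch assumes that each summary is a set of feature identifiers, that agreement averages |Si ∩ Sj| over all pairs of ideal summaries for one entity, and that quality averages the overlap of a system summary with each ideal summary; the function names and sample summaries are illustrative.

```python
from itertools import combinations
from typing import List, Set


def agreement(ideal: List[Set[str]]) -> float:
    """Average pairwise overlap |Si ∩ Sj| among the ideal summaries of one entity."""
    pairs = list(combinations(ideal, 2))
    if not pairs:
        raise ValueError("need at least two ideal summaries")
    return sum(len(a & b) for a, b in pairs) / len(pairs)


def quality(summary: Set[str], ideal: List[Set[str]]) -> float:
    """Average overlap between a system summary and each ideal summary."""
    if not ideal:
        raise ValueError("need at least one ideal summary")
    return sum(len(summary & s) for s in ideal) / len(ideal)


# Hypothetical ideal summaries (k = 3) for one entity.
IDEAL = [{"f1", "f2", "f6"}, {"f1", "f6", "f7"}, {"f2", "f6", "f7"}]
print(agreement(IDEAL), quality({"f1", "f2", "f6"}, IDEAL))
```

Agreement gives a rough ceiling for quality: a system summary that overlaps the ideal summaries about as much as they overlap each other is performing at human level.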
Evaluation – FACES-E Summary Generation
o The gold standard consists of 20 random entities from DBpedia 3.9 (as used in FACES) and 60 random entities from DBpedia 2015-04.
o 17 human users created the ideal summaries (900 in total). Each entity received at least 4 ideal summaries for each summary length.
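System | k = 5 Avg. Quality | k = 5 % Increase | k = 10 Avg. Quality | k = 10 % Increase
FACES-E | 1.5308 | - | 4.5320 | -
RELIN | 0.9611 | 59 % | 3.0988 | 46 %
RELINM | 1.0251 | 49 % | 3.6514 | 24 %
Avg. Agreement | 2.1168 | | 5.4363 |
k is the summary length; % Increase is the relative improvement of FACES-E over that system.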
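The % Increase column can be checked directly from the average quality scores in the table: it is the relative gain of FACES-E over each system. A short check, with `percent_increase` as an illustrative helper:

```python
def percent_increase(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100


# Average quality scores copied from the table above.
RESULTS = {
    5: {"FACES-E": 1.5308, "RELIN": 0.9611, "RELINM": 1.0251},
    10: {"FACES-E": 4.5320, "RELIN": 3.0988, "RELINM": 3.6514},
}

for k, row in RESULTS.items():
    for system in ("RELIN", "RELINM"):
        gain = percent_increase(row["FACES-E"], row[system])
        print(f"k={k} {system}: {gain:.0f} %")
```

Rounded to whole percentages, the computed values match the table: 59 % and 49 % for k = 5, and 46 % and 24 % for k = 10.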
Future Work
o Consider the meaning of the property name when computing types.
o Literals and properties are noisy.
– Identify noisy literals and properties automatically and filter them out.
– Filter out labelling properties (automatic identification). This is hard.
o A formal model to capture the semantic types of literals in RDF.
– Without changing their original representation (literals).
Thank You
http://knoesis.wright.edu/researchers/kalpa
kalpa@knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Questions?
FACES project page: http://wiki.knoesis.org/index.php/FACES
Appendix
Preliminaries
o Entities are described by features.
o Feature: a property-value pair.
o Feature set: all the features that describe an entity.
o Entity summary of size k: a subset of the feature set of an entity, constrained by size k.
Entity summaries for k = 3: {f1, f2, f5}, {f4, f6, f7}, {f3, f4, f5}, …
Entity: Marie Curie, feature set FS
Feature | Property | Value
f1 | spouse | Pierre_Curie
f2 | birthPlace | Warsaw
f3 | deathPlace | Passy,_Haute-Savoie
f4 | almaMater | ESPCI_ParisTech
f5 | workInstitutions | University_of_Paris
f6 | knownFor | Radioactivity
f7 | field | Chemistry
Faceted Entity Summary
Facets (partition)
Given an entity e, a set of facets F(e) of e is a partition of the feature set FS(e). That is, F(e) = {C1, C2, …, Cn} such that F(e) satisfies:
(i) Non-empty: ∅ ∉ F(e).
(ii) Collectively exhaustive: C1 ∪ C2 ∪ … ∪ Cn = FS(e).
(iii) Mutually (pairwise) disjoint: if Ci ≠ Cj, then Ci ∩ Cj = ∅.
Faceted entity summary
Given an entity e and a positive integer k < |FS(e)|, the faceted entity summary of e of size k, FSumm(e,k), is a collection of features such that FSumm(e,k) ⊂ FS(e) and |FSumm(e,k)| = k. Further, either (i) k > |F(e)| and ∀X ∈ F(e), X ∩ FSumm(e,k) ≠ ∅, or (ii) k ≤ |F(e)| and ∀X ∈ F(e), |X ∩ FSumm(e,k)| ≤ 1 holds, where F(e) is a set of facets of FS(e).
Faceted entity summaries: k = 2: {f1, f6}; k = 3: {f1, f2, f6}
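The two definitions can be checked mechanically. The sketch below tests whether a list of facets partitions FS(e) and whether a candidate summary satisfies the faceted-summary constraints. The facet grouping used in the example is hypothetical, since the grouping in the slide's figure is not recoverable from the text; the function names are illustrative.

```python
from typing import List, Set


def is_partition(facets: List[Set[str]], feature_set: Set[str]) -> bool:
    if any(not c for c in facets):                  # (i) non-empty
        return False
    if set().union(*facets) != feature_set:         # (ii) collectively exhaustive
        return False
    # (iii) pairwise disjoint: sizes add up exactly when no feature is shared.
    return sum(len(c) for c in facets) == len(feature_set)


def is_faceted_summary(summary: Set[str], facets: List[Set[str]],
                       feature_set: Set[str], k: int) -> bool:
    if not 0 < k < len(feature_set):
        return False
    if len(summary) != k or not summary <= feature_set:
        return False
    hits = [len(c & summary) for c in facets]
    if k > len(facets):
        return all(h >= 1 for h in hits)   # (i) every facet is represented
    return all(h <= 1 for h in hits)       # (ii) at most one feature per facet


# Marie Curie's features f1..f7 with a hypothetical facet grouping.
FS = {f"f{i}" for i in range(1, 8)}
FACETS = [{"f1"}, {"f2", "f3"}, {"f4", "f5"}, {"f6", "f7"}]
print(is_partition(FACETS, FS))
print(is_faceted_summary({"f1", "f6"}, FACETS, FS, 2))
print(is_faceted_summary({"f6", "f7"}, FACETS, FS, 2))
```

With these hypothetical facets, the slide's examples {f1, f6} (k = 2) and {f1, f2, f6} (k = 3) pass, while {f6, f7} fails because it takes two features from one facet.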


TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Gleaning Types for Literals in RDF with Application to Entity Summarization

  • 1. Gleaning Types for Literals in RDF Triples with Application to Entity Summarization
    Kalpa Gunaratna (1), Krishnaprasad Thirunarayan (1), Amit Sheth (1), Gong Cheng (2)
    (1) Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA
    (2) National Key Laboratory for Novel Software Technology, Nanjing University, China
    13th Extended Semantic Web Conference (ESWC) 2016, Greece, 05.31.2016
  • 2. Talk Overview
    o Literals and background of entity summarization
    o Typing literals in knowledge graphs
    o Entity summarization (FACES-E)
    o Evaluation
      – Typing
      – Entity summarization with datatype properties
    o Conclusion and future work
  • 3. Motivating Facts – Literals and Semantics
    o A considerable amount of information is captured in datatype properties.
      – DBpedia has 1600 datatype properties vs. 1079 object properties.
    o Many literals can be “easily typed” for proper interpretation and use.
      – Example: in DBpedia, http://dbpedia.org/property/location has ~100,000 unique simple literals that can be directly mapped to entities.
    o The added semantics can be used in practical applications such as (i) entity summarization, (ii) property alignment, (iii) data integration, and (iv) dataset profiling.
  • 4. Let's Focus on Entity Summarization Now
    o Datasets and knowledge graphs on the web continue to grow in number and size.
      – DBpedia (3.9) has around 200 triples per entity on average.
    o All the facts of an entity are difficult to process when browsing.
    o Better presentation is required. Good-quality summaries can help!
  • 5. Importance of Entities and Summaries
    Google has its own knowledge graph, the Google Knowledge Graph (GKG), to facilitate search. Google made summarization its second priority in building the GKG*.
    * Singhal, A. 2012. Introducing the knowledge graph: things, not strings. Official Google Blog, May.
  • 6. Diversity-Aware Entity Summaries (FACES Approach) – Background
    o We introduced the FACES (FACeted Entity Summaries) approach*.
    o FACES follows two main steps.
    o First, it groups “conceptually” similar features.
      – Any two groups contain different facts.
    o Second, it picks features (property-value pairs) from these groups for the summaries, improving diversity.
    * Kalpa Gunaratna, Krishnaprasad Thirunarayan, and Amit Sheth. 'FACES: Diversity-Aware Entity Summarization using Incremental Hierarchical Conceptual Clustering'. 29th AAAI Conference on Artificial Intelligence (AAAI 2015), AAAI, 2015.
  • 7. Faceted Entity Summary – Example
    [Figure: features of Marie Curie with values Pierre Curie, Warsaw, Passy_Haute-Savoie, ESPCI_ParisTech, University_of_Paris, Radioactivity, and Chemistry; labelled edges include Birth Place and Field.]
    A concise and comprehensive summary could be {f1, f2, f6}. Another summary could be {f4, f6, f7}.
  • 8. Information Coming from Literals?
    o FACES uses the type semantics of object values when grouping features.
    o Literals in RDF triples do not have “semantic” types; they only have basic datatypes (e.g., date, integer, string).
    o Can we add semantic types to literals? How?
  • 9. Typing Literals in RDF Triples for Entity Summarization
    o FACES can only handle features based on object properties.
    o Why?
      – Values of datatype property based features are not URIs and have no “semantic” types.
      – Hence, the adapted grouping algorithm (Cobweb) cannot get types for these values.
      – It cannot create the partitions for faceted entity summaries.
    o Our contributions are to:
      – First, compute types for the values of datatype property based features (data enrichment).
      – Then, adapt and improve the ranking algorithms (summarization).
    The resulting system is FACES-E.
  • 10. Typing Datatype Property Values – Example
    [Figure: example graph with nodes dbr:Barack_Obama, dbr:Joe_Biden, dbr:Calvin_Coolidge, dbo:Politician, dbo:President, dbo:Governor and the literals “44th President of the United States”^^xsd:string and “48th Governor of Massachusetts”^^xsd:string; edges dbp:vicePresident, rdf:type, dbp:shortDescription, dbp:orderInOffice, and rdf:subClassOf. The two literals are to be typed as dbo:President and dbo:Governor, both subclasses of dbo:Politician.]
  • 11. Why Is It Hard?
    o Unlike a URI, a literal has no clear focus.
    o It may contain several entities or labels matching ontology classes.
    o The literal can be long.
      – In this work, we focus on literals that are one sentence long.
      – For paragraph-like text, finding a single focus is hard and needs different techniques.
    [Figure: “44th President of the United States” with three candidate focus terms marked option 1, option 2, and option 3.]
  • 12. Focus Term Identification
    o We expect the focus of a sentence or phrase to lead to the representative entity/type of that sentence.
    o There is prominent prior work on identifying the head word of a sentence or phrase.
      – Example: in “member of committee”, the head word is “member”.
    o We use an existing head word detection algorithm to identify the focus term.
      – Collins' head word detection algorithm
  • 13. Deriving Type (Class) from Head Word
    We filter out date and numeric values.
    1. Exact matching of the focus term to class labels.
       – E.g., “48th Governor of Massachusetts” → Governor (class)
    2. Get the n-grams and look for a matching class using n-gram and focus term overlap (maximal match).
       I. Check for a matching class for an overlapping n-gram.
       II. If no type is found, spot entities in the n-grams and get their types.
       • “United States Senate” → the 3-gram “United States Senate” matches an entity in DBpedia.
    3. Semantic matching of the focus term to class labels.
       – We compute the pairwise similarity of the focus term with all class labels and pick the highest (using the UMBC similarity service).
  • 14. Process Flow
    [Figure: process flow in two stages, Pre-processing and Type processing, with components Phrase Identifier, Primary Type Filter, N-grams Extractor, Head-word Detector, Entity Spotter, Head-word to Class Label Matcher, N-grams + Head-word to Class Label Matcher, and Head-word Semantic Matcher; output: types for the literal.]
  • 15. Typing Literals – Algorithm Outline
    [Algorithm listing not recoverable; slide caption: “If you really wanted to know how …”]
  • 16. Ranking Datatype Property Features
    o The ranking mechanism FACES uses for object values does not work for literals.
      – Why? Two literals can be distinct even if their types and main entities are the same.
        • Example: “United States President” vs. “President of the United States” (this affects counting).
        • Searching with the whole phrase is not desirable.
      – Hence, we use entities.
      – A literal can contain several entities. Which one should we choose?
  • 17. Idea for Ranking
    o We observe that humans recognize popular entities.
      – Entities can appear in literals with variations.
    o We use the popular entities in literals, not the literals themselves, for ranking.
    o Functions
      – ES(v) returns all entities present in the value v.
      – max(ES(v)) returns the most popular entity in ES(v).
    Example: v = “44th President of the United States”, ES(v) = {db:President, db:United_States}, max(ES(v)) = db:United_States
    Remember: the ranking objective is independent of the typing mechanism.
  • 18. Modified Ranking Equations
    [Equations not preserved; slide caption: “If you really wanted to know …”]
    o Inf(f)': informativeness is inversely proportional to the number of entities associated with overlapping values that contain the most popular entity of feature f.
    o Po(v)': the frequency of the most popular entity in v.
    o The ranking score is tf-idf based.
  • 19. Facet Ranking
    o Aggregate the feature ranking scores for each facet.
    o Rank facets by their aggregated scores.
    Rank(f) is the original function and Rank(f)' is the modified one for datatype property based features.
  • 20. FACES-E Entity Summary Generation
    1. Extract the features of entity e.
    2. Enrich each feature and get its WordSet WS(f).
    3. Input the enriched feature set FS(e) to the partitioning algorithm to get the facet set F(e).
    4. Compute the feature ranking scores R(f), then the facet ranking score of each facet, FacetRank(F(e)).
    5. Pick the top-ranked features from the top-ranked facets, in order, to form the faceted entity summary. The constraints in the definition of a faceted entity summary hold.
    (Highlighted on the slide as new: Enriching Literals and Modified Ranking.)
  • 21. Type Computation Samples (with super types, excluding owl:Thing)
    Literal → Types
    – United States Ambassador to the United Nations → Agent, Ambassador, Person
    – Chairman of the Republican National Committee → Agent, Politician, Person, President
    – United States Navy → Agent, Organisation, Military Unit
    – Member of the New York State Senate → Agent, OrganisationMember, Person
    – Senate Minority Leader → Agent, Politician, Person, President
    – United States Senate → Agent, Organisation, Legislature
    – from Virginia → Administrative Region, Place, Region, Populated Place
    – Denison, Texas, U.S. → Administrative Region, Place, Country, Region, Populated Place
  • 22. Evaluation – Type Generation Metrics
    [Metric formulas not preserved.]
    o TS(v) is the generated set of types for the value v.
    o n is the total number of features.
  • 23. Evaluation – Type Generation
    o DBpedia Spotlight is used as the baseline.
    o The evaluation used 1117 unique property-value pairs (features); 118 pairs (labelling properties and noisy features) were removed.
    o The results show that special care should be taken when deciding types for literals.
    Approach       Mean Precision (MP)   Any Mean Precision (AMP)   Coverage
    Our approach   0.8290                0.8829                     0.8529
    Baseline       0.4867                0.5825                     0.5533
  • 24. Evaluation – Summarization Metrics
    [Metric formulas not preserved.]
    o Agreement: average pairwise agreement of the ideal summaries.
    o Quality: average summary overlap between system-generated and ideal summaries.
  • 25. Evaluation – FACES-E Summary Generation
    o The gold standard consists of 20 random entities from DBpedia 3.9 (those used in FACES) and 60 random entities from DBpedia 2015-04.
    o 17 human users created ideal summaries (900 in total). Each entity received at least 4 ideal summaries for each summary length.
    k is the summary length; % Increase is the improvement of FACES-E over each system.
    System           k = 5: Avg. Quality   % Increase   k = 10: Avg. Quality   % Increase
    FACES-E          1.5308                -            4.5320                 -
    RELIN            0.9611                59 %         3.0988                 46 %
    RELINM           1.0251                49 %         3.6514                 24 %
    Avg. Agreement   2.1168                             5.4363
  • 26. Future Work
    o Consider the meaning of the property name when computing types.
    o Literals and properties are noisy.
      – Identify noisy ones automatically and filter them out.
      – Filter out labelling properties (automatic identification). This is hard.
    o A formal model to capture semantic types for literals in RDF,
      – without changing their original representation (literals).
  • 27. Thank You – Questions?
    http://knoesis.wright.edu/researchers/kalpa | kalpa@knoesis.org
    Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA
    FACES project page: http://wiki.knoesis.org/index.php/FACES
  • 29. Preliminaries
    o Entities are described by features.
    o Feature: a property-value pair.
    o Feature set: all the features that describe an entity.
    o Entity summary of size k: a subset of the feature set of an entity, constrained by size k.
    Entity – Marie Curie; feature set FS:
    f1  spouse            Pierre_Curie
    f2  birthPlace        Warsaw
    f3  deathPlace        Passy_Haute-Savoie
    f4  almaMater         ESPCI_ParisTech
    f5  workInstitutions  University_of_Paris
    f6  knownFor          Radioactivity
    f7  field             Chemistry
    Entity summaries for k = 3: {f1, f2, f5}, {f4, f6, f7}, {f3, f4, f5}, …
  • 30. Faceted Entity Summary
    Facets (partition): Given an entity e, a set of facets F(e) of e is a partition of the feature set FS(e). That is, F(e) = {C1, C2, …, Cn} such that F(e) satisfies:
      (i) Non-empty: ∅ ∉ F(e).
      (ii) Collectively exhaustive: C1 ∪ C2 ∪ … ∪ Cn = FS(e).
      (iii) Mutually (pairwise) disjoint: if Ci ≠ Cj, then Ci ∩ Cj = ∅.
    Faceted entity summary: Given an entity e and a positive integer k < |FS(e)|, the faceted entity summary of e of size k, FSumm(e,k), is a collection of features such that FSumm(e,k) ⊂ FS(e) and |FSumm(e,k)| = k. Further, either
      (i) k > |F(e)| and ∀X ∈ F(e), X ∩ FSumm(e,k) ≠ ∅, or
      (ii) k ≤ |F(e)| and ∀X ∈ F(e), |X ∩ FSumm(e,k)| ≤ 1
    holds, where F(e) is a set of facets of FS(e).
    Example faceted entity summaries: k = 2: {f1, f6}; k = 3: {f1, f2, f6}.
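Slide 12 relies on Collins' head word detection, which operates on parse trees. As a rough, hypothetical stand-in for short literals, the sketch below keeps the words before the first preposition and takes the last of them as the focus term. It is a simplification for illustration, not Collins' algorithm, and the preposition list is an assumption.

```python
# Simplified head-word heuristic for short English noun phrases.
# NOT Collins' head-finding rules (those operate on full parse trees).
PREPOSITIONS = {"of", "in", "for", "from", "to", "at", "on", "by", "with"}


def head_word(phrase: str) -> str:
    """Return a guess at the head word (focus term) of a short noun phrase."""
    tokens = phrase.replace(",", " ").split()
    # Drop the post-modifier: everything from the first preposition onwards.
    for i, token in enumerate(tokens):
        if token.lower() in PREPOSITIONS:
            tokens = tokens[:i]
            break
    # In the remaining pre-modified noun phrase, the head is usually the last word.
    return tokens[-1] if tokens else ""


for literal in ["48th Governor of Massachusetts", "member of committee",
                "Harvard Law School", "from Virginia"]:
    print(repr(literal), "->", repr(head_word(literal)))
```

On “from Virginia” the heuristic returns an empty string; the full pipeline still types that literal as a place (slide 21) through entity spotting.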
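The three-step cascade on slide 13 can be sketched as follows, assuming a toy class-label dictionary and a toy entity-to-types map (both hypothetical; the real system uses the DBpedia ontology and an entity spotter). `difflib` string similarity stands in for the UMBC semantic similarity service, so step 3 here is lexical rather than semantic. The focus term is passed in, for instance from a head word detector.

```python
import difflib
import re

# Hypothetical toy inputs, for illustration only.
CLASS_LABELS = {"governor": "Governor", "president": "President",
                "ambassador": "Ambassador", "legislature": "Legislature"}
ENTITY_TYPES = {"United States Senate": {"Legislature", "Organisation", "Agent"}}

# Simple check for numeric or date-like literals (digits and separators only).
NUMERIC_OR_DATE = re.compile(r"[\d\s.,:/+-]+")


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def similarity(a, b):
    # Stand-in for the UMBC similarity service: plain string similarity.
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def derive_types(literal, head, class_labels=CLASS_LABELS, entity_types=ENTITY_TYPES):
    # Pre-filter: numeric/date literals (or no focus term) get no semantic type.
    if not head or NUMERIC_OR_DATE.fullmatch(literal):
        return set()
    # Step 1: exact match of the focus term against class labels.
    if head.lower() in class_labels:
        return {class_labels[head.lower()]}
    tokens = literal.split()
    # n-grams that overlap the focus term, longest (maximal) first.
    grams = [g for n in range(len(tokens), 0, -1)
             for g in ngrams(tokens, n) if head in g.split()]
    # Step 2.I: an overlapping n-gram matches a class label.
    for gram in grams:
        if gram.lower() in class_labels:
            return {class_labels[gram.lower()]}
    # Step 2.II: an overlapping n-gram is a known entity; use its types.
    for gram in grams:
        if gram in entity_types:
            return set(entity_types[gram])
    # Step 3: pick the class label most similar to the focus term.
    if not class_labels:
        return set()
    best = max(class_labels, key=lambda label: similarity(head, label))
    return {class_labels[best]}
```

For example, `derive_types("United States Senate", "Senate")` fails step 1 and step 2.I, then returns the types of the spotted 3-gram entity.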
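The functions ES(v) and max(ES(v)) on slide 17 can be sketched with a hypothetical popularity table. Whole-word label matching stands in for a real entity spotter, and the counts are made up for illustration.

```python
import re

# Hypothetical popularity counts (e.g., how often each entity occurs in the dataset).
POPULARITY = {"President": 1_200, "United States": 250_000, "Massachusetts": 40_000}


def entity_set(value, popularity=POPULARITY):
    """ES(v): known entity labels that occur in the literal v as whole words."""
    return {label for label in popularity
            if re.search(r"\b" + re.escape(label) + r"\b", value)}


def most_popular_entity(value, popularity=POPULARITY):
    """max(ES(v)): the most popular entity in ES(v), or None if ES(v) is empty."""
    entities = entity_set(value, popularity)
    return max(entities, key=popularity.get) if entities else None
```

Both phrasings from slide 16, “United States President” and “President of the United States”, map to the same most popular entity, so they are counted together.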
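A minimal sketch of the modified scores on slide 18, following speaker note 10. The slide's equations are not preserved, so the log forms and the product Inf(f)' × Po(v)' (a tf-idf style combination) are assumptions. Counting Po(v)' over triples of any property is one reading of “frequency of the most popular entity in v”. The function names and toy triples are hypothetical.

```python
import math
import re

# (entity, property, literal value) triples; toy data for illustration only.
KB = [
    ("Barack_Obama", "shortDescription", "44th President of the United States"),
    ("Donald_Trump", "shortDescription", "45th President of the United States"),
    ("Calvin_Coolidge", "orderInOffice", "48th Governor of Massachusetts"),
    ("Calvin_Coolidge", "orderInOffice", "30th President of the United States"),
]


def mentions(value, entity):
    return re.search(r"\b" + re.escape(entity) + r"\b", value) is not None


def inf_prime(prop, top_entity, kb, n_entities):
    # Entities with the same property whose value contains the top entity.
    holders = {e for e, p, v in kb if p == prop and mentions(v, top_entity)}
    return math.log(n_entities / max(len(holders), 1))


def po_prime(top_entity, kb):
    # Frequency of the top entity across literal values (assumed: any property).
    freq = sum(1 for _, _, v in kb if mentions(v, top_entity))
    return math.log(1 + freq)


def rank_prime(prop, top_entity, kb, n_entities):
    # Assumed tf-idf style combination: informativeness x frequency.
    return inf_prime(prop, top_entity, kb, n_entities) * po_prime(top_entity, kb)
```

Because the counts key on the most popular entity rather than on the whole literal, “44th President of the United States” and “30th President of the United States” contribute to the same count even though the strings differ.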
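The facet ranking on slide 19 and steps 4 and 5 on slide 20 can be sketched as below. Summing feature scores per facet is an assumed aggregation. Taking features round-robin from the ranked facets is one simple way to satisfy the faceted-summary constraints on slide 30. The function name is hypothetical.

```python
def faceted_summary(facets, score, k):
    """Pick k features: facets ranked by aggregated score, features taken round-robin."""
    # Facet ranking (assumed aggregation: sum of the feature scores R(f)).
    ranked = sorted(facets, key=lambda facet: sum(score[f] for f in facet), reverse=True)
    # Within each facet, best-scoring features come first.
    queues = [sorted(facet, key=lambda f: score[f], reverse=True) for facet in ranked]
    summary = []
    while len(summary) < k and any(queues):
        for queue in queues:  # one pass takes at most one feature per facet
            if queue and len(summary) < k:
                summary.append(queue.pop(0))
    return summary
```

With hypothetical facets [[f1], [f2, f3, f4, f5], [f6, f7]] and scores f1 = 0.9, f2 = 0.8, f3 = 0.3, f4 = 0.4, f5 = 0.5, f6 = 0.7, f7 = 0.6, k = 3 yields ["f2", "f6", "f1"]. When k ≤ |F(e)|, each facet contributes at most one feature. When k > |F(e)|, the first pass covers every facet.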
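The slide 24 formulas are not preserved, so the two summarization metrics below are written from its descriptions. `quality` averages |Summ(e) ∩ SummI(e)| over the ideal summaries of an entity. `agreement` averages the overlap over all pairs of ideal summaries. The example summaries in the test are hypothetical.

```python
from itertools import combinations


def quality(system_summary, ideal_summaries):
    """Average overlap between a system summary and each ideal summary."""
    system = set(system_summary)
    return sum(len(system & set(ideal)) for ideal in ideal_summaries) / len(ideal_summaries)


def agreement(ideal_summaries):
    """Average pairwise overlap among the ideal summaries of one entity."""
    pairs = list(combinations(ideal_summaries, 2))
    return sum(len(set(a) & set(b)) for a, b in pairs) / len(pairs)
```

Averaging these per-entity values over all entities gives figures comparable to the Avg. Quality and Avg. Agreement rows on slide 25.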
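As a check on the slide 25 table, the % Increase columns equal the relative improvement of FACES-E's average quality over each system, rounded to whole percent:

```python
faces_e = {5: 1.5308, 10: 4.5320}
others = {"RELIN": {5: 0.9611, 10: 3.0988}, "RELINM": {5: 1.0251, 10: 3.6514}}


def pct_increase(new, old):
    """Relative improvement of `new` over `old`, in whole percent."""
    return round(100 * (new - old) / old)


for system, scores in others.items():
    print(system, {k: pct_increase(faces_e[k], scores[k]) for k in (5, 10)})
```

This prints `RELIN {5: 59, 10: 46}` and `RELINM {5: 49, 10: 24}`, matching the table.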
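The definitions on slide 30 translate directly into checks. The sketch below assumes features are hashable identifiers such as "f1". Its test uses a hypothetical grouping of the Marie Curie features from slide 29 into three facets.

```python
def is_facet_partition(facets, features):
    """Check conditions (i)-(iii) for F(e) being a partition of FS(e)."""
    facets = [set(c) for c in facets]
    union = set().union(*facets)
    return (all(facets)                                # (i) non-empty facets
            and union == set(features)                 # (ii) collectively exhaustive
            and sum(map(len, facets)) == len(union))   # (iii) pairwise disjoint


def is_faceted_summary(summary, facets, features, k):
    """Check the faceted entity summary definition for FSumm(e, k)."""
    s, fs = set(summary), set(features)
    if not (0 < k < len(fs) and s < fs and len(s) == k):
        return False
    hits = [len(s & set(c)) for c in facets]
    if k > len(facets):
        return all(h > 0 for h in hits)   # (i) every facet is represented
    return all(h <= 1 for h in hits)      # (ii) no facet is represented twice
```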

Editor's Notes

  1. Property counts are taken from DBpedia 3.9 statistics (I believe). Examples of easily typed literals: “California”, “Greece”, etc., for the location property. From here on, we discuss entity summarization as a use case, together with typing.
  2. DBpedia 2.0 had 1.95 million things and DBpedia 2015-04 has 5.9 million things. LOD had 295 datasets in 2011 and 1014 datasets in 2014. Entity: a real-world thing (e.g., person, book, place) at the data level that encapsulates facts and is represented by a URI. An entity summary is a subset of facts that represents the entity.
  3. Explain the conceptual idea: we want facts about one “theme” to be grouped together.
  4. Conceptually similar features are colored in the same color.
  5. Refer to the previous slide on grouping by types. Explain with an example: the almaMater and workInstitutions properties both talk about places.
  6. Possible reasons why most of these values were created as literals instead of URI resources: (i) the creator was unable to find a suitable entity URI for the object value, and hence chose to use a literal instead, (ii) the creator of the triple did not want to attach more details to the value and hence represented it in plain text, (iii) the value contains only basic implementation types like integer, boolean, and date, so creating an entity is not meaningful, or (iv) the value is a lengthy description spanning several sentences (e.g., the dbo:abstract property in DBpedia) that covers a diverse set of entities and facts.
  7. Options 1 and 2 seem to be the right pick (one of them). Another example: “48th Governor of Massachusetts” → person and populated place. Another one: “United States Ambassador to the United Nations”.
  8. We used the DBpedia 2015-04 dataset at the time of processing. For numeric and date values, we cannot do more than these types. (Governor → matches the class Governor; there is also an entity Governor of type owl:Thing.) Case 2: Senate → matches an entity of class Thing, so it did not get a type from step 1; United States Senate → matches Legislature. Harvard Law School → the focus term is School; the 3-gram leads to the Harvard_Law_School entity → Educational Institute.
  9. Head word detection: Collins' head word detection algorithm. Directly match the head word to a class. Match n-grams and the head word to a class label; otherwise, match entities to the n-grams and head word, then get their types. Semantic matching of the head word uses the UMBC similarity service.
  10. Inf(f)' – count the number of entities having the feature: the property must match, and the value must contain the popular entity of the input feature's value. Po(v)' – count the number of triples that have the matching feature with the most popular entity of the value.
  11. F(e) is the facet set
  12. Colored parts are the new additions/modifications
  13. We counted super classes other than Thing in this evaluation for both the baseline and our approach. Recall is not measured because it is hard to do so (too many pairs to check). This evaluation used DBpedia 2015-04.
  14. Summ(e) is the system-generated summary; SummI(e) is the ideal summary.
  15. For this evaluation we used both DBpedia 3.9 and DBpedia 2015-04 versions.
  16. Labelling properties should not be typed; they probably exist only to provide a label. For example, typing a “name” property value as Person looks odd.