Metrics for Evaluating Quality of
Embeddings for Ontological Concepts
Faisal Alshargi
University of Leipzig
Saeedeh Shekarpour
University of Dayton
Tommaso Soru
University of Leipzig
Amit Sheth
Kno.e.sis Center
1
Motivation: the growth of Linked Open Data
2
March 2018
~3000 Datasets
>190 Triples
Motivation: Knowledge Transformation
This valuable knowledge needs to be made available for extrinsic tasks such as natural language processing
or data mining.
➔ The knowledge (i.e., at both the schema level and the instance level) has to be transformed from discrete
representations into numerical representations (called embeddings).
3
Research Gap and our Contribution
Existing deficiency: the lack of an evaluation framework for a comprehensive and fair judgment of the
quality of concept embeddings.
Evaluation can be intrinsic or extrinsic.
Our Contribution: evaluating the quality of embeddings for concepts. We extend the
state of the art by providing several intrinsic metrics for evaluating the quality of concept embeddings
on three aspects:
i. the categorization aspect,
ii. the hierarchical aspect, and
iii. the relational aspect.
4
Ontological Concept
Ontological concepts play a crucial role in
➔ capturing the semantics of a particular domain
➔ typing entities, which bridges the schema level and
the instance level
➔ determining valid types of sources and
destinations for relations in a knowledge graph.
Thus, the embeddings of the concepts are expected to
faithfully reflect the characteristics of ontological concepts in
the embedding space.
5
Contribution
We provide several metrics for evaluating the quality of the embedding of concepts from
three perspectives:
(i) how the embedding of concepts behaves when categorizing their instantiated entities;
(ii) how the embedding of concepts behaves with respect to the hierarchical semantics described in the
underlying ontology; and
(iii) how the embedding of concepts behaves with respect to relations.
6
Embedding Models (1)
➔ Word2vec: the Skip-Gram and Continuous Bag of Words (CBOW) models learn two separate
embeddings for each target word w_i: (i) the word embedding and (ii) the context embedding.
➔ GloVe Model: The GloVe model (Pennington, Socher, and Manning 2014) is a global log-bilinear
regression model for the unsupervised learning of word embeddings. It captures global statistics of
words in a corpus and combines the advantages of two model families: (i) global matrix
factorization and (ii) local context window methods.
➔ RDF2Vec Model: RDF2Vec (Ristoski and Paulheim 2016) is an approach for learning embeddings
of entities in RDF graphs. It initially converts the RDF graphs into a set of sequences using two
strategies: (i) Weisfeiler-Lehman Subtree RDF Graph Kernels, and (ii) graph random walks. Then,
word2vec is employed for learning embeddings over these produced sequences.
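The RDF2Vec pipeline above can be illustrated with a minimal sketch of the random-walk strategy only (the Weisfeiler-Lehman kernel strategy is not shown). It assumes a hypothetical toy graph stored as a Python dictionary; the walks become token sequences, and the skip-gram (target, context) pairs are what a word2vec model would then be trained on. All names and data here are illustrative, not RDF2Vec's actual implementation.

```python
import random

# Hypothetical toy RDF graph: subject -> list of (predicate, object) edges.
GRAPH = {
    "dbr:Berlin": [("dbo:country", "dbr:Germany")],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin"), ("rdf:type", "dbo:Country")],
    "dbr:Sana'a": [("dbo:country", "dbr:Yemen")],
    "dbr:Yemen": [("dbo:capital", "dbr:Sana'a"), ("rdf:type", "dbo:Country")],
}


def random_walks(graph, depth, walks_per_node, seed=0):
    """Turn the graph into token sequences (entity, predicate, entity, ...) via random walks."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            node = start
            for _ in range(depth):
                edges = graph.get(node)
                if not edges:  # dead end: stop this walk early
                    break
                predicate, obj = rng.choice(edges)
                walk += [predicate, obj]
                node = obj
            walks.append(walk)
    return walks


def skip_gram_pairs(sequence, window):
    """(target, context) pairs within `window` tokens, as consumed by a skip-gram model."""
    pairs = []
    for i, target in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs


walks = random_walks(GRAPH, depth=2, walks_per_node=2)
print(walks[0])
print(skip_gram_pairs(walks[0], window=1)[:3])
```

In practice the walks would be handed to an existing word2vec implementation rather than to a hand-written pair generator; the pair function only makes the training signal visible.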
7
Embedding Models (2)
➔ Translation-based Models: The TransE and TransH models assume that the embeddings of both
the entities and the relations of a knowledge graph lie in the same semantic space,
whereas TransR considers two separate embedding spaces for entities and relations. All three
approaches share the same principle: by summing the vectors of the subject and the predicate (after a
relation-specific projection in TransH and TransR), one obtains an approximation of the vector of the object.
➔ Other Knowledge Graph Embedding (KGE) Models.
◆ HolE (Holographic Embeddings)
◆ DistMult
◆ ComplEx
◆ ...
All of the approaches above have been shown to achieve state-of-the-art performance on link prediction and triple
classification.
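The scoring principles of some of these models can be sketched in a few lines. This is a minimal illustration using NumPy vectors for a head h, relation r, and tail t; the function names are hypothetical, and training (negative sampling, margin losses, normalization constraints) is omitted.

```python
import numpy as np


def transe_score(h, r, t, norm=1):
    """TransE: negative distance between h + r and t (higher means more plausible)."""
    return -np.linalg.norm(h + r - t, ord=norm)


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i."""
    return float(np.sum(h * r * t))


def complex_score(h, r, t):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)) with complex-valued embeddings."""
    return float(np.real(np.sum(h * r * np.conj(t))))


h, r, t = np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([3.0, 1.0])
print(transe_score(h, r, t), distmult_score(h, r, t), distmult_score(t, r, h))
```

Note that the DistMult score is symmetric in h and t, so it cannot distinguish a relation from its inverse; ComplEx moves to complex numbers precisely so that conj(t) breaks this symmetry.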
8
Excluding Non-scalable KGE Approaches
➔ We selected the candidate knowledge graph embedding approaches for evaluating our metrics from
RDF2Vec, TransE, and three of the other methods described above (i.e., HolE,
DistMult, and ComplEx).
➔ Unlike for RDF2Vec, we could not find DBpedia embeddings pre-trained with any of the
other approaches online; thus, we conducted a scalability test on them to verify their ability to
handle the size of DBpedia. Based on this test, we decided to select only the more scalable RDF2Vec approach for our
evaluation.
9
Data Preparation
➔ Corpus: DBpedia + Wikipedia
➔ From the DBpedia ontology, we selected 12 concepts positioned at various levels of the
hierarchy. For each concept, we retrieved up to 10,000 entities typed by it, together with their
labels (if fewer were available, all existing entities were retrieved).
➔ Then, the embeddings of these concepts and their associated entities were computed with
the embedding models (i) skip-gram, (ii) CBOW, and (iii) GloVe, trained on Wikipedia and on
DBpedia.
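The retrieval step could look roughly like the following sketch, which only builds a SPARQL query string and does not contact any endpoint. The exact query used for the experiments is not given on the slide; the English-label filter and the function name are assumptions.

```python
def entities_of_concept_query(concept_iri, limit=10000):
    """Hypothetical helper: SPARQL query for up to `limit` entities typed by a concept, with English labels."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        f"  ?entity rdf:type <{concept_iri}> .\n"
        "  ?entity rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {limit}"
    )


print(entities_of_concept_query("http://dbpedia.org/ontology/City"))
```

A LIMIT clause simply returns all matching rows when fewer than the limit exist, which mirrors the "retrieve all existing entities" fallback described above.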
10
Task 1: Evaluating the Categorization Aspect of
Concepts in Embeddings
➔ Ontological concepts C categorize entities by typing them, mainly using rdf:type. In other words, all
the entities with a common type share specific characteristics. For example, all the entities with
the type dbo:Country have common characteristics distinguishing them from the entities with the
type dbo:Person.
➔ Research Question: To what extent is the categorization aspect of concepts captured (i.e., encoded) by an
embedding model? In other words, we aim to measure the quality of the embeddings for concepts
by observing their behaviour in categorization tasks.
11
Categorization metric
➔ In the context of unstructured data, this metric corresponds to clustering words into different
categories.
➔ For concepts, it measures how well the embedding of a concept performs as the background concept of the
entities typed by it.
➔ To quantify this metric, we compute the averaged vector of the embeddings of all the entities
having type c, and then compute the cosine similarity between this averaged vector and the embedding of
the concept, V_c.
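The categorization metric described above, cos(V_c, mean of the entity vectors of type c), can be sketched as follows. This is a minimal NumPy version; the function names are illustrative, and the toy vectors stand in for real concept and entity embeddings.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity of two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def categorization_score(concept_vec, entity_vecs):
    """Cosine similarity between V_c and the averaged embedding of the entities typed by c."""
    centroid = np.mean(np.asarray(entity_vecs, dtype=float), axis=0)
    return cosine(concept_vec, centroid)


# Toy example: the entity centroid is [1, 0], which coincides in direction with V_c.
print(categorization_score([1.0, 0.0], [[1.0, 1.0], [1.0, -1.0]]))  # 1.0
```

A score close to 1 means the concept vector points in the same direction as the "average member" of its class; comparing the score across models gives the categorization ranking.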
12
Experiment
13
Coherence metric
It measures whether or not a group of words adjacent in the embedding space is mutually related.
Commonly, this relatedness task has been evaluated subjectively (i.e., using human judges).
In the context of structured data, however, we define two entities as related if they share a
background concept, i.e., the concept by which a given entity is typed (inherited). For example,
the entities dbr:Berlin and dbr:Sana’a are related because both are typed by the concept dbo:City.
14
Coherence metric
A. Quantitative Evaluation: For a given concept c and a given radius n, we find the top-n most similar
entities in the pool (i.e., those with the highest cosine similarity to V_c). Then, the coherence metric
for concept c with radius n is computed as the number of these entities that have the same
background concept as c;
B. Qualitative Evaluation: a visualization approach.
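The quantitative coherence metric can be sketched as below. It assumes a pool of entity vectors with, for each entity, the set of concepts typing it; the function name and data layout are hypothetical.

```python
import numpy as np


def coherence_at_n(concept, concept_vec, pool_vecs, pool_types, n):
    """Count how many of the n pool entities most similar to V_c are typed by `concept`."""
    pool = np.asarray(pool_vecs, dtype=float)
    v = np.asarray(concept_vec, dtype=float)
    sims = pool @ v / (np.linalg.norm(pool, axis=1) * np.linalg.norm(v))
    top_n = np.argsort(-sims)[:n]  # indices of the n highest cosine similarities
    return sum(1 for i in top_n if concept in pool_types[i])


pool = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [-1.0, 0.0]]
types = [{"dbo:City"}, {"dbo:City"}, {"dbo:Person"}, {"dbo:City"}]
print(coherence_at_n("dbo:City", [1.0, 0.0], pool, types, n=3))  # 2
```

Dividing the count by n would turn it into a precision-style score that is comparable across radii; the slide defines the raw count.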
15
Experiment (radius 20)
16
Task 2: Evaluating Hierarchical Aspect of
Concepts in Embeddings
There is a relatively long-standing line of research on measuring the similarity s(c_i, c_j) of two given concepts,
either across ontologies or within a common ontology.
Typically, the similarity of concepts is calculated at the lexical level and at the conceptual level.
We present three metrics that can be employed for evaluating the embeddings of concepts with
respect to the hierarchical structure and the semantics.
17
Absolute Error
We introduce the absolute semantic error metric, which quantitatively measures the quality of
concept embeddings against the concepts' semantic similarity.
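The slide does not reproduce the formula. A minimal sketch, assuming the error is the mean absolute difference between the cosine similarity of two concept embeddings and a reference semantic similarity s(c_i, c_j) derived from the ontology; how s is computed is left to the caller, and the dictionary layout (pairs keyed in sorted order) is hypothetical.

```python
from itertools import combinations

import numpy as np


def absolute_semantic_error(concept_vecs, ontology_sim):
    """Assumed form: mean |cos(V_ci, V_cj) - s(ci, cj)| over all concept pairs."""
    errors = []
    for ci, cj in combinations(sorted(concept_vecs), 2):
        u = np.asarray(concept_vecs[ci], dtype=float)
        v = np.asarray(concept_vecs[cj], dtype=float)
        cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
        errors.append(abs(cos - ontology_sim[(ci, cj)]))
    return float(np.mean(errors))


vecs = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
sims = {("a", "b"): 0.5, ("a", "c"): 0.0, ("b", "c"): 0.0}
print(absolute_semantic_error(vecs, sims))
```

Lower values indicate that the geometry of the embedding space agrees more closely with the similarity implied by the ontology.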
18
Semantic Relatedness metric
We adapt this metric from word embedding evaluation to knowledge graphs by replacing words with concepts.
Typically, this metric represents the relatedness score of two given words. In the context of a knowledge
graph, we give pairs of concepts to human judges (usually domain experts) to rate their relatedness
on a predefined scale; then, the correlation (Spearman or Pearson) between the cosine similarities of the
concept embeddings and the human judges' scores is measured.
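Both correlation coefficients can be computed with NumPy alone, as in the following sketch. Spearman's rho is the Pearson correlation of the rank vectors, with tied values sharing their average rank; the similarity and human-score values below are made-up illustrations, not experimental data.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])


def average_ranks(x):
    """1-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: cosine similarities of concept pairs vs. human scores on a 0-10 scale.
cosines = [0.82, 0.35, 0.61, 0.10]
human = [9.0, 3.5, 7.0, 1.0]
print(spearman(cosines, human), pearson(cosines, human))
```

Spearman only rewards agreement in ordering, which suits human ratings on an ordinal scale; Pearson additionally assumes a linear relationship between cosine similarity and the human scores.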
19
Visualization
The embeddings of all concepts of the knowledge graph can be represented in a two-dimensional
visualization. This approach is an appropriate means for qualitative evaluation of the hierarchical aspect
of concepts. The visualizations are given to human judges, who look for patterns revealing the
hierarchical structure and the semantics.
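The slide does not name the projection method. As one possible choice, a minimal sketch assuming PCA, computed with NumPy's SVD, to obtain 2-D coordinates that could then be plotted and labelled with concept names; t-SNE is a common non-linear alternative.

```python
import numpy as np


def project_2d(vectors):
    """Project embeddings onto their first two principal components (PCA via SVD)."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # center each dimension
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T  # (n_concepts, 2) coordinates for plotting


rng = np.random.default_rng(0)
coords = project_2d(rng.normal(size=(10, 5)))  # stand-in for 10 concept embeddings
print(coords.shape)  # (10, 2)
```

PCA preserves global distances reasonably well, which matters when judges look for sub-concepts clustering near their super-concepts.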
20
Experiment
21
Task 3: Evaluating Relational Aspect of
Concepts in Embeddings
A. Relation Validation: checks whether an inferred relation is compatible with the types of the entities
involved. For example, the relation capital is valid if it holds between entities with the types
country and city.
B. Domain and Range of Relations: In a knowledge graph, this validation process is eased by
considering the rdfs:domain and rdfs:range axioms of the schema properties and the rdf:type of entities.
Embeddings generated for relations are expected to faithfully reflect their compatibility with the
embeddings of the concepts asserted as their domain and range.
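The symbolic domain/range check that such embeddings should mirror can be sketched as follows. The toy schema and typing follow the slide's capital example but are hypothetical (they are not DBpedia's actual axioms), and the check ignores rdfs:subClassOf reasoning, which a real validator would need.

```python
# Hypothetical toy schema and typing, following the slide's capital example.
DOMAIN = {"capital": "Country"}
RANGE = {"capital": "City"}
TYPES = {
    "Germany": {"Country"},
    "Berlin": {"City"},
    "Yemen": {"Country"},
    "Sana'a": {"City"},
}


def is_valid_triple(subject, relation, obj, domain=DOMAIN, range_=RANGE, types=TYPES):
    """True/False if domain and range axioms exist for `relation`; None if it cannot be validated."""
    if relation not in domain or relation not in range_:
        return None  # no axioms for this relation: unknown rather than invalid
    subject_ok = domain[relation] in types.get(subject, set())
    object_ok = range_[relation] in types.get(obj, set())
    return subject_ok and object_ok


print(is_valid_triple("Germany", "capital", "Berlin"))  # True
print(is_valid_triple("Berlin", "capital", "Germany"))  # False
```

Returning None for relations without axioms keeps "invalid" separate from "not checkable", which matters when many schema properties lack rdfs:domain or rdfs:range declarations.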
22
Metrics for evaluating the quality of the
embeddings for concepts and relations
A. Selectional preference: This metric, presented in prior work, assesses the relevance of a given noun as a
subject or object of a given verb (e.g., people-eat or city-talk). We adapt this metric to knowledge
graphs: concept-relation pairs are presented to a human judge, who approves or disapproves of their
compatibility.
B. Semantic transition distance: This metric is inspired by the work in which Mikolov
demonstrated that capital cities and their corresponding countries are separated by similar offsets in the
embedding space. Unlike selectional preference, this metric relies on an objective assessment: it considers
the relational axioms (i.e., rdfs:domain and rdfs:range) in a knowledge graph.
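The slide does not give the exact formula. One possible reading, stated here purely as an assumption: in the translational spirit of Mikolov's capital-country offsets (and of TransE), a relation r with rdfs:domain D and rdfs:range R should satisfy V_D + V_r ≈ V_R, so the Euclidean distance between the two sides can serve as a score where lower is better. The function name is hypothetical.

```python
import numpy as np


def transition_distance(domain_vec, relation_vec, range_vec):
    """Assumed form: Euclidean distance between V_domain + V_relation and V_range."""
    translated = np.asarray(domain_vec, dtype=float) + np.asarray(relation_vec, dtype=float)
    return float(np.linalg.norm(translated - np.asarray(range_vec, dtype=float)))


# Toy vectors: the relation exactly translates the domain concept onto the range concept.
print(transition_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
```

Averaging this distance over all relations that declare both a domain and a range would yield a single score per embedding model.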
23
Discussion
➔ There is no single embedding model that shows superior performance in every scenario. For
example, while the skip-gram model performs better in the categorization task, the GloVe and
CBOW models perform better in the hierarchical task. Thus, one conclusion is that each of these
models is suited to a specific scenario.
➔ Another conclusion is that each embedding model appears to capture specific features of the
ontological concepts, so integrating or aligning these embeddings could be a way to fully
capture all of these features.
➔ Although our initial expectation was that the embeddings learned from the knowledge graph (i.e.,
DBpedia) would have higher quality than the embeddings learned from unstructured
data (i.e., Wikipedia), in practice we did not observe this consistently.
24
Conclusion
➔ Embeddings need to be generated with respect to these three aspects
➔ Improving and extending the evaluation scenario
25
26

Weitere ähnliche Inhalte

Was ist angesagt?

Présentation de ECMAScript 6
Présentation de ECMAScript 6Présentation de ECMAScript 6
Présentation de ECMAScript 6
Julien CROUZET
 

Was ist angesagt? (20)

Rapport pfe- Refonte et déploiement d’une solution de messagerie en utilisant...
Rapport pfe- Refonte et déploiement d’une solution de messagerie en utilisant...Rapport pfe- Refonte et déploiement d’une solution de messagerie en utilisant...
Rapport pfe- Refonte et déploiement d’une solution de messagerie en utilisant...
 
Spring security
Spring securitySpring security
Spring security
 
Dossier de competences MA
Dossier de competences MADossier de competences MA
Dossier de competences MA
 
Prise en main de Jhipster
Prise en main de JhipsterPrise en main de Jhipster
Prise en main de Jhipster
 
La persistance des données : ORM et hibernate
La persistance des données : ORM et hibernateLa persistance des données : ORM et hibernate
La persistance des données : ORM et hibernate
 
Présentation de ECMAScript 6
Présentation de ECMAScript 6Présentation de ECMAScript 6
Présentation de ECMAScript 6
 
APACHE HTTP
APACHE HTTPAPACHE HTTP
APACHE HTTP
 
Programmation réseau en JAVA
Programmation réseau en JAVAProgrammation réseau en JAVA
Programmation réseau en JAVA
 
Support NodeJS avec TypeScript Express MongoDB
Support NodeJS avec TypeScript Express MongoDBSupport NodeJS avec TypeScript Express MongoDB
Support NodeJS avec TypeScript Express MongoDB
 
Cours design pattern m youssfi partie 1 introduction et pattern strategy
Cours design pattern m youssfi partie 1 introduction et pattern strategyCours design pattern m youssfi partie 1 introduction et pattern strategy
Cours design pattern m youssfi partie 1 introduction et pattern strategy
 
eServices-Chp5: Microservices et API Management
eServices-Chp5: Microservices et API ManagementeServices-Chp5: Microservices et API Management
eServices-Chp5: Microservices et API Management
 
Reporting avec JasperServer & iReport
Reporting avec JasperServer & iReportReporting avec JasperServer & iReport
Reporting avec JasperServer & iReport
 
Un exemple élémentaire d'application MVC en PHP
Un exemple élémentaire d'application MVC en PHPUn exemple élémentaire d'application MVC en PHP
Un exemple élémentaire d'application MVC en PHP
 
Traitement distribue en BIg Data - KAFKA Broker and Kafka Streams
Traitement distribue en BIg Data - KAFKA Broker and Kafka StreamsTraitement distribue en BIg Data - KAFKA Broker and Kafka Streams
Traitement distribue en BIg Data - KAFKA Broker and Kafka Streams
 
pfe book 2023 2024.pdf
pfe book 2023 2024.pdfpfe book 2023 2024.pdf
pfe book 2023 2024.pdf
 
eServices-Tp5: api management
eServices-Tp5: api managementeServices-Tp5: api management
eServices-Tp5: api management
 
Architecture jee principe de inversion de controle et injection des dependances
Architecture jee principe de inversion de controle et injection des dependancesArchitecture jee principe de inversion de controle et injection des dependances
Architecture jee principe de inversion de controle et injection des dependances
 
Uml examen
Uml  examenUml  examen
Uml examen
 
Présentation PFE "Refonte et déploiement d’une solution de messagerie en util...
Présentation PFE "Refonte et déploiement d’une solution de messagerie en util...Présentation PFE "Refonte et déploiement d’une solution de messagerie en util...
Présentation PFE "Refonte et déploiement d’une solution de messagerie en util...
 
Tp1 - WS avec JAXWS
Tp1 - WS avec JAXWSTp1 - WS avec JAXWS
Tp1 - WS avec JAXWS
 

Ähnlich wie Metrics for Evaluating Quality of Embeddings for Ontological Concepts

Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
Andre Freitas
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
IJwest
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve  part ii - towards data science[Emnlp] what is glo ve  part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
Nikhil Jaiswal
 

Ähnlich wie Metrics for Evaluating Quality of Embeddings for Ontological Concepts (20)

Marvin_Capstone
Marvin_CapstoneMarvin_Capstone
Marvin_Capstone
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
 
G04124041046
G04124041046G04124041046
G04124041046
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONSSEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
Effect of word embedding vector dimensionality on sentiment analysis through ...
Effect of word embedding vector dimensionality on sentiment analysis through ...Effect of word embedding vector dimensionality on sentiment analysis through ...
Effect of word embedding vector dimensionality on sentiment analysis through ...
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
factorization methods
factorization methodsfactorization methods
factorization methods
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
1808.10245v1 (1).pdf
1808.10245v1 (1).pdf1808.10245v1 (1).pdf
1808.10245v1 (1).pdf
 
Secured Ontology Mapping
Secured Ontology Mapping Secured Ontology Mapping
Secured Ontology Mapping
 
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve  part ii - towards data science[Emnlp] what is glo ve  part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
 
Semantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional ApproachSemantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional Approach
 

Mehr von Saeedeh Shekarpour

CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on RelationsCEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
Saeedeh Shekarpour
 

Mehr von Saeedeh Shekarpour (7)

CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on RelationsCEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
 
A quality type aware annotated corpus and lexicon for harassment research
A quality type aware annotated corpus and lexicon for harassment researchA quality type aware annotated corpus and lexicon for harassment research
A quality type aware annotated corpus and lexicon for harassment research
 
Windowing of attention
Windowing of attentionWindowing of attention
Windowing of attention
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
Sina presentation in IBM
Sina presentation in IBMSina presentation in IBM
Sina presentation in IBM
 
Wi presentation
Wi presentationWi presentation
Wi presentation
 

Kürzlich hochgeladen

Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Kürzlich hochgeladen (20)

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Metrics for Evaluating Quality of Embeddings for Ontological Concepts

  • 7. Embedding Models (1) ➔ Word2vec: the Skip-Gram and Continuous Bag of Words (CBOW) models. Skip-Gram predicts the context words from a target word, while CBOW predicts the target word from its context. Word2vec learns two separate embeddings for each target word w_i: (i) the word embedding and (ii) the context embedding. ➔ GloVe Model: The GloVe model (Pennington, Socher, and Manning 2014) is a global log-bilinear regression model for the unsupervised learning of word embeddings. It captures global word co-occurrence statistics of a corpus and combines the advantages of two model families: (i) global matrix factorization and (ii) local context window methods. ➔ RDF2Vec Model: RDF2Vec (Ristoski and Paulheim 2016) is an approach for learning embeddings of entities in RDF graphs. It first converts the RDF graphs into a set of sequences using two strategies: (i) Weisfeiler-Lehman subtree RDF graph kernels and (ii) graph random walks. Then, word2vec is employed to learn embeddings over the resulting sequences.
  • 8. Embedding Models (2) ➔ Translation-based Models: The TransE and TransH models assume that the embeddings of both the entities and the relations of a knowledge graph lie in the same semantic space, whereas TransR uses two separate embedding spaces for entities and relations. All three approaches share the same principle: summing the vectors of the subject and the predicate yields an approximation of the vector of the object. ➔ Other Knowledge Graph Embedding (KGE) Models: ◆ HolE (Holographic Embeddings) ◆ DistMult ◆ ComplEx ◆ ... All of the approaches above have been shown to reach state-of-the-art performance on link prediction and triple classification.
  • 9. Exclusion of Non-Scalable KGE Approaches ➔ We considered the following knowledge graph embedding approaches for the evaluation of our metrics: RDF2Vec, TransE, and three of the methods described in the previous subsection (i.e., HolE, DistMult, and ComplEx). ➔ Unlike for RDF2Vec, we could not find DBpedia embeddings pre-trained with any of the other approaches online, so we conducted a scalability test to verify their ability to handle the size of DBpedia. Based on this test, we decided to select only the more scalable RDF2Vec approach for our evaluation.
  • 10. Data Preparation ➔ Corpus: DBpedia + Wikipedia ➔ From the DBpedia ontology, we selected 12 concepts positioned at various levels of the hierarchy. ➔ For each concept, we retrieved 10,000 entities (instances) typed by it, together with their respective labels; if fewer were available, all existing entities were retrieved. ➔ Then, the embeddings of these concepts and of their associated instances were computed with the following embedding models trained on Wikipedia and DBpedia: (i) skip-gram, (ii) CBOW, and (iii) GloVe.
  • 11. Task 1: Evaluating the Categorization Aspect of Concepts in Embeddings ➔ Ontological concepts C categorize entities by typing them, mainly via rdf:type. In other words, all the entities with a common type share specific characteristics. For example, all the entities with the type dbo:Country have common characteristics distinguishing them from the entities with the type dbo:Person. ➔ Research Question: To what extent is the categorization aspect of concepts captured (i.e., encoded) by an embedding model? In other words, we aim to measure the quality of the embeddings for concepts by observing their behaviour in categorization tasks.
  • 12. Categorization Metric ➔ In the context of unstructured data, this metric measures how well a clustering of words aligns with predefined categories. ➔ For knowledge graphs, it measures how well the embedding of a concept performs as the background concept of the entities typed by it. ➔ To quantify this metric, we compute the average vector of the embeddings of all the entities having type c and then compute the cosine similarity between this average vector and the embedding of the concept, V_c.
  • 14. Coherence Metric This metric measures whether or not a group of words adjacent in the embedding space is mutually related. Commonly, this relatedness has been evaluated subjectively (i.e., by a human judge). In the context of structured data, however, we define two entities as related if they share a background concept, i.e. the concept from which a given entity inherits its type. For example, the entities dbr:Berlin and dbr:Sana’a are related because both are typed by the concept dbo:City.
  • 15. Coherence Metric A. Quantitative Evaluation: For a given concept c and a given radius n, we find the top-n entities in the pool that are most similar to c (i.e., that have the highest cosine similarity with V_c). The coherence metric for c with radius n is then the number of these n entities whose background concept is c. B. Qualitative Evaluation: a visualization approach.
  • 17. Task 2: Evaluating the Hierarchical Aspect of Concepts in Embeddings There is a relatively long-standing line of research on measuring the similarity s(c_i, c_j) of two given concepts, either across ontologies or within a common ontology. Typically, the similarity of concepts is calculated at the lexical level and at the conceptual level. We present three metrics that can be employed for evaluating the embeddings of concepts with respect to the hierarchical structure and the semantics.
  • 18. Absolute Error We introduce the metric absolute semantic error, which quantitatively measures the quality of the embeddings for concepts against their semantic similarity.
  • 19. Semantic Relatedness Metric We adapt this metric from word embeddings to knowledge graphs by replacing words with concepts. For words, this metric represents the relatedness score of two given words. In the context of a knowledge graph, we give pairs of concepts to human judges (usually domain experts), who rate their relatedness on a predefined scale; then, the correlation between the cosine similarities of the concept embeddings and the human scores is measured using Spearman's or Pearson's correlation coefficient.
  • 20. Visualization The embeddings of all concepts of the knowledge graph can be represented in a two-dimensional visualization. This approach is an appropriate means for the qualitative evaluation of the hierarchical aspect of concepts. The visualizations are given to a human judge, who looks for patterns revealing the hierarchical structure and the semantics.
  • 22. Task 3: Evaluating the Relational Aspect of Concepts in Embeddings A. Relation Validation: whether or not an inferred relation is compatible with the types of the entities involved. For example, the relation capital is valid if it holds between an entity of type country and an entity of type city. B. Domain and Range of Relations: In a knowledge graph, this validation is eased by considering the axioms rdfs:domain and rdfs:range of the schema properties and the rdf:type of entities. The embeddings generated for relations are expected to faithfully reflect compatibility with the embeddings of the concepts asserted as their domain and range.
  • 23. Metrics for Evaluating the Quality of the Embeddings for Concepts and Relations A. Selectional preference: This metric, taken from the evaluation of word embeddings, assesses the relevance of a given noun as the subject or object of a given verb (e.g. people-eat or city-talk). We adapt it to knowledge graphs by presenting concept-relation pairs to a human judge, who approves or disapproves of their compatibility. B. Semantic transition distance: This metric is inspired by Mikolov's demonstration that capital cities and their corresponding countries are separated by approximately the same vector offset. In contrast to selectional preference, this metric relies on an objective assessment: it considers the relational axioms (i.e. rdfs:domain and rdfs:range) of a knowledge graph.
  • 24. Discussion ➔ There is no single embedding model that shows superior performance in every scenario. For example, while the skip-gram model performs better in the categorization task, the GloVe and CBOW models perform better in the hierarchical task. Thus, one conclusion is that each of these models is suited to a specific scenario. ➔ Another conclusion is that each embedding model seems to capture specific features of the ontological concepts, so integrating or aligning these embeddings could be a way to capture all of these features. ➔ Although we initially expected the embeddings learned from the knowledge graph (i.e. DBpedia) to be of higher quality than the embeddings learned from unstructured data (i.e. Wikipedia), in practice we did not observe this consistently.
  • 25. Conclusion ➔ Generating embeddings with respect to these three aspects is required ➔ Improving and extending the evaluation scenario
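Slide 7 states that RDF2Vec first turns an RDF graph into sequences (for example via graph random walks) and then trains word2vec on them. The sketch below illustrates only the random-walk strategy and the (target, context) pairs that a skip-gram model would be trained on. The toy triples, walk depth, and window size are hypothetical, and neither the Weisfeiler-Lehman kernel strategy nor the actual word2vec training is shown.

```python
import random

# Toy RDF graph as (subject, predicate, object) triples (hypothetical data).
TRIPLES = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Germany", "dbo:capital", "dbr:Berlin"),
    ("dbr:Germany", "rdf:type", "dbo:Country"),
    ("dbr:Berlin", "rdf:type", "dbo:City"),
]

def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
    return adj

def random_walks(adj, start, num_walks, depth, rng):
    """Generate walks of up to `depth` hops; each walk is a token sequence
    alternating entity, predicate, entity, ... (RDF2Vec-style sequences)."""
    walks = []
    for _ in range(num_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = adj.get(node)
            if not edges:
                break  # no outgoing edges: end this walk early
            p, o = rng.choice(edges)
            walk += [p, o]
            node = o
        walks.append(walk)
    return walks

def skipgram_pairs(sequence, window):
    """(target, context) pairs within `window` tokens, as used by skip-gram."""
    pairs = []
    for i, target in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        pairs += [(target, sequence[j]) for j in range(lo, hi) if j != i]
    return pairs

rng = random.Random(0)
walks = random_walks(build_adjacency(TRIPLES), "dbr:Berlin", num_walks=2, depth=2, rng=rng)
print(walks[0])
print(skipgram_pairs(["a", "b", "c"], window=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

In a full pipeline the walks would be passed to a word2vec implementation, which treats each walk as a "sentence"; the pair generation only shows what "context" means for a walk.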
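Slide 8 lists TransE, DistMult, and ComplEx among the KGE models. As an illustration only (not the implementations used in the evaluation), the scoring functions commonly associated with these three models can be written for a single triple with NumPy. HolE, TransH, and TransR are omitted, and the sign convention (higher score means more plausible) is our choice.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE: h + r should lie close to t, so the score is the negated
    L1 (norm=1) or L2 (norm=2) distance ||h + r - t||."""
    return -float(np.linalg.norm(h + r - t, ord=norm))

def distmult_score(h, r, t):
    """DistMult: trilinear product sum_k h_k * r_k * t_k (symmetric in h and t)."""
    return float(np.sum(h * r * t))

def complex_score(h, r, t):
    """ComplEx: Re(sum_k h_k * r_k * conj(t_k)) over complex vectors;
    conjugating t makes the score asymmetric in h and t."""
    return float(np.real(np.sum(h * r * np.conj(t))))

h, r, t = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])
print(transe_score(h, r, t))  # h + r equals t exactly, so the distance is 0
```

The asymmetry of ComplEx is what lets it model directed relations such as capital, which DistMult's symmetric product cannot distinguish from their inverse.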
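Slide 10 says that up to 10,000 entities and their labels were retrieved per concept, but it does not show the query. A SPARQL query of the following shape could perform this retrieval; the helper name `entities_of_type_query`, the use of rdfs:label, and the restriction to English labels are assumptions.

```python
def entities_of_type_query(concept_uri, limit=10000):
    """Build a SPARQL query for entities typed by `concept_uri` with their
    English labels (hypothetical helper). LIMIT caps the number of result
    rows; when fewer entities exist, all of them are returned."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        f"  ?entity rdf:type <{concept_uri}> .\n"
        "  ?entity rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )

print(entities_of_type_query("http://dbpedia.org/ontology/City"))
```

The resulting string could be sent to a SPARQL endpoint such as DBpedia's public one at https://dbpedia.org/sparql.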
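The categorization metric on slide 12 can be sketched in plain Python: average the embeddings of the entities typed by c, then take the cosine similarity of that average with V_c. The function names and toy vectors are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def categorization_score(concept_vec, entity_vecs):
    """Cosine similarity between V_c and the mean embedding of the entities
    typed by c; values close to 1 indicate a good background concept."""
    dim = len(concept_vec)
    mean = [sum(v[k] for v in entity_vecs) / len(entity_vecs) for k in range(dim)]
    return cosine(concept_vec, mean)

v_city = [1.0, 0.0]
cities = [[1.0, 1.0], [1.0, -1.0]]  # hypothetical entity embeddings
print(categorization_score(v_city, cities))  # 1.0, since the mean is [1.0, 0.0]
```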
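The quantitative coherence metric on slides 14 and 15 can be sketched as follows: rank the pool by cosine similarity to V_c, keep the top n, and count the entities whose background concept is c. The pool format (entity, background concept, vector) and the toy vectors are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coherence(concept, concept_vec, pool, n):
    """Number of the n entities closest to V_c (by cosine similarity) whose
    background concept is `concept`.
    `pool` is a list of (entity, background_concept, vector) tuples."""
    ranked = sorted(pool, key=lambda item: cosine(concept_vec, item[2]), reverse=True)
    return sum(1 for _, background, _ in ranked[:n] if background == concept)

pool = [  # hypothetical embeddings
    ("dbr:Berlin", "dbo:City", [1.0, 0.1]),
    ("dbr:Sana'a", "dbo:City", [0.9, 0.2]),
    ("dbr:Germany", "dbo:Country", [0.1, 1.0]),
    ("dbr:Yemen", "dbo:Country", [0.2, 0.9]),
]
print(coherence("dbo:City", [1.0, 0.0], pool, n=2))  # 2
```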
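Slide 18 names the absolute semantic error but gives no formula. One plausible reading, which is our assumption rather than the authors' stated definition, is the mean absolute difference between a reference similarity s(c_i, c_j) (obtained, for example, from an ontology-based measure) and the cosine similarity of the two concept embeddings. The input format `reference_sim` is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def absolute_semantic_error(concept_vecs, reference_sim):
    """Assumed definition: mean of |s(c_i, c_j) - cos(V_ci, V_cj)| over all
    concept pairs in `reference_sim` (dict: (c_i, c_j) -> reference score)."""
    errors = [abs(s - cosine(concept_vecs[a], concept_vecs[b]))
              for (a, b), s in reference_sim.items()]
    return sum(errors) / len(errors)

vecs = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}  # hypothetical
ref = {("A", "B"): 1.0, ("A", "C"): 0.5}
print(absolute_semantic_error(vecs, ref))  # 0.25
```

Lower values would mean that the embedding space reproduces the reference similarities more closely.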
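For the semantic relatedness metric on slide 19, Spearman's rank correlation between the cosine similarities and the human scores equals the Pearson correlation of the two rank vectors, with tied values given their average rank. The sketch below is a dependency-free illustration; in practice a library routine such as `scipy.stats.spearmanr` would normally be used. The example scores are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

cosine_sims = [0.9, 0.4, 0.7]  # cosine similarity per concept pair
human_scores = [8, 2, 6]       # human relatedness rating per pair
print(spearman(cosine_sims, human_scores))  # close to 1.0: identical rankings
```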
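Slide 20 does not say which dimensionality reduction produces the two-dimensional visualization. As a swapped-in choice (t-SNE is another common option), the sketch below projects concept embeddings onto their first two principal components with NumPy; the concept vectors are hypothetical.

```python
import numpy as np

def project_2d(vectors):
    """Project row vectors onto their first two principal components
    (PCA computed via the SVD of the centred data matrix)."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # centre each dimension
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T  # shape: (number of concepts, 2)

concepts = ["dbo:Place", "dbo:City", "dbo:Person", "dbo:Athlete"]
vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.9, 0.3], [0.1, 0.8, 0.4]]
for name, (x, y) in zip(concepts, project_2d(vectors)):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The resulting coordinates could then be plotted (for example with matplotlib) and shown to a human judge, who checks whether related concepts such as a class and its subclasses end up close together.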
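The domain/range validation on slide 22 is symbolic and can be illustrated without embeddings: a triple is compatible if some type of the subject falls under the relation's rdfs:domain and some type of the object under its rdfs:range, taking rdfs:subClassOf into account. The schema below uses a hypothetical ex: prefix and is not DBpedia's actual set of axioms; single inheritance and an acyclic hierarchy are assumed.

```python
# Hypothetical schema (ex: prefix), for illustration only.
SUBCLASS_OF = {"ex:City": "ex:Settlement", "ex:Settlement": "ex:Place", "ex:Country": "ex:Place"}
DOMAIN = {"ex:capital": "ex:Country", "ex:locatedIn": "ex:Settlement"}
RANGE = {"ex:capital": "ex:City", "ex:locatedIn": "ex:Country"}
TYPES = {"ex:Germany": ["ex:Country"], "ex:Berlin": ["ex:City"]}

def ancestors(concept, subclass_of):
    """The concept itself plus all its superclasses (acyclic, single inheritance)."""
    chain = [concept]
    while chain[-1] in subclass_of:
        chain.append(subclass_of[chain[-1]])
    return chain

def conforms(entity, expected, types, subclass_of):
    """True if some rdf:type of `entity` is `expected` or one of its subclasses."""
    return any(expected in ancestors(t, subclass_of) for t in types.get(entity, []))

def is_valid(subj, rel, obj):
    """Check a triple against the rdfs:domain and rdfs:range of `rel`."""
    return (conforms(subj, DOMAIN[rel], TYPES, SUBCLASS_OF)
            and conforms(obj, RANGE[rel], TYPES, SUBCLASS_OF))

print(is_valid("ex:Germany", "ex:capital", "ex:Berlin"))  # True
print(is_valid("ex:Berlin", "ex:capital", "ex:Germany"))  # False
```

Such symbolic checks give the ground truth against which the compatibility reflected by relation and concept embeddings could be compared.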
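Slide 23 says that the semantic transition distance is inspired by the constant country-capital offset and relies on rdfs:domain and rdfs:range, but it gives no formula. Assuming a translation-style reading (our assumption, in the spirit of the principle described for TransE on slide 8), one could measure how far V_domain + V_r lands from V_range and average this over all relations with declared axioms. The function names and the `axioms` format are hypothetical.

```python
import math

def transition_distance(v_domain, v_relation, v_range):
    """Assumed formalisation: Euclidean distance ||V_domain + V_r - V_range||.
    Smaller values mean the relation embedding better 'transitions' from
    its domain concept to its range concept."""
    return math.sqrt(sum((d + r - g) ** 2 for d, r, g in zip(v_domain, v_relation, v_range)))

def mean_transition_distance(concept_vecs, relation_vecs, axioms):
    """Average distance over relations; `axioms` maps each relation to its
    (rdfs:domain concept, rdfs:range concept) pair."""
    dists = [transition_distance(concept_vecs[d], relation_vecs[r], concept_vecs[g])
             for r, (d, g) in axioms.items()]
    return sum(dists) / len(dists)

concepts = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [1.0, 1.0]}  # hypothetical
relations = {"r": [0.0, 0.0], "s": [1.0, 1.0]}
print(mean_transition_distance(concepts, relations, {"r": ("A", "B"), "s": ("A", "C")}))  # 2.5
```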