UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets
Pierpaolo Basile, Annalina Caputo, Giovanni Semeraro, Fedelucio Narducci
{fedelucio.narducci, pierpaolo.basile}@uniba.it
#Microposts2015, NEEL Challenge, Florence 18th May 2015
The Challenge
Just watched Frozen for the first time ever and knew the
words to all the songs... How?! #productplacement
Problem: Find and link entities in tweets
Entity type: Product
Our Approach
• Entity Recognition
  • using PoS tagging
  • relying on n-grams
• Disambiguation
  • knowledge-based method that combines a Distributional Semantic Model (DSM) with the prior probability assigned to each DBpedia concept
• Type
  • manual mapping from all types defined in the dbpedia-owl ontology to the respective types in the task
Entity Recognition: Indexing
Frozen → <dbpedia.org/resource/Frozen_(Madonna_song)>
Frozen → <dbpedia.org/resource/Frozen_(2013_film)>
Apple → <dbpedia.org/resource/Apple_Inc.>
Apple Inc. → <dbpedia.org/resource/Apple_Inc.>
Barack Obama → <http://dbpedia.org/resource/Barack_Obama>
Indexed resources: DBpedia titles file and DBpedia NLP resources
http://wifo5-04.informatik.uni-mannheim.de/downloads/datasets/
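A minimal sketch of such an index, assuming a plain in-memory dictionary from a normalized surface form to the set of DBpedia URIs it may denote. The `normalize` and `build_index` helpers are hypothetical, and lowercasing as the normalization step is an assumption; the slides do not say how the index is stored.

```python
from collections import defaultdict


def normalize(surface_form: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return " ".join(surface_form.lower().split())


def build_index(pairs):
    """Map each normalized surface form to the set of URIs it can refer to."""
    index = defaultdict(set)
    for surface_form, uri in pairs:
        index[normalize(surface_form)].add(uri)
    return index


# Surface form / URI pairs copied from the slide.
PAIRS = [
    ("Frozen", "dbpedia.org/resource/Frozen_(Madonna_song)"),
    ("Frozen", "dbpedia.org/resource/Frozen_(2013_film)"),
    ("Apple", "dbpedia.org/resource/Apple_Inc."),
    ("Apple Inc.", "dbpedia.org/resource/Apple_Inc."),
    ("Barack Obama", "http://dbpedia.org/resource/Barack_Obama"),
]

index = build_index(PAIRS)
```

An ambiguous surface form such as "Frozen" ends up with several candidate concepts, which is what the disambiguation step later has to resolve.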
Entity Recognition…
Tweet → Tokenization and Normalization → PoS-tagger or N-grams generation → Candidate list of surface forms
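The n-gram route to candidate surface forms can be sketched as follows. Whitespace tokenization, the maximum length of three tokens, and the `ngrams` helper name are all assumptions made for illustration; the slides do not state the n-gram length used.

```python
def ngrams(tokens, max_n=3):
    """Return all contiguous token spans of length 1..max_n as strings."""
    spans = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            spans.append(" ".join(tokens[start:start + n]))
    return spans


# Simplified tokenization of the first words of the example tweet.
tokens = "Just watched Frozen".split()
candidates = ngrams(tokens)
```

Every span in `candidates` would then be looked up in the surface-form index, so most spans are discarded at the search step.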
…Entity Recognition
Candidate list of surface forms → Search and Filtering → Candidate entities and list of possible concepts
Search and Filtering relies on:
• Search Score
• Levenshtein Distance
• Jaccard Index
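The two string measures named on the slide can be sketched as below. Computing the Jaccard index over lowercase token sets is an assumption (character n-grams would be another option), and the slides do not say how the measures are thresholded or combined with the search score.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard index over lowercase token sets (token level is an assumption)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

A filter could, for example, keep only index hits whose distance to the tweet span is small and whose token overlap is high; any concrete thresholds would be a tuning choice.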
Disambiguation: 3-step approach
• Building the glosses
• Building the context
• Semantic Ranking
Disambiguation: Building the glosses
"Frozen" is a song by American singer-songwriter Madonna…
Frozen is a 2013 American 3D computer-animated musical…
DBpedia extended abstracts
Disambiguation: Building the context
Tweet: Just watched Frozen for the first time ever and knew the words to all the songs... How?! #productplacement
Context: <just, watched, first, time, knew, words, all, songs, how, product, placement>
Disambiguation: Semantic Ranking 1/3
• Words as points in a mathematical space
• Close words are similar
• Word space is built by analyzing word co-occurrences in a large corpus
• Vector composition using superposition (+)
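Superposition is element-wise vector addition. A minimal sketch, assuming a toy 3-dimensional word space whose values are made up for illustration (the `superpose` helper and `WORD_SPACE` table are hypothetical):

```python
def superpose(vectors):
    """Compose word vectors by element-wise sum (superposition)."""
    dims = len(vectors[0])
    total = [0.0] * dims
    for vector in vectors:
        for k in range(dims):
            total[k] += vector[k]
    return total


# Toy word space; the numbers are invented, not taken from a trained model.
WORD_SPACE = {
    "watched": [0.9, 0.1, 0.0],
    "songs": [0.2, 0.8, 0.1],
    "words": [0.1, 0.6, 0.3],
}

context_vector = superpose([WORD_SPACE[w] for w in ["watched", "songs", "words"]])
```

The same operation can be applied to the words of a gloss, so that gloss and context are both represented as single vectors in the same space.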
Disambiguation: Semantic Ranking 2/3
word2vec: https://code.google.com/p/word2vec/
Distributional Semantic Model built on Wikipedia
• Cosine similarity between the gloss and the context
• Linear combination with a function that takes into account the usage of concepts in Wikipedia
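The ranking step can be sketched as a cosine similarity combined linearly with a usage-based prior. The slides do not give the form or the weights of the combination, so the convex combination and the `alpha` parameter below are assumptions, as is the `rank_score` name.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_score(gloss_vec, context_vec, prior, alpha=0.5):
    """Convex combination of similarity and prior; alpha is a hypothetical weight."""
    return alpha * cosine(gloss_vec, context_vec) + (1.0 - alpha) * prior
```

Under this sketch, the candidate concept with the highest `rank_score` would be selected for the surface form.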
Disambiguation: Semantic Ranking 3/3
Statistics about the usage of concepts in Wikipedia
Concept probability given the entity:
$$p(c_{ij} \mid e_i) = \frac{t(e_i, c_{ij}) + 1}{\#e_i + |C_i|}$$
• $t(e_i, c_{ij})$: number of times $e_i$ is linked as $c_{ij}$
• $|C_i|$: number of concepts assigned to $e_i$
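A worked example of the concept probability, which has the shape of add-one smoothing. The counts are hypothetical, and reading #e_i as the total number of times e_i is linked to any concept is an assumption, since the slide labels only the other two terms.

```python
def concept_prior(link_count: int, entity_count: int, num_concepts: int) -> float:
    """(t(e_i, c_ij) + 1) / (#e_i + |C_i|)."""
    return (link_count + 1) / (entity_count + num_concepts)


# Hypothetical counts: "Frozen" is linked 100 times in total,
# 70 times to the film and 30 times to the song.
counts = {"Frozen_(2013_film)": 70, "Frozen_(Madonna_song)": 30}
total = sum(counts.values())

priors = {
    concept: concept_prior(t, total, len(counts))
    for concept, t in counts.items()
}
```

With these counts the priors are 71/102 and 31/102. The +1 in the numerator keeps a concept that was never linked from receiving probability zero.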
Evaluation
• Development set
  • 500 manually annotated tweets
• Metrics
  • SLM: Strong Link Match
  • STMM: Strong Typed Mention Match
  • MC: Mention CEAF
• System setup
  • TweetNLP for tokenization and PoS-tagging
  • word2vec for DSM building: 400 vector dimensions, analyzing only terms that occur at least 25 times
  • Developed in Java
Results
• Low performance in entity recognition
• Good results in disambiguation: F=0.825 when considering only correctly recognized, non-NIL instances

Entity Recognition   F-SLM   F-STMM   F-MC
PoS-tag              0.362   0.267    0.389
N-grams              0.258   0.191    0.306