This presentation describes the participation of the UNIBA team in the Named Entity rEcognition and Linking (NEEL) Challenge. We propose a knowledge-based algorithm able to recognize and link named entities in English tweets. The approach combines the simple Lesk algorithm with information coming from both a distributional semantic model and the usage frequency of Wikipedia concepts. The algorithm performs poorly in entity recognition, while it achieves good results in the disambiguation step.
1. UNIBA: Exploiting a
Distributional Semantic Model for
Disambiguating and
Linking Entities in Tweets
Pierpaolo Basile, Annalina Caputo, Giovanni
Semeraro, Fedelucio Narducci
{fedelucio.narducci, pierpaolo.basile}@uniba.it
#Microposts2015, NEEL Challenge, Florence 18th May 2015
2. The Challenge
Just watched Frozen for the first time ever and knew the
words to all the songs... How?! #productplacement
Problem: Find and link entities in tweets
Entity type of "Frozen": Product
3. Our Approach
• Entity Recognition
• using PoS-tag
• relying on n-grams
• Disambiguation
• knowledge-based method that combines a
Distributional Semantic Model (DSM) with the prior
probability assigned to each DBpedia concept
• Type
• manual mapping of all types defined in the dbpedia-owl
ontology to the respective types in the task
9. Disambiguation:
Building the context
Just watched Frozen for the first time ever and knew the
words to all the songs... How?! #productplacement
<just, watched, first, time, knew, words, all, songs,
how, product, placement>
Context
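The context-building step above can be sketched as a small Java routine. This is a simplified stand-in: the actual system uses TweetNLP for tokenization, while here a plain whitespace split does the job; the stopword list and the hashtag-segmentation dictionary are toy assumptions made only for this example. As in the slide, the entity mention itself ("frozen") is excluded from its own context.

```java
import java.util.*;

// Simplified sketch of context building: tokenize, lowercase, drop
// stopwords and punctuation, split hashtags into words, and exclude the
// entity mention being disambiguated. The real system uses TweetNLP;
// STOPWORDS and DICTIONARY below are toy resources for illustration.
public class ContextBuilder {

    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "for", "the", "and", "to", "ever", "a", "an", "of"));

    // Toy dictionary used only by this example's hashtag segmenter.
    static final Set<String> DICTIONARY = new HashSet<>(Arrays.asList(
            "product", "placement"));

    public static List<String> buildContext(String tweet, String mention) {
        List<String> context = new ArrayList<>();
        for (String raw : tweet.toLowerCase().split("\\s+")) {
            if (raw.startsWith("#")) {               // split hashtags into words
                context.addAll(segmentHashtag(raw.substring(1)));
                continue;
            }
            String token = raw.replaceAll("[^a-z]", ""); // strip punctuation
            if (!token.isEmpty() && !STOPWORDS.contains(token)
                    && !token.equals(mention)) {
                context.add(token);
            }
        }
        return context;
    }

    // Greedy left-to-right dictionary segmentation of the hashtag body,
    // e.g. "productplacement" -> ["product", "placement"].
    static List<String> segmentHashtag(String body) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < body.length()) {
            int end = body.length();
            while (end > start && !DICTIONARY.contains(body.substring(start, end)))
                end--;
            if (end == start) {          // no dictionary word: keep the remainder
                parts.add(body.substring(start));
                break;
            }
            parts.add(body.substring(start, end));
            start = end;
        }
        return parts;
    }
}
```

Running it on the example tweet reproduces the context shown on the slide.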
10. Disambiguation:
Semantic Ranking 1/3
• Words as points in a
mathematical space
• Close words are similar
• Word space is built analyzing
word co-occurrences in a
large corpus
• Vector composition using
superposition (+)
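The two word-space operations the slide names can be sketched directly: a text is represented by the superposition (element-wise sum) of its word vectors, and closeness between points in the space is measured by cosine similarity. The vectors here are tiny made-up examples; the real model has hundreds of dimensions learned from corpus co-occurrences.

```java
// Sketch of the word-space operations: superposition (+) composes word
// vectors into a single text vector; cosine similarity measures how close
// two points in the space are (1.0 = same direction, 0.0 = orthogonal).
public class WordSpace {

    // Superposition: element-wise sum of the given word vectors.
    public static double[] superpose(double[][] wordVectors) {
        double[] sum = new double[wordVectors[0].length];
        for (double[] v : wordVectors)
            for (int i = 0; i < v.length; i++)
                sum[i] += v[i];
        return sum;
    }

    // Cosine similarity between two vectors.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```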
11. Disambiguation:
Semantic Ranking 2/3
word2vec: https://code.google.com/p/word2vec/
Distributional Semantic Model built on Wikipedia
Context
• Cosine similarity
between the gloss and
the context
• Linear combination
with a function which
takes into account the
usage of concepts in
Wikipedia
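The ranking step described above can be sketched as follows: each candidate concept gets a score that linearly combines the cosine similarity between its gloss vector and the context vector with a Wikipedia usage term (here a precomputed prior probability). The weight `alpha` and the method names are assumptions for illustration, not the system's actual parameters.

```java
// Sketch of semantic ranking: pick the candidate concept maximizing
//   score(c) = alpha * cos(gloss(c), context) + (1 - alpha) * prior(c)
// where cosineSims[i] and priors[i] are the two components for the i-th
// candidate (both assumed precomputed). Returns the index of the winner.
public class SemanticRanker {

    public static int bestCandidate(double[] cosineSims, double[] priors,
                                    double alpha) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < cosineSims.length; i++) {
            double score = alpha * cosineSims[i] + (1 - alpha) * priors[i];
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

With a high `alpha` the gloss/context similarity dominates; with a low `alpha` the Wikipedia usage prior does.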
13. Disambiguation:
Semantic Ranking 3/3
Statistics about the usage of concepts in Wikipedia:

p(c_ij | e_i) = (t(e_i, c_ij) + 1) / (#e_i + |C_i|)

• t(e_i, c_ij): number of times e_i is linked as c_ij
• #e_i: total number of times e_i occurs as a link in Wikipedia
• |C_i|: number of concepts assigned to e_i
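The smoothed prior above transcribes directly into code. In this sketch, `linkCounts` maps each candidate concept of a surface form e_i to the number of times e_i is linked to it in Wikipedia (assuming the map covers all of e_i's links, so its values sum to #e_i and its size is |C_i|).

```java
import java.util.*;

// p(c_ij | e_i) = (t(e_i, c_ij) + 1) / (#e_i + |C_i|)
// The add-one term is Laplace smoothing, so rarely linked candidates
// still receive a non-zero prior probability.
public class ConceptPrior {

    public static double prior(Map<String, Integer> linkCounts, String concept) {
        int total = 0;                           // #e_i: total links for e_i
        for (int n : linkCounts.values())
            total += n;
        int t = linkCounts.getOrDefault(concept, 0); // t(e_i, c_ij)
        return (t + 1.0) / (total + linkCounts.size()); // |C_i| = map size
    }
}
```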
14. Evaluation
• Development set
• 500 manually annotated tweets
• Metrics
• SLM: Strong Link Match
• STMM: Strong Typed Mention Match
• MC: Mention Ceaf
• System setup
• TweetNLP for tokenization and PoS-tagging
• word2vec for DSM building: 400 vector dimensions,
considering only terms that occur at least 25 times
• Developed in Java
15. Results
• Low performance in entity recognition
• Good results in disambiguation: F = 0.825 when
considering only correctly recognized, non-NIL
instances
Entity Recognition F-SLM F-STMM F-MC
PoS-tag 0.362 0.267 0.389
N-grams 0.258 0.191 0.306