Word Sense Disambiguation and Induction
Leon Derczynski
University of Sheffield
27 January 2011
Origin
Originally a course at ESSLLI 2010, Copenhagen
by Roberto Navigli and Simone Ponzetto
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
General Problem
Being able to disambiguate words in context is a crucial
problem
Can potentially help improve many other NLP applications
Polysemy is everywhere – our job is to model this
Ambiguity is rampant.
I saw a man who is 98 years old and can still walk and tell
jokes.
saw: 26 senses, man: 11, years: 4, old: 8, can: 5, still: 4, walk: 10, tell: 8, jokes: 3
43,929,600 possible sense combinations for this simple sentence.
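The count above is just the product of the per-word sense counts; a quick check:

```python
from math import prod

# Per-word sense counts from the example sentence.
sense_counts = {"saw": 26, "man": 11, "years": 4, "old": 8,
                "can": 5, "still": 4, "walk": 10, "tell": 8, "jokes": 3}

def possible_readings(counts):
    """Number of distinct sense assignments for the whole sentence."""
    return prod(counts.values())

print(possible_readings(sense_counts))  # 43929600
```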
Word Senses
Monosemous words – only one meaning; plant life, internet
Polysemous words – more than one meaning; bar, bass
A word sense is a commonly-accepted meaning of a word.
We are fond of fruit such as the kiwifruit and banana.
Enumerative Approach
Fixed sense inventory enumerates the range of possible
meanings of a word
Context is used to select a particular sense
chop vegetables with a knife, was stabbed with a knife
However, we may want to add senses.
WSD Tasks
Different representations of senses change the way we think
about WSD
Lexical sample – disambiguate a restricted set of words
All words – disambiguate all content words
Cross-lingual WSD – disambiguate a target word by labelling it
with the appropriate translation in other languages; e.g.
English coach → German Bus/Linienbus/Omnibus/Reisebus.
Representing the Context
Text is unstructured, and needs to be made machine-readable.
Flat representation (surface features) vs. Structured
representation (graphs, trees)
Local features: local context of a word usage, e.g. PoS tags
and surrounding word forms
Topical features: general topic of a sentence or discourse,
represented as a bag of words
Syntactic features: argument-head relations between target
and rest of sentence
Semantic features: previously established word senses
Knowledge Resources
Structured and Unstructured
Thesauri, machine-readable dictionaries, semantic networks
(WordNet)
BabelNet – Babel synsets, with semantic relations (is-a,
part-of)
Raw corpora
Collocation (Web1T)
Applications
Information extraction – acronym expansion, disambiguate
people names, domain-specific IE
Information retrieval
Machine Translation
Semantic web
Question answering
Approaches
Supervised WSD: classification task, hand-labelled data
KB WSD: uses knowledge resources, no training
Unsupervised: performs WSI
Word sense dominance: find predominant sense of a word
Domain-driven WSD: use domain information as vectors to
compare with senses of w
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
Supervised WSD
Given a set of manually sense-annotated examples (training
set), learn a classifier
Features for WSD: Bag of words, bigrams, collocations, VP
and NP heads, PoS
Using WordNet as a sense inventory, SemCor is a readily
available source of sense-labelled data
Current state-of-the-art (SotA) performance comes from SVMs
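The features listed above can be sketched as a simple extractor; this is an illustration, not the feature set of any particular system (the window size and feature names are invented):

```python
def wsd_features(tokens, i, window=2):
    """Feature dict for the target word tokens[i]: bag-of-words
    context plus position-anchored collocation features."""
    feats = {"bow=" + w: 1 for j, w in enumerate(tokens) if j != i}
    for off in range(-window, window + 1):
        if off != 0 and 0 <= i + off < len(tokens):
            # e.g. "w-1=the" means "the word immediately left of the target is 'the'"
            feats[f"w{off:+d}={tokens[i + off]}"] = 1
    return feats

toks = "he sat by the bank of the river".split()
f = wsd_features(toks, toks.index("bank"))
print(f["w+2=the"], f["w-1=the"])  # 1 1
```

A real system would add PoS tags and syntactic heads from a parser; this sketch uses only surface forms.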
Knowledge-based WSD
Exploit knowledge resources (dictionaries, thesauri,
collocations) to assign senses
Lower performance than supervised methods, but wider
coverage
No need to train or be tuned to a task/domain
Gloss Overlap
Knowledge-based method proposed by Lesk (1986)
Retrieve all sense definitions of target word
Compare each sense definition with the definitions of other
words in context
Choose the sense with the most overlap
To disambiguate pine cone:
pine: 1. a kind of evergreen tree; 2. to waste away through
sorrow.
cone: 1. a solid body which narrows to a point; 2. something
of this shape; 3. fruit of certain evergreen trees.
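Gloss overlap on the example above can be sketched as follows; the stopword list and whitespace tokenisation are simplifications (real implementations stem, so that tree/trees would also match):

```python
# Glosses from the slide, keyed by sense number.
pine = {1: "a kind of evergreen tree",
        2: "to waste away through sorrow"}
cone = {1: "a solid body which narrows to a point",
        2: "something of this shape",
        3: "fruit of certain evergreen trees"}

STOP = {"a", "of", "to", "the", "this", "which", "something"}

def tokens(gloss):
    return {w for w in gloss.lower().split() if w not in STOP}

def best_senses(glosses_a, glosses_b):
    """Pick the pair of senses whose definitions share the most words."""
    return max(((sa, sb) for sa in glosses_a for sb in glosses_b),
               key=lambda p: len(tokens(glosses_a[p[0]]) & tokens(glosses_b[p[1]])))

print(best_senses(pine, cone))  # (1, 3) -- both "evergreen" senses
```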
Lexical Chains
Knowledge-based method proposed by Hirst and St Onge
(1998)
A lexical chain is a sequence of semantically related words in a
text
Assign scores to senses based on the chain of related words it
is in
PageRank
Knowledge-based method proposed by Agirre and Soroa
(2009)
Build a graph including all synsets of words in the input text
Assign an initial low value to each node in the graph
Apply PageRank (Brin and Page) to the graph, and select
synsets with the highest PR
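A minimal power-iteration PageRank over a toy synset graph; the graph, node names, and damping factor here are invented for illustration (the actual method runs over the full WordNet graph):

```python
def pagerank(graph, damping=0.85, iters=50):
    """graph: node -> list of neighbour nodes (outgoing edges)."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}  # initial low uniform value
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, out in graph.items():
            for m in out:
                new[m] += damping * rank[n] / len(out)
        rank = new
    return rank

# Hypothetical mini-graph: two synsets of "bank" plus context synsets.
graph = {
    "bank#finance": ["money#1", "loan#1"],
    "bank#river":   ["water#1"],
    "money#1":      ["bank#finance"],
    "loan#1":       ["bank#finance", "money#1"],
    "water#1":      ["bank#river"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # bank#finance
```

With more financial context synsets linking in, the finance sense accumulates the most rank and is selected.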
Knowledge Acquisition Bottleneck
WSD needs knowledge! Corpora, dictionaries, semantic
networks
More knowledge is required to improve the performance of
both:
Supervised systems – more training data
Knowledge based systems – richer networks
Minimally Supervised WSD
Human supervision is expensive, but required for training
examples or a knowledge base
Minimally supervised approaches aim to learn classifiers from
annotated data with minimal human supervision
Bootstrapping
Given a set of labelled examples L, a set of unlabelled examples
U, and a classifier c:
1. Train c on L
2. Choose N examples from U and label them with c, forming U ′
3. Move the K most confidently labelled instances from U ′ into L
Repeat until U is empty
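The loop above can be sketched as a self-training routine; the toy 1-D nearest-centroid classifier and margin-based confidence here are stand-ins for a real classifier:

```python
def bootstrap(labelled, unlabelled, n=4, k=2):
    """Self-training sketch: labelled = [(x, label)], unlabelled = [x]."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    while unlabelled:
        # Train: toy nearest-centroid classifier over 1-D points.
        cents = {y: sum(x for x, yy in labelled if yy == y) /
                    sum(1 for _, yy in labelled if yy == y)
                 for y in {y for _, y in labelled}}
        batch, unlabelled = unlabelled[:n], unlabelled[n:]
        # Label the batch; confidence = margin between the two nearest centroids.
        scored = []
        for x in batch:
            dists = sorted((abs(x - c), y) for y, c in cents.items())
            conf = dists[1][0] - dists[0][0] if len(dists) > 1 else 1.0
            scored.append((conf, x, dists[0][1]))
        # Keep the k most confident labels; return the rest to the pool.
        scored.sort(reverse=True)
        labelled += [(x, y) for _, x, y in scored[:k]]
        unlabelled += [x for _, x, _ in scored[k:]]
    return labelled

seeds = [(0.0, "A"), (10.0, "B")]
result = bootstrap(seeds, [1, 2, 8, 9])
print(sorted(result))  # points near 0 end up "A", points near 10 end up "B"
```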
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
Word Sense Induction
Based on the idea that one sense of a word will have similar
neighbouring words
Follows the idea that the meaning of a word is given by its
usage
We induce word sense from input text by clustering word
occurrences
Clustering
Unsupervised machine learning for grouping similar objects
into clusters
No a priori input (sense labels)
Context clustering: each occurrence of a word is represented
as a context vector; cluster vectors into groups
Word clustering: cluster words which are semantically similar
and thus have a specific meaning
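Context clustering can be sketched as below; the greedy single-pass grouping, similarity threshold, stopword list, and example sentences are all invented for illustration (real systems use proper clustering algorithms such as k-means or agglomerative clustering):

```python
from collections import Counter

STOP = {"the", "a", "on", "at", "and", "was", "in", "near", "after", "of"}

def context_vector(sentence, target):
    """Bag-of-words context vector for one occurrence of `target`."""
    return Counter(w for w in sentence.lower().split()
                   if w != target and w not in STOP)

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda c: sum(x * x for x in c.values()) ** 0.5
    return dot / (norm(u) * norm(v) or 1.0)

def cluster(vectors, threshold=0.2):
    """Greedy clustering: join a vector to the first cluster whose
    seed vector is similar enough, else start a new cluster."""
    clusters = []
    for v in vectors:
        for c in clusters:
            if cosine(v, c[0]) >= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

# Two occurrences of "bank" per sense (invented contexts).
sents = ["the bank charged interest on the loan",
         "deposit money at the bank and the interest grows",
         "the river bank was muddy after rain",
         "fish swim near the river bank in the muddy water"]
groups = cluster([context_vector(s, "bank") for s in sents])
print(len(groups))  # 2
```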
Word Clustering
Aims to cluster words which are semantically similar
Lin (1998) proposes this method:
1. Extract dependency triples from a text corpus
John eats a yummy kiwi → (eat subj John), (kiwi obj-of eat),
(kiwi det a) ...
2. Define a measure of similarity between two words
3. Use similarity scores to create a similarity tree; start with a
root node, and recursively add children in descending
order of similarity.
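Step 2 can be illustrated over the triples above; Lin's actual measure weights shared features by their information content, so the Dice overlap used here is a deliberate simplification, and the triples are toy data:

```python
def features(triples, word):
    """Dependency features of `word`: the (relation, other-word) pairs
    it participates in, e.g. ("obj-of", "eat") for "kiwi"."""
    return {(rel, w2) for w1, rel, w2 in triples if w1 == word}

def dice_similarity(triples, a, b):
    """Dice overlap of feature sets -- a simplification of Lin's
    information-weighted similarity."""
    fa, fb = features(triples, a), features(triples, b)
    return 2 * len(fa & fb) / (len(fa) + len(fb) or 1)

triples = [("kiwi", "obj-of", "eat"), ("kiwi", "det", "a"),
           ("apple", "obj-of", "eat"), ("apple", "det", "an"),
           ("stone", "obj-of", "throw"), ("stone", "det", "the")]
print(dice_similarity(triples, "kiwi", "apple"))  # 0.5
print(dice_similarity(triples, "kiwi", "stone"))  # 0.0
```

Kiwi and apple share the "thing that gets eaten" feature, so they land near each other in the similarity tree; stone does not.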
Lin’s approach: example [figure: similarity tree]
WSI: pros and cons
+ Actually performs word sense discrimination
+ Aims to divide the occurrences of a word into a number of
classes
- Makes objective evaluation more difficult if not
domain-specific
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
Disambiguation Evaluation
Disambiguation is easy to evaluate – we have discrete sense
inventories
Evaluate with coverage (proportion of instances attempted),
precision and recall, and then F1
Accuracy – correct answers / total answers
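The standard figures work out as follows (the example counts are invented):

```python
def wsd_scores(total, attempted, correct):
    """Standard WSD evaluation figures, from counts of test instances."""
    coverage = attempted / total
    precision = correct / attempted   # correct among answers given
    recall = correct / total          # correct among all instances
    f1 = 2 * precision * recall / (precision + recall)
    return coverage, precision, recall, f1

# e.g. 100 instances; the system answers 80 and gets 60 right:
print(wsd_scores(100, 80, 60))  # coverage 0.8, P 0.75, R 0.6, F1 ≈ 0.667
```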
Disambiguation Baselines
MFS – Most Frequent Sense
Strong baseline - 50-60% accuracy on lexical sample task
Doesn’t take into account genre (e.g. star in astrophysics /
newswire)
Subject to idiosyncrasies of the corpus
Evaluation with gold-standard clustering
Given a standard clustering, compare the gold standard and
output clustering
Can evaluate with cluster entropy and purity
Also the Rand Index (similar to the Jaccard index) and F-Score.
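The Rand Index is the fraction of instance pairs on which the two clusterings agree; a minimal sketch with invented cluster labels:

```python
from itertools import combinations

def rand_index(gold, predicted):
    """Fraction of instance pairs that are same-cluster in both
    clusterings, or different-cluster in both."""
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum((gold[i] == gold[j]) == (predicted[i] == predicted[j])
                for i, j in pairs)
    return agree / len(pairs)

# Gold and system cluster labels for six occurrences of a word:
print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 10/15 ≈ 0.667
```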
Discrimination Baselines
All-in-one: group all words into one big cluster
Random: produce a random set of clusters
Pseudowords
Discrimination evaluation method
Generates new words with artificial ambiguity
Select two or more monosemous terms from gold standard
data
Given all their occurrences in a corpus, replace them with a
pseudoword formed by joining the terms
Compare automatic discrimination to gold standard
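The construction can be sketched as below; the corpus and term pair are invented, and the original term is kept as the gold label for each occurrence:

```python
import re

def pseudoword_corpus(corpus, terms):
    """Replace every occurrence of each monosemous term with the
    joined pseudoword; the replaced term is the gold label."""
    pseudo = "-".join(terms)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b")
    gold, out = [], []
    for sentence in corpus:
        gold.extend(pattern.findall(sentence))
        out.append(pattern.sub(pseudo, sentence))
    return out, gold

corpus = ["she peeled a banana", "he locked the door"]
texts, gold = pseudoword_corpus(corpus, ["banana", "door"])
print(texts)  # ['she peeled a banana-door', 'he locked the banana-door']
print(gold)   # ['banana', 'door']
```

A discrimination system then clusters the pseudoword's occurrences, and the clusters are compared against the gold labels.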
SemEval-2007
Lexical sample and all-words coarse grained WSD
Preposition disambiguation
Evaluation of WSD on cross-language IR
WSI, lexical substitution
Top systems reach 88.7% accuracy (on lexical sample) and
82.5% (on all-words)
SemEval-2010
Fifth event of its kind
Includes specific cross-lingual tasks
Combined WSI/WSD task
Domain-specific all-words task
Issues
Representation of word senses: enumerative vs. generative
approach
Knowledge Acquisition Bottleneck: not enough data!
Benefits for AI/NLP applications
Alleviating the Knowledge Acquisition Bottleneck
Weakly-supervised algorithms, incorporating bootstrapping or
active learning
Continuing manual efforts – WordNet, Open Mind Word
Expert, OntoNotes
Automatic enrichment of knowledge resources – collocation
and relation triple extraction, BabelNet
Future Challenges
How can we mine even larger repositories of textual data
(e.g. the whole web) to create huge knowledge repositories?
How can we design high-performance, scalable algorithms
to use this data?
Need to decide which kinds of word senses are needed for which
application
We still need to develop a general representation of word senses
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
Wikipedia as sense inventory
Wikipedia articles provide an inventory of disambiguated word
senses and entity references
Task: Use their occurrences in texts, i.e. the internal
Wikipedia hyperlinks, as named entity and sense annotations
The articles’ texts provide a sense annotated corpus
Mihalcea (2007)
Mihalcea proposes a method for automatically generating
sense-tagged data using Wikipedia
Rhythm is the arrangement of sounds in time. Meter animates
time in regular pulse groupings, called measures or [[bar
(music)|bar]].
The nightlife is particularly active around the beachfront
promenades because of its many nightclubs and [[bar
(establishment)|bars]].
1. Extract all paragraphs in Wikipedia containing word w
2. Collect all possible labels l1 . . . ln for w
3. Map each label li to its WordNet sense si
4. Annotate each occurrence of w labelled li with its sense si
System trained on Wikipedia significantly outperforms MFS
and Lesk baselines
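Harvesting the annotations relies only on MediaWiki's [[target|surface]] link syntax; a minimal extraction sketch:

```python
import re

# Matches [[article title|surface text]] internal links.
LINK = re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]")

def sense_annotations(paragraph):
    """Return (surface word, Wikipedia article title) pairs from the
    paragraph's internal links; the title acts as the sense label."""
    return [(surface, title) for title, surface in LINK.findall(paragraph)]

text = ("Meter animates time in regular pulse groupings, "
        "called measures or [[bar (music)|bars]].")
print(sense_annotations(text))  # [('bars', 'bar (music)')]
```

Mapping each article title to a WordNet sense then yields sense-tagged training data for free.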
Knowledge-rich WSD
General aim is to relieve knowledge acquisition bottleneck of
NLP systems, with WSD as a case study
Main ideas:
- Extend WordNet with millions of semantic relations (using
Wikipedia)
- Apply knowledge-based WSD to exploit extended WordNet
Results: integration of many, many semantic relations in
knowledge-based systems yields performance competitive with
SotA supervised approaches
Wikification
The task of generating hyperlinks to disambiguated Wikipedia
concepts
Two sub-tasks: automatic keyword extraction, WSD
Wikify! (Csomai and Mihalcea, 2008) can perform keyword
extraction by extracting candidates and then ranking them
The system does knowledge-based and data-driven WSD,
filtering out annotations that contain disagreements
Disambiguate links using relatedness, commonness (prior
probability of a sense), and context quality (context terms).
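Commonness is just the prior probability of each target article for a given anchor text, estimated from link counts; the counts below are hypothetical:

```python
from collections import Counter

def commonness(link_counts):
    """Prior probability of each target article for one anchor text,
    estimated from how often the anchor links to each article."""
    total = sum(link_counts.values())
    return {article: n / total for article, n in link_counts.items()}

# Hypothetical counts of where the anchor text "bar" links to:
priors = commonness(Counter({"bar (establishment)": 600,
                             "bar (music)": 300,
                             "bar (unit)": 100}))
print(priors["bar (establishment)"])  # 0.6
```

Absent strong contextual evidence, the link is resolved to the most common target.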
Outline
1 Introduction
2 WSD
3 WSI
4 Evaluation and Issues
5 Wikipedia
6 Summary
Questions
Thank you. Are there any questions?