Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
1. + about the paper:
Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
Roberto Navigli (Dipartimento di Informatica, Sapienza Università di Roma)
Mirella Lapata (School of Informatics, University of Edinburgh)
Giovanni Murru
Seminars in Computational learning methods for Natural Language Processing
Prof. Roberto Basili
2. +
Abstract
n Development of graph-based unsupervised
algorithms for Word Sense Disambiguation
n Discussion about a variety of measures that
analyze the connectivity of the graph
structures
n Test the performance of these approaches
on standard data sets
3. +
Word Sense Disambiguation
n Word Sense Disambiguation (WSD) is an open research topic
in Natural Language Processing
n Its goal is to identify which sense of a word is intended in a
given context, such as a sentence.
n The sense of the word is selected from a set of predefined
possibilities
n Sense Inventory (Dictionary, Thesaurus)
n Knowledge intensive methods, Supervised Learning
4. +
Why WSD is essential
n Word Sense Disambiguation is essential for many
applications:
n Machine Translation (e.g. complex translations between natural
languages, achieved with corpus techniques)
n Information Retrieval (used on the Internet)
n Question Answering
n Knowledge Acquisition
n Summarization
5. +
Huge Data Sets
n One of the problems of Word Sense Disambiguation (WSD) is
the need to deal with huge data sets, in particular with
the supervised approach.
n While Supervised Disambiguation is based on a labeled
training set, Unsupervised Disambiguation uses
unlabeled corpora.
n Corpora are large, structured sets of text.
n The supervised approach outperforms the unsupervised one, but
requires large amounts of training data.
6. +
Limitations of the Supervised Approach
n Supervised Disambiguation can obtain reliable results
only for words whose senses have been labeled.
n These sense-tagged corpora are usually created by hand,
which is expensive and labor-intensive.
n Suitable data are scarce for many languages and text
genres.
n POSSIBLE SOLUTION?
Unsupervised Disambiguation
7. +
Graph vs Similarity (1/2)
n Unsupervised methods can generally be divided into 2
categories:
1. Graph Based
2. Similarity Based
n No need to label senses → well suited to large-scale sense
disambiguation
n Similarity Based algorithms assign a sense to an ambiguous
word by comparing each of its senses with those of the
words in the surrounding context.
n The sense with the highest similarity is assumed to be the right
one.
8. +
Graph vs Similarity (2/2)
n The work by Navigli and Lapata adopts
the Graph-Based approach.
n Graph-Based steps:
n Build a graph representing all possible interpretations of the word
sequence to be disambiguated.
n Graph nodes → Word senses
n Graph edges → Semantic relations between these senses
n Score each node in order to determine its
importance.
n Sense Disambiguation then amounts to finding the most important
node for each word.
9. +
Building the Graph (1/2)
n In the experiments, Navigli and Lapata used the WordNet
sense inventory.
n For each sentence σ they build a graph G = (V, E)
n σ = (w1, w2, … , wn) is a sequence of words
n The graph G starts from a set of vertices
Vσ = {v1, v2, … , vm}
n Vσ initially contains, for each word wi in σ,
the set of senses associated with that word in the
WordNet sense inventory.
n The set of edges E of the graph G is initially empty
10. +
Building the Graph (2/2)
n Let V = Vσ
n For each word sense vi in Vσ, a depth-first search
of the WordNet graph is performed starting from vi
n Every time a different sense vj also contained in Vσ is reached:
n The semantic relations encountered along the path between vi
and vj are added to the set of edges E
n and the nodes on this path (between vi and vj) are
added to the set V of the vertices of the graph G.
n G is hence a representation of the semantic relations
between the word senses of the sentence that
G represents.
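The construction above can be sketched in Python. This is only an illustration under stated assumptions: `LEXICON` is a hypothetical toy stand-in for WordNet, the sense labels are invented, and the `max_depth` bound is an assumption added here to keep the search finite (the slides do not state a limit).

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy lexicon standing in for WordNet: sense -> related senses.
LEXICON: Dict[str, List[str]] = {
    "drink#v1": ["beverage#n1"],
    "beverage#n1": ["drink#v1", "milk#n1"],
    "milk#n1": ["beverage#n1"],
    "milk#n2": ["river#n1"],
    "river#n1": ["milk#n2"],
}


def build_graph(
    sentence_senses: Set[str],
    lexicon: Dict[str, List[str]],
    max_depth: int = 3,  # assumed bound on path length, not from the slides
) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Return (V, E): V starts as the sentence senses, E starts empty."""
    vertices: Set[str] = set(sentence_senses)
    edges: Set[FrozenSet[str]] = set()

    def dfs(node: str, path: List[str]) -> None:
        if len(path) - 1 >= max_depth:  # path already has max_depth edges
            return
        for nxt in lexicon.get(node, []):
            if nxt in path:  # avoid cycles
                continue
            new_path = path + [nxt]
            if nxt in sentence_senses:
                # Reached another sense of the sentence: keep the whole path.
                vertices.update(new_path)
                edges.update(frozenset(p) for p in zip(new_path, new_path[1:]))
            dfs(nxt, new_path)

    for sense in sentence_senses:
        dfs(sense, [sense])
    return vertices, edges


V, E = build_graph({"drink#v1", "milk#n1", "milk#n2"}, LEXICON)
```

In this toy example the intermediate node `beverage#n1` is pulled into the graph because it lies on a path between two senses of the sentence, while `river#n1` is not, since no path through it reaches another sentence sense.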
11. +
Why is the graph built?
n G is a subgraph of WordNet, whose vertices and
relations are likely to be useful for the WSD problem
n Remember:
n The aim of WSD is to find the most appropriate sense for each
word that belongs to the sentence σ.
n This is determined by ranking each vertex in the graph
G, according to its importance.
n How can we achieve this ranking?
How can we measure the relevance of a word sense?
n CONNECTIVITY MEASURES
12. +
Connectivity Measures (1/2)
n They are used to rank the nodes in order to select the most
plausible meaning.
n Connectivity measures can be of two types
n LOCAL
n GLOBAL
n While global measures estimate the connectivity of the
entire structure of the graph, the local measures capture the
degree of connectivity related to a single vertex in the graph.
13. +
Connectivity Measures (2/2)
n The graphs are assumed to be undirected
n The researchers motivate this choice by noting that semantic
relations often have an inverse counterpart, as in the case of hypernymy
and hyponymy (IS-A)
n e.g. RED
n Hypernym: something that red is a kind of (e.g. chromatic color)
n Hyponym: something that is a kind of red (e.g. scarlet)
n They define a distance function d(u, v) as the length of the shortest
path between two nodes u and v
n If the two nodes are disconnected, d(u, v) = K, where K is
the number of nodes in the graph.
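A minimal sketch of the distance function d, assuming the undirected graph is stored as an adjacency dictionary (the `ADJ` graph and the function name `distance` are illustrative choices, not taken from the paper). Breadth-first search gives the shortest path length; disconnected pairs get K, the number of nodes.

```python
from collections import deque
from typing import Dict, List

# Illustrative undirected graph: a - b - c, with d isolated.
ADJ: Dict[str, List[str]] = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": [],
}


def distance(adj: Dict[str, List[str]], u: str, v: str) -> int:
    """Shortest path length between u and v, or K = |V| if disconnected."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == v:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return len(adj)  # K: no path exists


d_ac = distance(ADJ, "a", "c")  # two edges apart
d_ad = distance(ADJ, "a", "d")  # disconnected, so K = 4
```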
14. +
Local Measures (1/2)
n Local measures used in the experiments are:
n In-degree centrality
n Normalized number of edges terminating in a vertex
n Betweenness centrality
n The normalized fraction of shortest paths between node pairs
that pass through a vertex
n Key Player Problem (KPP)
n The normalized sum of the inverse of the distances between
the vertex and the remaining nodes of the graph
KPP(v) = \frac{\sum_{u \in V,\, u \neq v} \frac{1}{d(u,v)}}{|V| - 1}
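As a rough illustration of two of the local measures, the sketch below computes degree centrality and KPP on a small star graph. The graph `STAR`, the function names, and the normalization of degree by |V| − 1 are assumptions made for this example; disconnected nodes would receive distance K = |V|, as in the definition of d.

```python
from collections import deque
from typing import Dict, List

# Illustrative undirected star graph: hub "h" linked to three leaves.
STAR: Dict[str, List[str]] = {
    "h": ["a", "b", "c"],
    "a": ["h"],
    "b": ["h"],
    "c": ["h"],
}


def shortest_distances(adj: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """BFS distances from source; unreachable nodes keep the value K = |V|."""
    dist = {v: len(adj) for v in adj}
    dist[source] = 0
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree_centrality(adj: Dict[str, List[str]], v: str) -> float:
    """Number of edges at v, normalized by the maximum possible |V| - 1."""
    return len(adj[v]) / (len(adj) - 1)


def kpp(adj: Dict[str, List[str]], v: str) -> float:
    """Sum of inverse distances from v to all other nodes, over |V| - 1."""
    dist = shortest_distances(adj, v)
    return sum(1.0 / dist[u] for u in adj if u != v) / (len(adj) - 1)


kpp_hub = kpp(STAR, "h")   # every other node is at distance 1
kpp_leaf = kpp(STAR, "a")  # one node at distance 1, two at distance 2
```

The hub scores higher than any leaf under both measures, which matches the intuition that the best-connected sense should be ranked first.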
15. +
Local Measures (2/2)
n The researchers also used the following local measures:
n HITS and PageRank
n Link analysis algorithms that are normally used to rank web
pages, but can be applied to any graph, since the web
is itself a graph of linked pages.
n Maximum Flow
n Maximum s-t flow: in a graph with unit edge capacities, the number of
edge-independent paths between a source vertex s and a sink
vertex t.
n The flow towards a vertex v is measured as the
sum of the maximum flows having v as the sink and each of the other
vertices of the graph as the source.
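To make the PageRank idea concrete, here is a minimal power-iteration sketch on an undirected graph, where each edge is treated as a link in both directions. The damping factor 0.85, the fixed iteration count, and the example graph are assumptions for illustration only; the sketch also assumes no isolated nodes, since those would lose rank mass in this simple form.

```python
from typing import Dict, List

# Illustrative undirected star graph: hub "h" linked to three leaves.
STAR: Dict[str, List[str]] = {
    "h": ["a", "b", "c"],
    "a": ["h"],
    "b": ["h"],
    "c": ["h"],
}


def pagerank(
    adj: Dict[str, List[str]],
    damping: float = 0.85,   # assumed, commonly used value
    iterations: int = 100,   # assumed fixed number of update rounds
) -> Dict[str, float]:
    """Random-surfer PageRank by repeated score propagation."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new_rank = {}
        for v in adj:
            # Each neighbor u shares its rank equally among its own neighbors.
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


ranks = pagerank(STAR)
best = max(ranks, key=ranks.get)  # node with the highest PageRank score
```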
16. +
Global Measures (1/2)
n They characterize the overall graph structure, thus they are
not particularly helpful in selecting a unique sense for
ambiguous words
n Navigli and Lapata used these 3 well-known Global
Measures in their experiments:
n Compactness
n High value → vertices are connected with small distances, the
graph is compact
n Low value → vertices are disconnected or connected with large
distances.
17. +
Global Measures (2/2)
n Graph Entropy
n Low value = few vertices are important
n High value = vertices are almost equally important
n Edge Density
n Is computed as the ratio between the number of edges in a graph
and the number of edges of a complete graph with the same
number of nodes.
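Two of the global measures can be sketched as follows. Edge density follows the ratio given above. For graph entropy, the sketch assumes that each vertex gets probability deg(v) / (2|E|) and that the entropy is normalized by log |V|; both choices, like the example graphs and function names, are assumptions of this illustration.

```python
import math
from typing import Dict, List

# Illustrative undirected graphs on four nodes.
STAR: Dict[str, List[str]] = {
    "h": ["a", "b", "c"], "a": ["h"], "b": ["h"], "c": ["h"],
}
CYCLE: Dict[str, List[str]] = {
    "a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"],
}


def edge_count(adj: Dict[str, List[str]]) -> int:
    """Each undirected edge appears twice in the adjacency lists."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


def edge_density(adj: Dict[str, List[str]]) -> float:
    """Edges in the graph over edges in a complete graph on the same nodes."""
    n = len(adj)
    return edge_count(adj) / (n * (n - 1) / 2)


def graph_entropy(adj: Dict[str, List[str]]) -> float:
    """Entropy of the (assumed) degree-based vertex distribution, in [0, 1]."""
    total = 2 * edge_count(adj)
    if total == 0 or len(adj) < 2:
        return 0.0
    entropy = 0.0
    for neighbors in adj.values():
        p = len(neighbors) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy / math.log2(len(adj))


star_density = edge_density(STAR)    # 3 edges out of 6 possible
star_entropy = graph_entropy(STAR)   # the hub dominates, so entropy is lower
cycle_entropy = graph_entropy(CYCLE) # all vertices equally important
```

The cycle reaches the maximum normalized entropy because all its vertices are equally important, while the star scores lower because the hub concentrates the connectivity.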
18. +
Experiments
n The experiments by Navigli and Lapata used a
sentence-by-sentence disambiguation approach
to evaluate the measures described above.
n They built a graph for each sentence, ranked the nodes using
the measures, and selected the most appropriate meanings.
n They tested their algorithm using two different sense
inventories:
n WordNet 2.0
n An extended version of WordNet (EnWordNet) created by Navigli, which adds
semantic edges (~60,000) extracted from collocation resources
(e.g. Oxford Collocations) that describe
restrictions on how words can be used together:
n e.g. "strong tea" is acceptable, "powerful tea" is not
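The final step of the sentence-by-sentence procedure (rank the nodes, then pick one sense per word) can be sketched as below. The sense labels and scores are invented for illustration; in practice the scores would come from one of the connectivity measures computed on the sentence graph.

```python
from typing import Dict, List

# Hypothetical candidate senses for each word of a sentence.
SENSES_BY_WORD: Dict[str, List[str]] = {
    "drink": ["drink#v1", "drink#v2"],
    "milk": ["milk#n1", "milk#n2"],
}

# Hypothetical connectivity scores for the sense nodes (e.g. KPP values).
SCORES: Dict[str, float] = {
    "drink#v1": 0.6,
    "drink#v2": 0.1,
    "milk#n1": 0.5,
    "milk#n2": 0.2,
}


def disambiguate(
    senses_by_word: Dict[str, List[str]],
    scores: Dict[str, float],
) -> Dict[str, str]:
    """For each word, select the candidate sense with the highest score."""
    return {
        word: max(senses, key=lambda s: scores.get(s, 0.0))
        for word, senses in senses_by_word.items()
    }


chosen = disambiguate(SENSES_BY_WORD, SCORES)
```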
19. +
Experiments
n Two standard data sets
n SemCor Corpus
n subset of the Brown Corpus
n 200,000 words manually tagged with WordNet senses
n Senseval-3 English all-words
n subset of the Penn Treebank Corpus
n 2,081 words manually tagged with WordNet senses
n All the connectivity measures were tested on SemCor.
n The best-performing measure on SemCor was also tested on
Senseval-3.
n The graph-based algorithm developed
by the researchers was compared with a naïve baseline that randomly selects
a sense for each word
20. +
The tests’ results (1/4)
n The tests were made using words with more than one
WordNet sense (polysemous).
n They used a chi-square test, a common test of statistical significance.
LEGEND:
Prec: Precision, a measure of exactness
Rec: Recall, a measure of completeness
F1: harmonic mean of Precision and Recall
F1 = \frac{2 \cdot PREC \cdot REC}{PREC + REC}
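A small worked example of the F1 formula. The counts are invented, and the sketch assumes the usual WSD evaluation convention that precision is computed over the words the system attempted and recall over all words to be disambiguated.

```python
def precision(correct: int, attempted: int) -> float:
    """Fraction of attempted words that were assigned the right sense."""
    return correct / attempted if attempted else 0.0


def recall(correct: int, total: int) -> float:
    """Fraction of all target words that were assigned the right sense."""
    return correct / total if total else 0.0


def f1_score(prec: float, rec: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0


# Invented counts: 100 target words, 60 attempted, 30 correct.
prec = precision(30, 60)   # 0.5
rec = recall(30, 100)      # 0.3
f1 = f1_score(prec, rec)   # 2 * 0.5 * 0.3 / 0.8 = 0.375
```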
21. +
The tests’ results (2/4)
n PageRank performs better than HITS, possibly because of its random
surfer model, as the researchers suggest.
n The best performing local measure is KPP, with F1=31.8%
using WordNet and F1=40.5% using EnWordNet.
n The best performing global measure is Graph Entropy, with
F1=29.4% (WordNet) and F1=30.5% (EnWordNet)
n EnWordNet performs better than
WordNet:
n A denser
lexicon with a larger number of
semantic relations improves
the measures.
22. +
The tests’ results (3/4)
n Since KPP was the best performing measure on SemCor, the
researchers also tested it
on Senseval-3, using the enriched version of WordNet.
n They compared it with the best unsupervised
system at the time, which is based on domain-driven disambiguation.
23. +
The tests’ results (4/4)
n IRST-DDD compares the domain of the context surrounding
the target word with the domains of its senses, using a
version of WordNet augmented with domain
labels (e.g. economy, geography).
n KPP is comparable to IRST-DDD for nouns and adjectives, but
worse for verbs.
n This can be explained by a lack of relations involving
verbs in the enriched WordNet used for the tests.
24. +
Summary
n Navigli and Lapata presented a study of graph connectivity
measures for unsupervised WSD.
n A large number of local and global measures were
evaluated.
n Local measures perform better than global ones.
n KPP is better than the other connectivity measures at identifying
which node in the graph is maximally connected to the
others (consistent with results in social network analysis).
n As WordNet is further enriched with relations, PageRank and
InDegree become comparable to KPP in terms of performance.