+ about the paper:
Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
Roberto Navigli, Dipartimento di Informatica, Sapienza Università di Roma
Mirella Lapata, School of Informatics, University of Edinburgh
Giovanni Murru
Seminars in Computational learning methods for Natural Language Processing
Prof. Roberto Basili
+
Abstract
• Development of graph-based unsupervised algorithms for Word Sense Disambiguation
• Discussion of a variety of measures that analyze the connectivity of graph structures
• Evaluation of these approaches on standard data sets
+
Word Sense Disambiguation
• Word Sense Disambiguation (WSD) is an open research topic in Natural Language Processing
• Its goal is to identify which sense of a word is intended in a given context, such as a sentence.
• The sense of the word is selected from a set of predefined possibilities
  • Sense Inventory (Dictionary, Thesaurus)
  • Knowledge intensive methods, Supervised Learning
+
The essentiality of WSD
• Word Sense Disambiguation is essential for many applications:
  • Machine Translation (e.g. complex translations between natural languages, achieved with corpus techniques)
  • Information Retrieval (used on the Internet)
  • Question Answering
  • Knowledge Acquisition
  • Summarization
+
Huge Data Sets
• One of the problems of Word Sense Disambiguation (WSD) is the need to deal with huge data sets, in particular with the supervised approach.
• Supervised Disambiguation is based on a labeled training set, whereas Unsupervised Disambiguation uses unlabeled corpora.
• Corpora are large, structured sets of text.
• The supervised approach outperforms the unsupervised one, but requires large amounts of training data.
+
Limitations of Supervised
• Supervised Disambiguation can obtain reliable results only for words whose senses have been labeled.
• These sense-tagged corpora are usually created by hand, which is very expensive and labor-intensive.
• Suitable data is scarce for many languages and text genres.
• POSSIBLE SOLUTION? Unsupervised Disambiguation
+
Graph vs Similarity (1/2)
• Unsupervised methods can generally be divided into 2 categories:
  1. Graph Based
  2. Similarity Based
• No need to label senses → well suited to large-scale sense disambiguation
• Similarity Based algorithms assign a sense to an ambiguous word by comparing each of its senses with those of the words in the surrounding context.
• The sense with the highest similarity is assumed to be the right one.
+
Graph vs Similarity (2/2)
• The work by Navigli and Lapata adopts the Graph-Based approach.
• Graph-Based steps:
  • Build a graph representing all possible interpretations of the word sequence to be disambiguated.
    • Graph nodes → Word meanings
    • Graph edges → Semantic relations between these senses
  • Estimate the value of each node in order to determine its importance.
• Sense Disambiguation is about finding the most important node for each word.
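The last step can be sketched in a few lines of Python. This is only an illustration: it assumes every sense node already has an importance score from some connectivity measure, and the words, sense labels and scores are hypothetical toy data, not values from the paper.

```python
def disambiguate(sense_sets, scores):
    """Pick, for each word, the candidate sense whose node has the highest score.

    sense_sets: dict mapping a word to its candidate sense nodes.
    scores: dict mapping a sense node to its importance in the graph.
    """
    return {
        word: max(senses, key=lambda s: scores.get(s, 0.0))
        for word, senses in sense_sets.items()
    }


# Hypothetical toy input: two ambiguous words with two senses each.
sense_sets = {"drink": ["drink#1", "drink#2"], "milk": ["milk#1", "milk#2"]}
scores = {"drink#1": 0.50, "drink#2": 0.10, "milk#1": 0.40, "milk#2": 0.05}

chosen = disambiguate(sense_sets, scores)
print(chosen)
```

Any node-scoring function can be plugged in through `scores`; the selection step itself does not depend on which measure produced them.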
+
Building the Graph (1/2)
• In the experiments, Navigli and Lapata used the WordNet sense inventory.
• For each sentence σ they build a graph G
  • σ = {w1, w2, … , wn} is a set of words
• The graph G has a set of vertices Vσ = {v1, v2, … , vn}
• Vσ initially contains, for each word wi in σ, the set of senses associated with that word in the WordNet sense inventory.
• The set of edges E of the graph G is initially empty
+
Building the Graph (2/2)
• Let V = Vσ
• For each word sense vi in Vσ, a depth-first search of the WordNet graph is performed starting from vi, and
  • every time a different sense vj that is also contained in Vσ is found
    • the semantic relations encountered along the path between vi and vj are added to the set of edges E
    • and the nodes on this path (between vi and vj) are added to the set V of vertices of the graph G.
• G is hence a representation of the semantic relations between the senses of the words in the sentence it represents.
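The construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `lexicon` dictionary is a hypothetical toy stand-in for the WordNet graph, the `max_depth` cap on path length is an assumption added to keep the search bounded, and the search stops expanding a path as soon as it reaches another sense of the sentence.

```python
def build_sentence_graph(lexicon, sense_sets, max_depth=3):
    """Collect the vertices and edges that connect the senses of a sentence.

    lexicon: dict mapping a node to its neighbours (toy stand-in for WordNet).
    sense_sets: dict mapping each word of the sentence to its sense nodes.
    max_depth: assumed cap on the number of edges in an explored path.
    """
    seeds = {s for senses in sense_sets.values() for s in senses}  # V_sigma
    vertices = set(seeds)  # V starts as V_sigma
    edges = set()          # E starts empty

    def dfs(node, path):
        if len(path) - 1 >= max_depth:  # path already uses max_depth edges
            return
        for nxt in lexicon.get(node, ()):
            if nxt in path:             # do not revisit nodes on this path
                continue
            new_path = path + [nxt]
            if nxt in seeds:
                # Reached another sense of the sentence: keep the whole path.
                vertices.update(new_path)
                edges.update(frozenset(p) for p in zip(new_path, new_path[1:]))
            else:
                dfs(nxt, new_path)

    for seed in seeds:
        dfs(seed, [seed])
    return vertices, edges


# Hypothetical toy lexicon (undirected, so each link is listed both ways).
lexicon = {
    "drink#1": ["beverage"],
    "beverage": ["drink#1", "milk#1"],
    "milk#1": ["beverage"],
    "drink#2": ["alcohol"],
    "alcohol": ["drink#2"],
    "milk#2": [],
}
sense_sets = {"drink": ["drink#1", "drink#2"], "milk": ["milk#1", "milk#2"]}

vertices, edges = build_sentence_graph(lexicon, sense_sets)
print(sorted(vertices))
print(sorted(sorted(e) for e in edges))
```

In this toy run the intermediate node `beverage` is added because it lies on a path between two senses, while `alcohol` is not, since the path through it never reaches another sense of the sentence.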
+
Why is the graph built?
• G is a subgraph of WordNet whose vertices and relations are likely to be useful for the WSD problem
• Remember:
  • The aim of WSD is to find the most appropriate sense for each word in the sentence σ.
• This is determined by ranking each vertex in the graph G according to its importance.
• How can we achieve this ranking? How can we measure the relevance of a word sense?
• CONNECTIVITY MEASURES
+
Connectivity Measures (1/2)
• They are used to rank the nodes in order to select the most plausible meaning.
• Connectivity measures can be of two types
  • LOCAL
  • GLOBAL
• Global measures estimate the connectivity of the entire graph structure, whereas local measures capture the degree of connectivity of a single vertex in the graph.
+
Connectivity Measures (2/2)
• The graphs are assumed to be undirected
  • The researchers motivated this choice by noting that semantic relations often have an inverse counterpart, as in the case of hypernymy and hyponymy (IS-A)
  • e.g. RED
    • Hypernymy: something that red is a kind of (e.g. chromatic color)
    • Hyponymy: something that is a kind of red (e.g. scarlet)
• They define a distance function d as the length of the shortest path between two nodes
  • If the two nodes are disconnected, d = K, where K is the number of nodes in the graph.
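The distance function can be sketched with a breadth-first search over an undirected, unweighted graph. The adjacency dictionary `adj` and its node names are hypothetical toy data. Returning 0 for a node's distance to itself is an assumption, since the slide only defines d for pairs of nodes.

```python
from collections import deque


def distance(adj, u, v):
    """Length of the shortest path between u and v, or K = |V| if disconnected."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj[node]:
            if nxt == v:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return len(adj)  # no path found: fall back to K, the number of nodes


# Toy undirected graph: a path a - b - c plus an isolated node d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}

print(distance(adj, "a", "c"))  # path a - b - c has two edges
print(distance(adj, "a", "d"))  # disconnected, so K = 4
```

Using K for disconnected pairs keeps every distance finite, so sums and inverses of distances stay well defined.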
+
Local Measures (1/2)
• Local measures used in the experiments are:
  • In-degree centrality
    • Normalized number of edges terminating in a vertex
  • Betweenness centrality
    • The normalized fraction of shortest paths between node pairs that pass through a vertex
  • Key Player Problem (KPP)
    • The normalized sum of the inverse of the distances between the vertex and the remaining nodes of the graph
$$\mathrm{KPP}(v) = \frac{\sum_{u \in V,\, u \neq v} \frac{1}{d(u,v)}}{|V| - 1}$$
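Two of these measures can be sketched in Python as follows. The sketch assumes an undirected graph stored as a hypothetical adjacency dictionary `adj`, normalizes the degree by |V| − 1, and sets the distance between disconnected nodes to K = |V|. The helper names are illustrative only.

```python
from collections import deque


def distances_from(adj, source):
    """BFS distances from source; unreachable nodes get K = |V|."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    k = len(adj)
    return {v: dist.get(v, k) for v in adj}


def degree_centrality(adj, v):
    """Number of edges at v, normalized by the maximum possible, |V| - 1."""
    return len(adj[v]) / (len(adj) - 1)


def kpp(adj, v):
    """Sum of inverse distances from v to every other node, over |V| - 1."""
    dist = distances_from(adj, v)
    return sum(1.0 / dist[u] for u in adj if u != v) / (len(adj) - 1)


# Toy undirected graph: a path a - b - c plus an isolated node d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}

for node in adj:
    print(node, round(degree_centrality(adj, node), 3), round(kpp(adj, node), 3))
```

In the toy graph the middle node `b` gets the highest KPP score, because it is one step away from both `a` and `c`, while the isolated node `d` gets the lowest.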
+
Local Measures (2/2)
• The researchers also used the local measures:
  • HITS and PageRank
    • Link analysis algorithms normally used to rank web pages; because the web is itself a graph of linked pages, they can be applied to other graphs as well.
  • Maximum Flow
    • Maximum s-t flow: the number of independent paths between a pair of vertices, one in the partition containing s and the other in the partition containing t.
    • The flow towards a vertex v is measured as the sum of the maximum flows that have v as the sink and each of the other vertices of the graph as the source.
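As an illustration of how a link analysis score can rank sense nodes, here is a minimal power-iteration sketch of PageRank on an undirected graph. The damping factor 0.85, the fixed number of iterations, and the uniform redistribution of rank from nodes without neighbours are common conventions assumed here, not details given in the slides. The star-shaped toy graph is hypothetical.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected adjacency dictionary."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Rank held by nodes without neighbours is spread over all nodes.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new_rank = {}
        for v in adj:
            # Each neighbour u passes on an equal share of its own rank.
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Toy star graph: "hub" is linked to three leaves.
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}

rank = pagerank(adj)
best = max(rank, key=rank.get)
print(best)
```

The scores always sum to 1, so they can be read as the share of a random walker's time spent on each node. The best-connected node receives the largest share.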
+
Global Measures (1/2)
• They characterize the overall graph structure, so they are not particularly helpful in selecting a unique sense for ambiguous words
• Navigli and Lapata used these 3 well-known Global Measures in their experiments:
  • Compactness
    • High value → vertices are connected by short paths, the graph is compact
    • Low value → vertices are disconnected or connected only by long paths.
+
Global Measures (2/2)
• Graph Entropy
  • Low value = few vertices are important
  • High value = vertices are almost equally important
• Edge Density
  • Computed as the ratio between the number of edges in the graph and the number of edges of a complete graph with the same number of nodes.
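The three global measures can be sketched as follows. Edge density follows the definition given on the slide. The formulas for compactness (a normalized sum of pairwise distances, with K = |V| for disconnected pairs) and graph entropy (entropy of the degree distribution, divided by log |V|) are assumptions chosen to match the qualitative behaviour described in the slides. The toy graphs are hypothetical and the functions expect at least two nodes.

```python
import math
from collections import deque


def distances_from(adj, source):
    """BFS distances from source; unreachable nodes get K = |V|."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {v: dist.get(v, len(adj)) for v in adj}


def edge_density(adj):
    """Edges present divided by the edges of a complete graph on |V| nodes."""
    n = len(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return num_edges / (n * (n - 1) / 2)


def graph_entropy(adj):
    """Entropy of p(v) = deg(v) / 2|E|, scaled to [0, 1] by log |V| (assumed)."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    if total_degree == 0:
        return 0.0
    probs = [len(nbrs) / total_degree for nbrs in adj.values()]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(adj))


def compactness(adj):
    """1.0 when all pairs are adjacent, 0.0 when all pairs are disconnected."""
    n = len(adj)
    total = sum(
        d for v in adj for u, d in distances_from(adj, v).items() if u != v
    )
    max_sum = n * n * (n - 1)  # every ordered pair at distance K = n
    min_sum = n * (n - 1)      # every ordered pair at distance 1
    return (max_sum - total) / (max_sum - min_sum)


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

for name, graph in (("triangle", triangle), ("path", path)):
    print(name, edge_density(graph), graph_entropy(graph), compactness(graph))
```

The complete triangle reaches the maximum on all three measures, while the path scores lower on each.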
+
Experiments
• Navigli and Lapata's experiments used a sentence-by-sentence disambiguation approach to evaluate the connectivity measures.
• They built a graph for each sentence, ranked the nodes using the measures, and selected the most appropriate meanings.
• They tested their algorithm using two different sense inventories:
  • WordNet 2.0
  • An extended version of WordNet created by Navigli, adding about 60,000 semantic edges extracted from collocation resources (e.g. Oxford Collocations, etc.), which define restrictions on how words can be used together:
    • e.g. "strong tea" is fine, "powerful tea" is not
+
Experiments
• Two standard data sets
  • SemCor Corpus
    • subset of the Brown Corpus
    • 200,000 words manually tagged with WordNet senses
  • Senseval-3 English all-words
    • subset of the Penn Treebank Corpus
    • 2,081 words manually tagged with WordNet senses
• All the connectivity measures were tested on SemCor.
• The best-performing measure on SemCor was also tested on Senseval-3.
• The graph-based algorithm developed by the researchers was compared with a naïve baseline that randomly selects a sense for each word
+
The tests’ results (1/4)
• The tests were run on words with more than one WordNet sense (polysemous words).
• They used a chi-square test, a common statistical test.
LEGEND:
Prec: Precision, a measure of exactness
Rec: Recall, a measure of completeness
F1: harmonic mean of Precision and Recall
$$F_1 = \frac{2 \cdot \mathrm{PREC} \cdot \mathrm{REC}}{\mathrm{PREC} + \mathrm{REC}}$$
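The F1 formula can be checked with a short Python sketch. The precision and recall values below are hypothetical and chosen only to make the arithmetic easy to follow. Returning 0 when both inputs are 0 is an assumed convention to avoid a division by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: 2 * 0.5 * 0.25 / (0.5 + 0.25) = 0.25 / 0.75
example = f1_score(0.5, 0.25)
print(round(example, 4))
```

Because it is a harmonic mean, F1 stays closer to the lower of the two values than an arithmetic mean would, so a system cannot score well by maximizing only precision or only recall.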
+
The tests’ results (2/4)
• PageRank performs better than HITS, possibly because of its random surfer model, as the researchers suggested.
• The best-performing local measure is KPP, with F1 = 31.8% using WordNet and F1 = 40.5% using EnWordNet (the enriched WordNet).
• The best-performing global measure is Graph Entropy, with F1 = 29.4% (WordNet) and F1 = 30.5% (EnWordNet)
• EnWordNet performs better than WordNet:
  • A denser lexicon with a large number of semantic relations enhances the measures.
+
The tests’ results (3/4)
• Since KPP was the best-performing measure on SemCor, the researchers also tested it on Senseval-3, using the enriched version of WordNet.
• They compared it with the best unsupervised system at the time, which is based on domain-driven disambiguation.
+
The tests’ results (4/4)
• IRST-DDD compares the domain of the context surrounding the target word with the domains of its senses, and uses a version of WordNet augmented with domain labels (e.g. economy, geography).
• KPP is comparable to IRST-DDD for nouns and adjectives, but worse for verbs.
  • This can be explained by a lack of sentence relations (related to verbs) in the enriched WordNet used for the tests.
+
Summary
• Navigli and Lapata presented a study of graph connectivity measures for unsupervised WSD.
• A large number of local and global measures were evaluated.
• Local measures perform better than global ones.
• KPP is better than the other connectivity measures at identifying which node in the graph is maximally connected to the others (a result also found in social network analysis).
• If WordNet is enriched further, PageRank and InDegree are comparable to KPP in terms of performance.

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 

Graph Connectivity Measures for Unsupervised Word Sense Disambiguation

  • 1. About the paper: Graph Connectivity Measures for Unsupervised Word Sense Disambiguation, by Roberto Navigli (Dipartimento di Informatica, Sapienza Università di Roma) and Mirella Lapata (School of Informatics, University of Edinburgh). Giovanni Murru. Seminars in Computational Learning Methods for Natural Language Processing, Prof. Roberto Basili.
  • 2. Abstract: Development of graph-based unsupervised algorithms for Word Sense Disambiguation. Discussion of a variety of measures that analyze the connectivity of graph structures. Tests of the performance of these approaches on standard data sets.
  • 3. Word Sense Disambiguation: Word Sense Disambiguation (WSD) is an open research topic in Natural Language Processing. Its goal is to identify which sense of a word is intended in a given context, such as a sentence. The sense of the word is selected from a set of predefined possibilities: a sense inventory (dictionary, thesaurus). Knowledge-intensive methods, supervised learning.
  • 4. The essentiality of WSD: Word Sense Disambiguation is essential for many applications: Machine Translation (e.g. complex translations between natural languages, achieved with corpus techniques); Information Retrieval (used on the Internet); Question Answering; Knowledge Acquisition; Summarization.
  • 5. Huge Data Sets: One of the problems of Word Sense Disambiguation (WSD) is the need to deal with huge data sets, in particular with the supervised approach. While supervised disambiguation is based on a labeled training set, unsupervised disambiguation uses unlabeled corpora. Corpora are large, structured sets of text. The supervised approach outperforms the unsupervised one, but requires large amounts of training data.
  • 6. Limitations of Supervised: Supervised disambiguation can obtain reliable results only for words whose senses have been labeled. These sense-tagged corpora are usually created by hand, which is very expensive and requires a lot of work. Suitable data are scarce for many languages and text genres. POSSIBLE SOLUTION? Unsupervised disambiguation.
  • 7. Graph vs Similarity (1/2): Unsupervised methods can generally be divided into two categories: (1) graph-based and (2) similarity-based. No need to label senses → optimal for large-scale sense disambiguation. Similarity-based algorithms assign a sense to an ambiguous word by comparing each of its senses with those of the words in the surrounding context. The sense with the highest similarity is assumed to be the right one.
  • 8. Graph vs Similarity (2/2): The work developed by Navigli and Lapata takes the graph-based approach. Graph-based steps: (1) Build a graph representing all possible interpretations of the word sequence to be disambiguated. Graph nodes → word meanings. Graph edges → semantic relations between these senses. (2) Estimate the value of each node in order to determine its importance. Sense disambiguation is then about finding the most important node for each word.
  • 9. Building the Graph (1/2): In the experiments, Navigli and Lapata used the WordNet sense inventory. For each sentence σ they build a graph G. σ = {w1, w2, …, wn} is a set of words. The graph G has a set of vertices Vσ = {v1, v2, …, vn}. Vσ initially contains, for each word wi in σ, the set of senses associated with that word in the WordNet sense inventory. The set of edges E of the graph G is initially empty.
  • 10. Building the Graph (2/2): Let V = Vσ. For each word sense vi in Vσ, a depth-first search of the WordNet graph is performed starting from vi. Every time a different node vj that is also contained in Vσ is found, the semantic relations encountered along the path between vi and vj are added to the set of edges E, and the nodes on this path are added to the vertex set V of the graph G. G is hence a representation of the semantic relations between the words of the particular sentence that G represents.
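The construction on slides 9 and 10 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the `lexicon` dictionary is a made-up stand-in for WordNet, the sense labels such as `drink#v1` are hypothetical, and the `max_depth` limit on the depth-first search is an assumption that the slides do not state.

```python
def build_sentence_graph(lexicon, sense_sets, max_depth=3):
    """Build the sentence graph G = (V, E).

    lexicon:    dict mapping a node to its neighbours (toy stand-in for WordNet)
    sense_sets: dict mapping each word of the sentence to its set of senses
    max_depth:  maximum path length in edges (an assumption of this sketch)
    """
    sense_nodes = {s for senses in sense_sets.values() for s in senses}
    vertices = set(sense_nodes)   # V starts as V_sigma
    edges = set()                 # E starts empty

    def dfs(node, path):
        for nxt in lexicon.get(node, ()):
            if nxt in path:       # do not revisit nodes on the current path
                continue
            new_path = path + [nxt]
            if nxt in sense_nodes:
                # Reached another node of V_sigma: keep the whole path.
                vertices.update(new_path)
                edges.update(frozenset(p) for p in zip(new_path, new_path[1:]))
            if len(new_path) - 1 < max_depth:
                dfs(nxt, new_path)

    for sense in sorted(sense_nodes):
        dfs(sense, [sense])
    return vertices, edges


# Hypothetical toy lexicon and sentence ("drink milk").
lexicon = {
    "drink#v1": ["beverage#n1"],
    "beverage#n1": ["drink#v1", "milk#n1"],
    "milk#n1": ["beverage#n1"],
    "milk#n2": ["river#n1"],
    "river#n1": ["milk#n2"],
}
sense_sets = {
    "drink": {"drink#v1", "drink#v2"},
    "milk": {"milk#n1", "milk#n2"},
}
vertices, edges = build_sentence_graph(lexicon, sense_sets)
```

In this toy run the intermediate node `beverage#n1` is added to V because it lies on a path between two senses, while `river#n1` is not, since no path through it reaches another sense.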
  • 11. Why is the graph built? G is a subgraph of WordNet whose vertices and relations are reasonably useful for the WSD problem. Remember: the aim of WSD is to find the most appropriate sense for each word in the sentence σ. This is determined by ranking each vertex in the graph G according to its importance. How can we achieve this ranking? How can we measure the relevance of a word sense? CONNECTIVITY MEASURES.
  • 12. Connectivity Measures (1/2): They are used to rank the nodes in order to select the most plausible meaning. Connectivity measures can be of two types: LOCAL and GLOBAL. Global measures estimate the connectivity of the entire graph structure, whereas local measures capture the degree of connectivity of a single vertex in the graph.
  • 13. Connectivity Measures (2/2): The graphs are assumed to be undirected. The researchers motivated this choice by noting that semantic relations often have a counterpart, as in the case of hypernymy and hyponymy (IS-A). E.g. RED: hypernymy is something that red is a kind of (e.g. chromatic color); hyponymy is something that is a kind of red (e.g. scarlet). They define a distance function d as the length of the shortest path between two nodes. If the two nodes are disconnected, d = K, where K is the number of nodes in the graph.
  • 14. Local Measures (1/2): The local measures used in the experiments are: In-degree centrality, the normalized number of edges terminating in a vertex. Betweenness centrality, the normalized fraction of shortest paths between node pairs that pass through a vertex. Key Player Problem (KPP), the normalized sum of the inverse of the distances between the vertex and the remaining nodes of the graph: $\mathrm{KPP}(v) = \frac{\sum_{u \in V,\, u \neq v} \frac{1}{d(u,v)}}{|V| - 1}$
  • 15. Local Measures (2/2): The researchers also used the following local measures: HITS and PageRank, link analysis algorithms that are normally used to rate web pages but, because the web itself has a graph structure, can also be applied to other graphs. Maximum Flow: the maximum s-t flow is the number of independent paths between a pair of vertices contained in the partitions of s and t respectively. The measure evaluates the flow towards a vertex v as the sum of the maximum flows having v as the sink and the other vertices of the graph as sources.
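Two of the local measures above, degree centrality and KPP, are simple enough to sketch directly from their definitions. The code below is an illustration under stated assumptions: the graph is undirected and stored as an adjacency dictionary, the degree is normalized by |V| − 1 (the slides only say "normalized"), and disconnected pairs get distance K = |V| as defined on slide 13. The toy graph is made up.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree_centrality(adj, v):
    """Degree of v divided by |V| - 1 (normalization is an assumption here)."""
    return len(adj[v]) / (len(adj) - 1)


def kpp(adj, v):
    """Normalized sum of inverse distances from v to all other nodes.

    Unreachable nodes are assigned distance K = |V|, as on slide 13.
    """
    k = len(adj)
    dist = bfs_distances(adj, v)
    total = sum(1 / dist.get(u, k) for u in adj if u != v)
    return total / (k - 1)


# Hypothetical undirected graph: a path a - b - c plus an isolated node d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
scores = {v: kpp(adj, v) for v in adj}
```

Here the middle node `b` gets the highest KPP score and the isolated node `d` the lowest, which matches the intuition that well-connected senses should be ranked first.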
  • 16. Global Measures (1/2): They characterize the overall graph structure, thus they are not particularly helpful in selecting a unique sense for ambiguous words. Navigli and Lapata used three well-known global measures in their experiments. Compactness: a high value → vertices are connected by short distances and the graph is compact; a low value → vertices are disconnected or connected by long distances.
  • 17. Global Measures (2/2): Graph Entropy: a low value = few vertices are important; a high value = vertices are almost equally important. Edge Density: computed as the ratio between the number of edges in a graph and the number of edges of a complete graph with the same number of nodes.
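Edge density follows directly from the definition above, and graph entropy can be illustrated with one common formulation. The sketch below rests on assumptions: the graph is an undirected adjacency dictionary, and the entropy uses the vertex probability p(v) = deg(v) / 2|E|, normalized by log2 |V|, which the slides do not spell out. Both toy graphs are made up.

```python
import math


def edge_density(adj):
    """Edges in the graph divided by edges in a complete graph on |V| nodes."""
    n = len(adj)
    if n < 2:
        return 0.0
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return num_edges / (n * (n - 1) / 2)


def graph_entropy(adj):
    """Entropy of the degree distribution, scaled to [0, 1].

    Assumes p(v) = deg(v) / (2|E|) and normalization by log2(|V|).
    """
    n = len(adj)
    degree_sum = sum(len(nbrs) for nbrs in adj.values())  # equals 2|E|
    if n < 2 or degree_sum == 0:
        return 0.0
    entropy = 0.0
    for nbrs in adj.values():
        p = len(nbrs) / degree_sum
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy / math.log2(n)


# Hypothetical graphs: a 4-cycle (all vertices alike) and a star (one hub).
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
```

The cycle, where every vertex is equally important, reaches the maximum entropy, while the star, dominated by its hub, scores lower, in line with the high/low interpretation given on the slide.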
  • 18. Experiments: The experiments by Navigli and Lapata used a sentence-by-sentence disambiguation approach to evaluate the measures described above. They built a graph for each sentence, ranked the nodes using the measures, and selected the most appropriate meanings. They tested their algorithm using two different sense inventories: WordNet 2.0, and an extended version of WordNet created by Navigli by adding semantic edges (about 60,000) extracted from collocation resources (e.g. Oxford Collocation, etc.), which define restrictions on how words can be used together: e.g. "strong tea" is acceptable, "powerful tea" is not.
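The last step of the procedure, selecting a sense once the nodes have been ranked by a local measure, might look like the sketch below. The sense labels and score values are invented for illustration, and breaking ties alphabetically is an arbitrary choice of this sketch, not something the slides specify.

```python
def select_senses(sense_sets, score):
    """Pick, for each word, the sense whose node has the highest score.

    sense_sets: dict mapping each word to its set of candidate senses
    score:      dict mapping a sense node to its connectivity score
    Ties are broken alphabetically (an arbitrary choice of this sketch).
    """
    chosen = {}
    for word, senses in sense_sets.items():
        chosen[word] = max(sorted(senses), key=lambda s: score.get(s, 0.0))
    return chosen


# Hypothetical senses and made-up scores from some local measure.
sense_sets = {
    "drink": {"drink#v1", "drink#v2"},
    "milk": {"milk#n1", "milk#n2"},
}
score = {"drink#v1": 0.58, "drink#v2": 0.25, "milk#n1": 0.58, "milk#n2": 0.25}
chosen = select_senses(sense_sets, score)
```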
  • 19. Experiments: Two standard data sets were used. SemCor corpus: a subset of the Brown Corpus, 200,000 words manually tagged with WordNet senses. Senseval-3 English all-words: a subset of the Penn Treebank corpus, 2,081 words manually tagged with WordNet senses. All the connectivity measures were tested on SemCor. The best-performing measure on SemCor was then also tested on Senseval-3. The graph-based algorithm developed by the researchers was compared with a naïve baseline that randomly selects a sense for each word.
  • 20. The tests' results (1/4): The tests were made using words with more than one WordNet sense (polysemous words). They used a chi-square test, a common statistical test. LEGEND: Prec: Precision, a measure of exactness. Rec: Recall, a measure of completeness. F1: the harmonic mean of Precision and Recall, $F1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}$
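The F1 formula in the legend can be checked with a small calculation. The sketch below assumes the usual WSD conventions, which the slides do not define explicitly: precision is the number of correct answers over the number of words the system attempted, and recall is the number of correct answers over all words to be disambiguated. The counts in the example are made up.

```python
def precision_recall_f1(correct, attempted, total):
    """Return (precision, recall, F1) from raw counts.

    correct:   words given the right sense
    attempted: words for which the system gave any answer
    total:     all words that should have been disambiguated
    """
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 30 correct out of 80 attempted, 100 words in total.
precision, recall, f1 = precision_recall_f1(correct=30, attempted=80, total=100)
```

With these counts precision is 0.375, recall is 0.3 and F1 is one third, which lies between the two values but closer to the smaller one, as expected of a harmonic mean.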
  • 21. The tests' results (2/4): PageRank performs better than HITS, possibly because of the random surfer model, as the researchers suggest. The best-performing local measure is KPP, with F1 = 31.8% using WordNet and F1 = 40.5% using EnWordNet. The best-performing global measure is Graph Entropy, with F1 = 29.4% (WordNet) and F1 = 30.5% (EnWordNet). EnWordNet performs better than WordNet: a denser lexicon with a large number of semantic relations enhances the measures.
  • 22. The tests' results (3/4): Since KPP was the best-performing algorithm on SemCor, the researchers also tested it on Senseval-3, using the enriched version of WordNet. They compared it with the best unsupervised system at the time, which is based on domain-driven disambiguation.
  • 23. The tests' results (4/4): IRST-DDD compares the domain of the context surrounding the target word with the domain of its senses, and uses a version of WordNet augmented with domain labels (e.g. economy, geography). KPP is comparable to IRST-DDD for nouns and adjectives, but worse for verbs. This can be explained by a lack of sentence relations (related to verbs) in the enriched WordNet used for the tests.
  • 24. Summary: Navigli and Lapata presented a study of graph connectivity measures for unsupervised WSD. A large number of local and global measures were evaluated. Local measures perform better than global ones. KPP is better than the other connectivity measures at identifying which node in the graph is maximally connected to the others (the same result holds in social network analysis). If the enrichment of WordNet is increased, PageRank and in-degree centrality become comparable to KPP in terms of performance.