Introduction           WSD                WSI           Evaluation and Issues   Wikipedia              Summary




                  Word Sense Disambiguation and Induction

                                                Leon Derczynski

                                                University of Sheffield


                                                27 January 2011




Leon Derczynski                                                                             University of Sheffield
Word Sense Disambiguation and Induction




Origin




                     Originally a course at ESSLLI 2010, Copenhagen
                         by Roberto Navigli and Simone Ponzetto








Outline


       1 Introduction

       2 WSD

       3 WSI

       4 Evaluation and Issues

       5 Wikipedia

       6 Summary







General Problem

               Being able to disambiguate words in context is a crucial
               problem
               Can potentially help improve many other NLP applications
               Polysemy is everywhere – our job is to model it
               Ambiguity is rampant:
               I saw a man who is 98 years old and can still walk and tell
               jokes.
               Senses per word: saw:26 man:11 years:4 old:8 can:5 still:4 walk:10 tell:8 jokes:3
               26 × 11 × 4 × 8 × 5 × 4 × 10 × 8 × 3 = 43,929,600 possible sense combinations for this simple sentence.
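The figure above is simply the product of the per-word sense counts, because each word's sense can be chosen independently of the others. A quick Python check of that arithmetic, using only the counts listed on the slide:

```python
from math import prod

# Candidate senses per content word, as listed on the slide.
senses_per_word = {
    "saw": 26, "man": 11, "years": 4, "old": 8, "can": 5,
    "still": 4, "walk": 10, "tell": 8, "jokes": 3,
}

# Independent choices multiply: one sense per word, in any combination.
combinations = prod(senses_per_word.values())
print(f"{combinations:,}")  # 43,929,600
```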





Word Senses




               Monosemous words – only one meaning; plant life, internet
               Polysemous words – more than one meaning; bar, bass
               A word sense is a commonly accepted meaning of a word.
               We are fond of fruit such as the kiwifruit and banana.








Enumerative Approach




               Fixed sense inventory enumerates the range of possible
               meanings of a word
               Context is used to select a particular sense
               chop vegetables with a knife, was stabbed with a knife
               However, we may want to add senses.








WSD Tasks



               Different representations of senses change the way we think
               about WSD
               Lexical sample – disambiguate a restricted set of words
               All words – disambiguate all content words
               Cross-lingual WSD – disambiguate a target word by labelling it
               with the appropriate translation in other languages, e.g.
               English coach → German Bus/Linienbus/Omnibus/Reisebus.








Representing the Context


               Text is unstructured, and needs to be made machine-readable.
               Flat representation (surface features) vs. Structured
               representation (graphs, trees)
               Local features: local context of a word usage, e.g. PoS tags
               and surrounding word forms
               Topical features: general topic of a sentence or discourse,
               represented as a bag of words
               Syntactic features: argument-head relations between the target
               and the rest of the sentence
               Semantic features: previously established word senses
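As an illustration, the local and topical features for one occurrence could be extracted as below. This is a minimal sketch: the window size, the feature-name format and whitespace tokenisation are assumptions, and PoS tags and syntactic and semantic features are left out because they need a tagger, a parser or a sense inventory.

```python
def extract_features(tokens, index, window=2):
    """Flat surface features for the target word at tokens[index]."""
    features = {}
    # Local features: word forms at fixed offsets around the target.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        position = index + offset
        if 0 <= position < len(tokens):
            word = tokens[position].lower()
        else:
            word = "<none>"  # padding beyond the sentence boundary
        features[f"w[{offset:+d}]={word}"] = 1
    # Topical features: bag of words over the rest of the sentence.
    for i, token in enumerate(tokens):
        if i != index:
            features[f"bow={token.lower()}"] = 1
    return features


tokens = "He chopped the vegetables with a knife".split()
feats = extract_features(tokens, tokens.index("knife"))
print(sorted(feats))
```

A dictionary of binary features like this can be fed to most off-the-shelf classifiers; a richer system would add the structured features listed above.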






Knowledge Resources



               Structured and unstructured resources
               Structured: thesauri, machine-readable dictionaries, semantic networks
               (WordNet)
               BabelNet – Babel synsets, with semantic relations (is-a,
               part-of)
               Unstructured: raw corpora
               Collocation resources (Web1T)








Applications



               Information extraction – acronym expansion, disambiguate
               people names, domain-specific IE
               Information retrieval
               Machine Translation
               Semantic web
               Question answering








Approaches



               Supervised WSD: classification task, hand-labelled data
               Knowledge-based WSD: uses knowledge resources, no training
               Unsupervised: performs WSI
               Word sense dominance: find the predominant sense of a word
               Domain-driven WSD: represent domain information as vectors and
               compare them with the senses of the target word w








Supervised WSD



               Given a set of manually sense-annotated examples (training
               set), learn a classifier
               Features for WSD: Bag of words, bigrams, collocations, VP
               and NP heads, PoS
               Using WordNet as a sense inventory, SemCor is a readily
               available source of sense-labelled data
               Current state-of-the-art performance comes from SVMs
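The slide names SVMs as the strongest classifiers. To keep the sketch dependency-free, the one below swaps in a multinomial Naive Bayes classifier over bag-of-words features, a simpler well-known supervised baseline and not the SVM setup. The training sentences and sense labels are invented for illustration; a real system would train on SemCor-style annotations.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(examples):
    """examples: list of (context_tokens, sense) pairs."""
    sense_counts = Counter(sense for _, sense in examples)
    word_counts = defaultdict(Counter)
    for tokens, sense in examples:
        word_counts[sense].update(tokens)
    vocabulary = {w for tokens, _ in examples for w in tokens}
    return sense_counts, word_counts, vocabulary


def classify(model, tokens):
    sense_counts, word_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        n_words = sum(word_counts[sense].values())
        # log P(sense) + sum of log P(word | sense) with add-one smoothing;
        # words never seen in training are ignored.
        score = log(count / total)
        for w in tokens:
            if w in vocabulary:
                score += log((word_counts[sense][w] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


training = [
    ("caught bass river".split(), "bass/fish"),
    ("bass fishing river".split(), "bass/fish"),
    ("bass guitar band".split(), "bass/music"),
    ("band played bass".split(), "bass/music"),
]
model = train_naive_bayes(training)
print(classify(model, "river fishing trip".split()))
print(classify(model, "band played loud".split()))
```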








Knowledge-based WSD




               Exploit knowledge resources (dictionaries, thesauri,
               collocations) to assign senses
               Lower performance than supervised methods, but wider
               coverage
               No need to train or be tuned to a task/domain








Gloss Overlap

               Knowledge-based method proposed by Lesk (1986)
               Retrieve all sense definitions of target word
               Compare each sense definition with the definitions of other
               words in context
               Choose the sense with the most overlap
               To disambiguate pine cone:
               pine: 1. a kind of evergreen tree; 2. to waste away through
               sorrow.
               cone: 1. a solid body which narrows to a point; 2. something
               of this shape; 3. fruit of certain evergreen trees.
               pine 1 and cone 3 share evergreen and tree(s), so these senses are chosen
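A minimal sketch of the pairwise gloss overlap for the two-word case, using the definitions quoted above. The stopword list and the crude plural stripping (so that tree and trees match) are assumptions of this sketch, not part of Lesk's description.

```python
from itertools import product

STOPWORDS = {"a", "an", "the", "of", "to", "which", "this", "through"}


def signature(definition):
    """Content words of a definition, with a naive plural strip (trees -> tree)."""
    words = set()
    for w in definition.lower().replace(".", "").split():
        if w in STOPWORDS:
            continue
        if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
            w = w[:-1]
        words.add(w)
    return words


def lesk_pair(senses_a, senses_b):
    """Pick the pair of senses whose definitions share the most words."""
    best = max(
        product(senses_a.items(), senses_b.items()),
        key=lambda pair: len(signature(pair[0][1]) & signature(pair[1][1])),
    )
    (sense_a, def_a), (sense_b, def_b) = best
    return sense_a, sense_b, signature(def_a) & signature(def_b)


pine = {
    "pine#1": "a kind of evergreen tree",
    "pine#2": "to waste away through sorrow",
}
cone = {
    "cone#1": "a solid body which narrows to a point",
    "cone#2": "something of this shape",
    "cone#3": "fruit of certain evergreen trees",
}
print(lesk_pair(pine, cone))
```

With more than two context words, the same idea scores each sense of the target against the definitions of all the other words and sums the overlaps.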






Lexical Chains



               Knowledge-based method proposed by Hirst and St-Onge
               (1998)
               A lexical chain is a sequence of semantically related words in a
               text
               Each sense is scored according to the chain of related words it
               belongs to








PageRank



               Knowledge-based method proposed by Agirre and Soroa
               (2009)
               Build a graph including all synsets of words in the input text
               Assign an initial low value to each node in the graph
               Apply PageRank (Brin and Page) to the graph, and select, for
               each word, the synset with the highest PageRank score
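The graph-based procedure can be sketched with a plain power-iteration PageRank. The toy graph, synset names and edges below are hypothetical stand-ins for relations that would come from WordNet. The damping factor 0.85 and the fixed iteration count are common defaults assumed here, and nodes without edges spread their score uniformly.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph {node: set_of_neighbours}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # same low initial value for every node
    for _ in range(iterations):
        # Score held by nodes with no edges is spread uniformly over the graph.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(graph[u]) for u in graph[v]) + dangling / n)
            for v in nodes
        }
    return rank


def disambiguate(candidates, relations):
    """candidates: {word: [synset, ...]}; relations: (synset, synset) edges."""
    graph = {s: set() for senses in candidates.values() for s in senses}
    for a, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    rank = pagerank(graph)
    # For each word, keep the candidate synset with the highest PageRank.
    return {word: max(senses, key=rank.get) for word, senses in candidates.items()}


candidates = {"pine": ["pine#1", "pine#2"], "cone": ["cone#1", "cone#3"]}
relations = [("pine#1", "evergreen#1"), ("cone#3", "evergreen#1")]
print(disambiguate(candidates, relations))
```

The senses connected through the shared node evergreen#1 collect more score than the isolated ones, which is the intuition behind ranking synsets by PageRank.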








Knowledge Acquisition Bottleneck



               WSD needs knowledge! Corpora, dictionaries, semantic
               networks
               More knowledge is required to improve the performance of
               both:
               Supervised systems – more training data
               Knowledge based systems – richer networks








Minimally Supervised WSD




               Human supervision is expensive, but required for training
               examples or a knowledge base
               Minimally supervised approaches aim to learn classifiers from
               annotated data with minimal human supervision








Bootstrapping



               Given a set of labelled examples L, a set of unlabelled examples
               U and a classifier c:
               1. Choose N examples from U and add them to U ′
               2. Train c on L and label U ′
               3. Select K most confidently labelled instances from U ′ and
               assign them to L
               Repeat until U or K is empty
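The three steps can be sketched as a generic self-training loop. Here the loop simply runs until both U and the pool U′ are exhausted, which is one reading of the stopping condition. The word-overlap classifier (`train_overlap`) and the seed sentences are hypothetical, chosen only to make the loop runnable; any classifier that returns a label and a confidence could be plugged in.

```python
def bootstrap(labelled, unlabelled, train, n=10, k=1):
    """Self-training loop over the three steps above (k must be at least 1).

    labelled:   list of (example, label) pairs, the seed set L
    unlabelled: list of examples, the set U
    train:      function taking L and returning a classifier c that maps an
                example to a (label, confidence) pair
    """
    labelled, unlabelled, pool = list(labelled), list(unlabelled), []
    while unlabelled or pool:
        # 1. Move N examples from U into the pool U'.
        pool.extend(unlabelled[:n])
        unlabelled = unlabelled[n:]
        # 2. Train c on L and label every example in U'.
        classify = train(labelled)
        scored = sorted(((classify(x), x) for x in pool),
                        key=lambda item: item[0][1], reverse=True)
        # 3. Move the K most confidently labelled examples from U' to L.
        labelled.extend((x, label) for (label, _), x in scored[:k])
        pool = [x for _, x in scored[k:]]
    return labelled


def train_overlap(labelled):
    """Toy classifier: pick the label whose training vocabulary overlaps most."""
    vocab = {}
    for text, label in labelled:
        vocab.setdefault(label, set()).update(text.split())

    def classify(text):
        tokens = set(text.split())
        scores = {label: len(tokens & words) for label, words in vocab.items()}
        ranked = sorted(scores.values(), reverse=True)
        margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0)
        return max(scores, key=scores.get), margin  # confidence = score margin

    return classify


seeds = [("caught a bass in the river", "fish"), ("the bass player in the band", "music")]
unlabelled = ["river fishing for bass", "band played loud bass"]
print(bootstrap(seeds, unlabelled, train_overlap))
```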








Word Sense Induction



               Based on the idea that one sense of a word will have similar
               neighbouring words
               Follows the idea that the meaning of a word is given by its
               usage
               We induce word senses from input text by clustering the
               occurrences of a word








Clustering



               Unsupervised machine learning for grouping similar objects
               No a priori input (sense labels)
               Context clustering: each occurrence of a word is represented
               as a context vector; the vectors are then clustered into groups
               Word clustering: cluster words which are semantically similar
               and thus share a specific meaning
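Context clustering can be sketched as follows. Average-link agglomerative clustering with a cosine threshold is only one possible choice (k-means or graph-based clustering would fit the description equally well). The threshold of 0.5, the exclusion of the target word from its own context, and the toy contexts are assumptions of this sketch.

```python
from collections import Counter
from math import sqrt


def context_vector(sentence, target):
    """Bag-of-words vector of an occurrence's context, excluding the target word."""
    return Counter(w for w in sentence.lower().split() if w != target)


def cosine(u, v):
    dot = sum(count * v[w] for w, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def cluster_contexts(sentences, target, threshold=0.5):
    """Average-link agglomerative clustering of the target word's occurrences."""
    vectors = [context_vector(s, target) for s in sentences]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break  # no pair of clusters is similar enough to merge
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


occurrences = [
    "caught bass river boat",
    "river boat fishing bass",
    "bass guitar band concert",
    "band concert played bass",
]
print(cluster_contexts(occurrences, "bass"))
```

Each resulting cluster of occurrence indices plays the role of an induced sense; no sense labels are used at any point.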








Word Clustering


               Aims to cluster words which are semantically similar
               Lin (1998) proposes this method:
               1. Extract dependency triples from a text corpus
               John eats a yummy kiwi → (eat subj John), (kiwi obj-of eat),
               (kiwi det a) ...
               2. Define a measure of similarity between two words
               3. Use similarity scores to create a similarity tree: start with a
               root node, and recursively add children in descending
               order of similarity.
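Step 2 can be sketched with the information-theoretic measure from Lin (1998). Each dependency triple (w, r, w′) is weighted by its mutual information, and two words are similar in proportion to the information carried by the features they share. The relation name mod and the toy triples are hypothetical, only positive-information features are kept, and the tree-building step is not shown.

```python
from collections import Counter
from math import log


def lin_similarity(triples, w1, w2):
    """Lin (1998)-style similarity from (word, relation, other_word) triples."""
    count = Counter(triples)                            # ||w, r, w'||
    word_rel = Counter((w, r) for w, r, _ in triples)   # ||w, r, *||
    rel_arg = Counter((r, a) for _, r, a in triples)    # ||*, r, w'||
    rel = Counter(r for _, r, _ in triples)             # ||*, r, *||

    def features(word):
        """Map each feature (r, w') of `word` to its positive mutual information."""
        info = {}
        for (w, r, a), c in count.items():
            if w == word:
                mi = log(c * rel[r] / (word_rel[(w, r)] * rel_arg[(r, a)]))
                if mi > 0:
                    info[(r, a)] = mi
        return info

    f1, f2 = features(w1), features(w2)
    shared = f1.keys() & f2.keys()
    numerator = sum(f1[f] + f2[f] for f in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0


triples = [
    ("kiwi", "obj-of", "eat"), ("banana", "obj-of", "eat"),
    ("banana", "obj-of", "peel"), ("car", "obj-of", "drive"),
    ("truck", "obj-of", "drive"), ("kiwi", "mod", "yummy"),
    ("banana", "mod", "yummy"), ("car", "mod", "fast"), ("truck", "mod", "fast"),
]
print(lin_similarity(triples, "kiwi", "banana"))
print(lin_similarity(triples, "kiwi", "car"))
```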







Lin’s approach: example








WSI: pros and cons




               + Actually performs word sense disambiguation
               + Aims to divide the occurrences of a word into a number of
               classes
               - Makes objective evaluation more difficult if not
               domain-specific








Disambiguation Evaluation




               Disambiguation is easy to evaluate – we have discrete sense
               inventories
               Evaluate with coverage (proportion of instances answered),
               precision and recall, and then F1
               Accuracy – correct answers / total answers
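One common way to compute these measures, assumed here, is the Senseval-style convention: precision is measured over the instances that received an answer and recall over all instances, so the two coincide at full coverage. A minimal sketch with invented instance ids and senses:

```python
def evaluate(gold, predicted):
    """gold: {instance: sense}; predicted: {instance: sense}, missing = no answer."""
    answered = [i for i in gold if predicted.get(i) is not None]
    correct = sum(1 for i in answered if predicted[i] == gold[i])
    coverage = len(answered) / len(gold)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"coverage": coverage, "precision": precision, "recall": recall, "f1": f1}


gold = {1: "bank/river", 2: "bank/finance", 3: "bank/river", 4: "bank/building"}
predicted = {1: "bank/river", 2: "bank/finance", 3: "bank/finance"}  # no answer for 4
print(evaluate(gold, predicted))
```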








Disambiguation Baselines




               MFS – Most Frequent Sense
               Strong baseline – 50-60% accuracy on lexical sample task
               Does not take genre into account (e.g. star in astrophysics vs.
               newswire)
               Subject to idiosyncrasies of the corpus
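A sketch of the MFS baseline, with sense counts taken from a hypothetical sense-tagged training sample (the sense labels are illustrative). The answer depends entirely on that sample and ignores the context of each test occurrence.

```python
from collections import Counter


def most_frequent_sense(tagged):
    """tagged: (word, sense) pairs from a sense-tagged training corpus."""
    mfs = {}
    for (word, sense), _ in Counter(tagged).most_common():
        mfs.setdefault(word, sense)  # most_common() lists higher counts first
    return mfs


training = [
    ("star", "star/celebrity"),
    ("star", "star/celebrity"),
    ("star", "star/celestial-body"),
]
baseline = most_frequent_sense(training)
print(baseline["star"])
```

Trained on a newswire-like sample such as this one, the baseline would also label star in an astrophysics text as a celebrity, which is the genre problem noted above.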








Evaluation with gold-standard clustering




               Given a gold-standard clustering, compare it with the
               output clustering
               Can evaluate with entropy and purity
               Also the Rand index (similar to the Jaccard index) and F-score.
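Purity, entropy and the Rand index can be sketched as below for parallel lists of gold classes and output cluster ids. The entropy here is the size-weighted, unnormalised version (some evaluations normalise it by the number of classes), and F-score is not shown. The toy labels are invented.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _class_counts(gold, clusters):
    """Group gold classes by output cluster: {cluster_id: Counter(class -> n)}."""
    table = {}
    for g, c in zip(gold, clusters):
        table.setdefault(c, Counter())[g] += 1
    return table


def purity(gold, clusters):
    table = _class_counts(gold, clusters)
    return sum(max(counts.values()) for counts in table.values()) / len(gold)


def entropy(gold, clusters):
    """Size-weighted class entropy of the clusters; lower is better."""
    total = 0.0
    for counts in _class_counts(gold, clusters).values():
        size = sum(counts.values())
        h = -sum(n / size * log2(n / size) for n in counts.values())
        total += size / len(gold) * h
    return total


def rand_index(gold, clusters):
    """Share of item pairs that both clusterings put together, or both keep apart."""
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum((gold[i] == gold[j]) == (clusters[i] == clusters[j]) for i, j in pairs)
    return agree / len(pairs)


gold = ["a", "a", "a", "b", "b", "b"]
output = [0, 0, 1, 1, 1, 1]
print(purity(gold, output), entropy(gold, output), rand_index(gold, output))
print(purity(gold, [0] * 6))  # all-in-one clustering of the same items
```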








Discrimination Baselines




               All-in-one: group all occurrences of the target word into one big cluster
               Random: produce a random set of clusters








Pseudowords



               Discrimination evaluation method
               Generates new words with artificial ambiguity
               Select two or more monosemous terms from gold standard
               data
               Given all their occurrences in a corpus, replace them with a
               pseudoword formed by joining the terms
               Compare the automatic discrimination with the gold standard
               (the original terms)
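Pseudoword generation can be sketched as below. Joining the terms with an underscore is an assumption of this sketch, and banana and doorknob are illustrative terms assumed to be monosemous. The recorded original term of each replaced token is the gold-standard "sense" against which a discrimination system's clusters are compared.

```python
def make_pseudoword_corpus(sentences, terms):
    """Replace each occurrence of the given terms with one joined pseudoword.

    Returns the rewritten sentences and the original term behind every
    replaced token, which serves as the gold-standard "sense".
    """
    pseudoword = "_".join(terms)
    corpus, gold = [], []
    for sentence in sentences:
        tokens = []
        for token in sentence.split():
            if token in terms:
                gold.append(token)
                token = pseudoword
            tokens.append(token)
        corpus.append(" ".join(tokens))
    return corpus, gold


corpus, gold = make_pseudoword_corpus(
    ["she ate a banana", "he turned the doorknob", "a ripe banana"],
    ("banana", "doorknob"),
)
print(corpus)
print(gold)
```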








SemEval-2007



               Lexical sample and all-words coarse-grained WSD
               Preposition disambiguation
               Evaluation of WSD on cross-language IR
               WSI, lexical substitution
               Top systems reach 88.7% accuracy (on lexical sample) and
               82.5% (on all-words)








SemEval-2010




               Fifth event of its kind
               Includes specific cross-lingual tasks
               Combined WSI/WSD task
               Domain-specific all-words task








Issues




               Representation of word senses: enumerative vs. generative
               approach
               Knowledge Acquisition Bottleneck: not enough data!
               Benefits for AI/NLP applications








Alleviating the Knowledge Acquisition Bottleneck



               Weakly-supervised algorithms, incorporating bootstrapping or
               active learning
               Continuing manual efforts – WordNet, Open Mind Word
               Expert, OntoNotes
               Automatic enrichment of knowledge resources – collocation
               and relation triple extraction, BabelNet








Future Challenges



               How can we mine even larger repositories of textual data –
               e.g. the whole web! – to create huge knowledge repositories?
               How can we design high performance and scalable algorithms
               to use this data?
               Need to decide which kinds of word senses are needed for which
               applications
               We still need to develop a general representation of word senses








Wikipedia as sense inventory




               Wikipedia articles provide an inventory of disambiguated word
               senses and entity references
               Task: Use their occurrences in texts, i.e. the internal
               Wikipedia hyperlinks, as named entity and sense annotations
               The articles’ texts provide a sense-annotated corpus








Mihalcea (2007)
               Mihalcea proposes a method for automatically generating
               sense-tagged data using Wikipedia
               Rhythm is the arrangement of sounds in time. Meter animates
               time in regular pulse groupings, called measures or [[bar
               (music)|bar]].
               The nightlife is particularly active around the beachfront
               promenades because of its many nightclubs and [[bar
               (establishment)|bars]].
               1. Extract all paragraphs in Wikipedia containing word w
               2. Collect all possible link labels l1, ..., ln for w
               3. Map each label li to its WordNet sense s
               4. Annotate each occurrence of [[li|w]] with its sense s
               System trained on Wikipedia significantly outperforms MFS
               and Lesk baselines
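Steps 2 to 4 can be sketched for pre-split paragraphs, assuming MediaWiki link syntax [[label|anchor]]. The label-to-sense dictionary stands in for the mapping step and its sense ids are hypothetical. Accepting only labels of the form "word (qualifier)" is a simplification of this sketch.

```python
import re

# A wiki link: [[label]] or [[label|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def sense_tagged_examples(paragraphs, word, label_to_sense):
    """Return (anchor, sense, paragraph) for every link to an article about `word`."""
    # Labels for w: the word itself, optionally with a qualifier, e.g. "bar (music)".
    label_pattern = re.compile(rf"{re.escape(word)}( \(.+\))?", re.IGNORECASE)
    examples = []
    for paragraph in paragraphs:
        for match in LINK.finditer(paragraph):
            label = match.group(1)
            anchor = match.group(2) or label
            if not label_pattern.fullmatch(label):
                continue
            sense = label_to_sense.get(label)  # label -> WordNet sense mapping
            if sense is not None:
                examples.append((anchor, sense, paragraph))
    return examples


paragraphs = [
    "Meter animates time in regular pulse groupings, called measures or [[bar (music)|bar]].",
    "The nightlife is particularly active because of its many nightclubs and [[bar (establishment)|bars]].",
    "He rolled a [[barrel]] into the cellar.",
]
mapping = {"bar (music)": "bar/measure", "bar (establishment)": "bar/pub"}  # hypothetical ids
for anchor, sense, _ in sense_tagged_examples(paragraphs, "bar", mapping):
    print(anchor, sense)
```

The resulting (paragraph, sense) pairs can then serve as training data for any supervised classifier.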




Knowledge-rich WSD


               The general aim is to relieve the knowledge acquisition bottleneck of
               NLP systems, with WSD as a case study
               Main ideas:
               - Extend WordNet with millions of semantic relations (using
               Wikipedia)
               - Apply knowledge-based WSD to exploit the extended WordNet
               Results: integrating a very large number of semantic relations into
               knowledge-based systems yields performance competitive with
               SotA supervised approaches







Wikification


               The task of generating hyperlinks to disambiguated Wikipedia
               concepts
               Two sub-tasks: automatic keyword extraction, WSD
               Wikify!¹ performs keyword extraction by extracting candidates
               and then ranking them
               The system combines knowledge-based and data-driven WSD,
               filtering out annotations on which the two disagree
               Disambiguate links using relatedness, commonness (prior
               probability of a sense), and context quality (context terms).

               ¹ Csomai and Mihalcea (2008)
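Of the three signals, commonness is the simplest to compute. It is the fraction of the times an anchor text is used as a link that it points to a given article. The sketch below assumes the (anchor, target) link pairs have already been harvested from Wikipedia markup. The example counts are invented, and lowercasing anchors is a choice of this sketch.

```python
from collections import Counter, defaultdict


def commonness(links):
    """links: (anchor_text, target_article) pairs harvested from Wikipedia.

    Returns {anchor: {article: P(article | anchor)}}, the prior probability
    that an anchor text links to a given article.
    """
    counts = defaultdict(Counter)
    for anchor, article in links:
        counts[anchor.lower()][article] += 1
    return {
        anchor: {article: n / sum(targets.values()) for article, n in targets.items()}
        for anchor, targets in counts.items()
    }


links = [("bar", "Bar (establishment)")] * 3 + [("Bar", "Bar (music)")]
table = commonness(links)
print(table["bar"])
```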




Questions




                              Thank you. Are there any questions?




COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...Pierpaolo Basile
 
Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationElena-Oana Tabaranu
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Innovation Quotient Pvt Ltd
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationvini89
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNetSeid Hassen
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word CloudsMarina Santini
 
Zoological nomenclature
Zoological nomenclatureZoological nomenclature
Zoological nomenclatureManideep Raj
 

Andere mochten auch (20)

Word sense disambiguation a survey
Word sense disambiguation a surveyWord sense disambiguation a survey
Word sense disambiguation a survey
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense Disambiguation
 
Word Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessWord Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information Access
 
Biomedical Word Sense Disambiguation presentation [Autosaved]
Biomedical Word Sense Disambiguation presentation [Autosaved]Biomedical Word Sense Disambiguation presentation [Autosaved]
Biomedical Word Sense Disambiguation presentation [Autosaved]
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition ResourceBroad Twitter Corpus: A Diverse Named Entity Recognition Resource
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
 
Draft programme 15 09-2015
Draft programme 15 09-2015Draft programme 15 09-2015
Draft programme 15 09-2015
 
Word sense dissambiguation
Word sense dissambiguationWord sense dissambiguation
Word sense dissambiguation
 
BibleTech2011
BibleTech2011BibleTech2011
BibleTech2011
 
An Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense DisambiguationAn Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense Disambiguation
 
A word sense disambiguation technique for sinhala
A word sense disambiguation technique  for sinhalaA word sense disambiguation technique  for sinhala
A word sense disambiguation technique for sinhala
 
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a ...
 
Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense Disambiguation
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
Thesis
ThesisThesis
Thesis
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
Zoological nomenclature
Zoological nomenclatureZoological nomenclature
Zoological nomenclature
 

Ähnlich wie Word Sense Disambiguation and Induction

Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...Sebastian Ruder
 
Understanding ASL Grammatical Features and Discourse Mapping
Understanding ASL Grammatical Features and Discourse MappingUnderstanding ASL Grammatical Features and Discourse Mapping
Understanding ASL Grammatical Features and Discourse MappingDoug Stringham
 
DATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptxDATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptxDrPraveenPawar
 
Es hora de ayudar template spanish 4
Es hora de ayudar template spanish 4Es hora de ayudar template spanish 4
Es hora de ayudar template spanish 4pasaportealmundo
 

Ähnlich wie Word Sense Disambiguation and Induction (7)

Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...
 
Understanding ASL Grammatical Features and Discourse Mapping
Understanding ASL Grammatical Features and Discourse MappingUnderstanding ASL Grammatical Features and Discourse Mapping
Understanding ASL Grammatical Features and Discourse Mapping
 
Stance indexicalityworkshop
Stance indexicalityworkshopStance indexicalityworkshop
Stance indexicalityworkshop
 
Media analysis 1
Media analysis 1Media analysis 1
Media analysis 1
 
DATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptxDATA641 Lecture 3 - Word meaning.pptx
DATA641 Lecture 3 - Word meaning.pptx
 
Es hora de ayudar template spanish 4
Es hora de ayudar template spanish 4Es hora de ayudar template spanish 4
Es hora de ayudar template spanish 4
 
Grading scale
Grading scaleGrading scale
Grading scale
 

Mehr von Leon Derczynski

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and VeracityLeon Derczynski
 
State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018Leon Derczynski
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCLeon Derczynski
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingLeon Derczynski
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social MediaLeon Derczynski
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesLeon Derczynski
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Leon Derczynski
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social MediaLeon Derczynski
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doLeon Derczynski
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsLeon Derczynski
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextLeon Derczynski
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy DataLeon Derczynski
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseLeon Derczynski
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyLeon Derczynski
 
Empirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkEmpirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkLeon Derczynski
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataLeon Derczynski
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseLeon Derczynski
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceLeon Derczynski
 

Mehr von Leon Derczynski (20)

Joint Rumour Stance and Veracity
Joint Rumour Stance and VeracityJoint Rumour Stance and Veracity
Joint Rumour Stance and Veracity
 
State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018State of Tools for NLP in Danish: 2018
State of Tools for NLP in Danish: 2018
 
RumourEval
RumourEvalRumourEval
RumourEval
 
Handling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGCHandling and Mining Linguistic Variation in UGC
Handling and Mining Linguistic Variation in UGC
 
Efficient named entity annotation through pre-empting
Efficient named entity annotation through pre-emptingEfficient named entity annotation through pre-empting
Efficient named entity annotation through pre-empting
 
Leveraging the Power of Social Media
Leveraging the Power of Social MediaLeveraging the Power of Social Media
Leveraging the Power of Social Media
 
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice GuidelinesCorpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
 
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Rec...
 
Starting to Process Social Media
Starting to Process Social MediaStarting to Process Social Media
Starting to Process Social Media
 
Christmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I doChristmas Presentation at Aarhus: What I do
Christmas Presentation at Aarhus: What I do
 
Recognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal ExpressionsRecognising and Interpreting Named Temporal Expressions
Recognising and Interpreting Named Temporal Expressions
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data Twitter Part-of-Speech Tagging for All:  Overcoming Sparse and Noisy Data
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
Microblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracyMicroblog-genre noise and its impact on semantic annotation accuracy
Microblog-genre noise and its impact on semantic annotation accuracy
 
Empirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense FrameworkEmpirical Validation of Reichenbach’s Tense Framework
Empirical Validation of Reichenbach’s Tense Framework
 
Towards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media DataTowards Context-Aware Search and Analysis on Social Media Data
Towards Context-Aware Search and Analysis on Social Media Data
 
Determining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in DiscourseDetermining the Types of Temporal Relations in Discourse
Determining the Types of Temporal Relations in Discourse
 
TIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation ResourceTIMEN: An Open Temporal Expression Normalisation Resource
TIMEN: An Open Temporal Expression Normalisation Resource
 

Kürzlich hochgeladen

HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 

Kürzlich hochgeladen (20)

HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

Word Sense Disambiguation and Induction

  • 1. Word Sense Disambiguation and Induction
      Leon Derczynski, University of Sheffield
      27 January 2011
  • 2. Origin
      Originally a course at ESSLLI 2011, Copenhagen, by Roberto Navigli and Simone Ponzetto.
  • 3. Outline: (1) Introduction, (2) WSD, (3) WSI, (4) Evaluation and Issues, (5) Wikipedia, (6) Summary
  • 4. General Problem
      Being able to disambiguate words in context is a crucial problem, and solving it could help improve many other NLP applications.
      Polysemy is everywhere; our job is to model it.
      Ambiguity is rampant: "I saw a man who is 98 years old and can still walk and tell jokes."
      Senses per word: saw 26, man 11, years 4, old 8, can 5, still 4, walk 10, tell 8, jokes 3.
      That gives 43,929,600 possible sense combinations for this simple sentence.
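The 43,929,600 figure is the product of the per-word sense counts, because each word can independently take any of its senses. Here is a minimal check of that arithmetic, using the counts exactly as they appear on the slide. The slide does not say which dictionary the counts come from.

```python
from math import prod

# Sense counts per content word, copied from the slide.
sense_counts = {
    "saw": 26, "man": 11, "years": 4, "old": 8, "can": 5,
    "still": 4, "walk": 10, "tell": 8, "jokes": 3,
}

# Each word picks one of its senses independently, so the number of
# joint sense assignments is the product of the per-word counts.
combinations = prod(sense_counts.values())
print(f"{combinations:,}")  # prints 43,929,600
```

This is why exhaustively scoring every joint assignment is impractical, and why WSD methods usually decide word by word or search over the combinations in a smarter way.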
  • 5. Word Senses
      Monosemous words have only one meaning, e.g. plant life, internet.
      Polysemous words have more than one meaning, e.g. bar, bass.
      A word sense is a commonly accepted meaning of a word.
      Example: "We are fond of fruit such as the kiwifruit and banana."
  • 6. Enumerative Approach
      A fixed sense inventory enumerates the range of possible meanings of a word.
      Context is used to select a particular sense: "chop vegetables with a knife" vs. "was stabbed with a knife".
      However, we may want to add senses that the inventory lacks.
  • 7. WSD Tasks
      Different representations of senses change the way we think about WSD.
      Lexical sample: disambiguate a restricted set of words.
      All words: disambiguate all content words.
      Cross-lingual WSD: disambiguate a target word by labelling it with the appropriate translation in other languages, e.g. English coach → German Bus/Linienbus/Omnibus/Reisebus.
  • 8. Representing the Context
      Text is unstructured and needs to be made machine-readable.
      Flat representation (surface features) vs. structured representation (graphs, trees).
      Local features: the local context of a word usage, e.g. PoS tags and surrounding word forms.
      Topical features: the general topic of a sentence or discourse, represented as a bag of words.
      Syntactic features: argument-head relations between the target and the rest of the sentence.
      Semantic features: previously established word senses.
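The two flat feature types on this slide can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenisation, a window of two words on each side, and a tiny hand-picked stopword list. The function names and the `<pad>` marker are hypothetical. PoS tags, syntactic features and semantic features are left out because they need a tagger, a parser, or earlier WSD decisions.

```python
from collections import Counter

def local_features(tokens, i, window=2):
    """Surface word forms at fixed offsets around the target at position i."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = i + offset
        # Positions outside the sentence get a hypothetical padding symbol.
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{offset:+d}]"] = word
    return feats

def topical_features(tokens, i, stopwords=frozenset({"the", "a", "with", "was", "of"})):
    """Bag of words over the whole sentence, minus the target and stopwords."""
    return Counter(
        t.lower() for j, t in enumerate(tokens)
        if j != i and t.lower() not in stopwords
    )

tokens = "He was stabbed with a knife".split()
target = tokens.index("knife")
print(local_features(tokens, target))
print(topical_features(tokens, target))
```

Local features keep word order relative to the target. Topical features discard order, so they capture what the sentence is about rather than the target's immediate neighbours.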
  • 9. Knowledge Resources
      Structured and unstructured:
      Thesauri, machine-readable dictionaries, semantic networks (WordNet)
      BabelNet: Babel synsets, with semantic relations (is-a, part-of)
      Raw corpora
      Collocation resources (Web1T)
  • 10. Applications
      Information extraction: acronym expansion, disambiguating people's names, domain-specific IE
      Information retrieval
      Machine translation
      Semantic web
      Question answering
  • 11. Approaches
      Supervised WSD: a classification task using hand-labelled data.
      Knowledge-based (KB) WSD: uses knowledge resources; no training needed.
      Unsupervised WSD: performs word sense induction (WSI).
      Word sense dominance: find the predominant sense of a word.
      Domain-driven WSD: represent domain information as vectors and compare them with the senses of the target word w.
  • 12. Outline: (1) Introduction, (2) WSD, (3) WSI, (4) Evaluation and Issues, (5) Wikipedia, (6) Summary
  • 13. Supervised WSD
    Given a set of manually sense-annotated examples (the training set), learn a classifier
    Features for WSD: bag of words, bigrams, collocations, VP and NP heads, PoS
    With WordNet as the sense inventory, SemCor is a readily available source of sense-labelled data
    Current state-of-the-art performance comes from SVMs
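A minimal supervised WSD sketch using bag-of-words features only. Note that it swaps in a naive Bayes classifier with add-one smoothing instead of the SVMs mentioned on the slide, simply because it fits in a few self-contained lines; the training contexts and the sense names (bank/finance, bank/river) are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (context_words, sense) pairs, e.g. drawn from SemCor."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify_nb(model, words):
    """Pick the sense maximising log P(sense) + sum of log P(word | sense),
    with add-one (Laplace) smoothing so unseen words do not zero a sense."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def log_score(sense):
        denom = sum(word_counts[sense].values()) + len(vocab)
        return math.log(sense_counts[sense] / total) + sum(
            math.log((word_counts[sense][w] + 1) / denom) for w in words)

    return max(sense_counts, key=log_score)


training = [
    ("deposit money account".split(), "bank/finance"),
    ("loan interest money".split(), "bank/finance"),
    ("river water fishing".split(), "bank/river"),
    ("muddy river shore".split(), "bank/river"),
]
model = train_nb(training)
print(classify_nb(model, ["money", "loan"]))  # bank/finance
print(classify_nb(model, ["river", "boat"]))  # bank/river
```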
  • 14. Knowledge-based WSD
    Exploit knowledge resources (dictionaries, thesauri, collocations) to assign senses
    Lower performance than supervised methods, but wider coverage
    No need for training or for tuning to a task/domain
  • 15. Gloss Overlap
    Knowledge-based method proposed by Lesk (1986)
    1. Retrieve all sense definitions of the target word
    2. Compare each sense definition with the definitions of the other words in context
    3. Choose the sense with the most overlap
    Example: to disambiguate pine cone
    pine: 1. a kind of evergreen tree; 2. to waste away through sorrow
    cone: 1. a solid body which narrows to a point; 2. something of this shape; 3. fruit of certain evergreen trees
    pine 1 and cone 3 share the most words (evergreen, tree), so these senses are chosen
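The pine cone example can be reproduced with a pairwise gloss-overlap count. This is a simplified sketch: the stopword list and the crude plural stripping (so that "trees" matches "tree") are assumptions of this example, not part of Lesk's description.

```python
import re
from itertools import product

# Glosses copied from the slide; a real system would read them from a
# machine-readable dictionary such as WordNet.
GLOSSES = {
    "pine": {1: "a kind of evergreen tree",
             2: "to waste away through sorrow"},
    "cone": {1: "a solid body which narrows to a point",
             2: "something of this shape",
             3: "fruit of certain evergreen trees"},
}

# Function words ignored when counting overlaps (assumed list).
STOPWORDS = {"a", "an", "of", "to", "the", "this", "which", "through"}


def normalise(gloss):
    # Lower-case, drop stopwords, and strip a plural "s" (crude stemming).
    words = re.findall(r"[a-z]+", gloss.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w
            for w in words if w not in STOPWORDS}


def lesk_pair(word1, word2):
    # Try every pair of senses and keep the pair whose glosses overlap most.
    best, best_score = None, -1
    for s1, s2 in product(GLOSSES[word1], GLOSSES[word2]):
        score = len(normalise(GLOSSES[word1][s1]) & normalise(GLOSSES[word2][s2]))
        if score > best_score:
            best, best_score = (s1, s2), score
    return best, best_score


print(lesk_pair("pine", "cone"))  # ((1, 3), 2): shared words "evergreen", "tree"
```

Comparing all sense pairs grows combinatorially with the number of context words, which is why later variants compare each target sense with the context as a whole.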
  • 16. Lexical Chains
    Knowledge-based method proposed by Hirst and St-Onge (1998)
    A lexical chain is a sequence of semantically related words in a text
    Each sense is scored according to the chain of related words it belongs to
  • 17. PageRank
    Knowledge-based method proposed by Agirre and Soroa (2009)
    Build a graph including all synsets of the words in the input text
    Assign an initial low value to each node in the graph
    Apply PageRank (Brin and Page) to the graph, and select for each word the synset with the highest PageRank
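A sketch of this procedure on a toy graph. It assumes hand-written edges (EDGES) in place of real WordNet relations, and plain PageRank with a damping factor of 0.85 and 50 power iterations, values chosen only for illustration. Every node starts at the same low value 1/n, and each word receives its highest-ranked candidate synset.

```python
# Toy sense graph: hypothetical edges standing in for WordNet relations.
EDGES = [
    ("pine#tree", "cone#fruit"),
    ("pine#tree", "conifer"),
    ("cone#fruit", "conifer"),
    ("pine#tree", "evergreen"),
    ("cone#fruit", "evergreen"),
    ("cone#solid", "cone#shape"),
    ("pine#sorrow", "sadness"),
]

# Candidate synsets for each word of the input text "pine cone".
SENSES = {
    "pine": ["pine#tree", "pine#sorrow"],
    "cone": ["cone#solid", "cone#shape", "cone#fruit"],
}


def pagerank(edges, damping=0.85, iterations=50):
    # Undirected graph: each relation links both synsets.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}  # same low initial value everywhere
    for _ in range(iterations):
        # Each node shares its current rank equally among its neighbours.
        rank = {node: (1 - damping) / n
                + damping * sum(rank[nb] / len(graph[nb]) for nb in graph[node])
                for node in graph}
    return rank


def disambiguate(senses, edges):
    # For each word, pick the candidate synset with the highest PageRank.
    rank = pagerank(edges)
    return {word: max(candidates, key=lambda s: rank.get(s, 0.0))
            for word, candidates in senses.items()}


print(disambiguate(SENSES, EDGES))  # {'pine': 'pine#tree', 'cone': 'cone#fruit'}
```

The two mutually related senses gain rank from each other and from shared neighbours, while isolated readings keep roughly the baseline value.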
  • 18. Knowledge Acquisition Bottleneck
    WSD needs knowledge! Corpora, dictionaries, semantic networks
    More knowledge is required to improve the performance of both:
    Supervised systems – more training data
    Knowledge-based systems – richer networks
  • 19. Minimally Supervised WSD
    Human supervision is expensive, but it is required to produce training examples or a knowledge base
    Minimally supervised approaches aim to learn classifiers from annotated data with minimal human supervision
  • 20. Bootstrapping
    Given a set of labelled examples L, a set of unlabelled examples U and a classifier c:
    1. Choose N examples from U and move them to a pool U′
    2. Train c on L and label U′
    3. Select the K most confidently labelled instances from U′ and move them to L
    Repeat until U is exhausted or no more instances can be selected
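The loop above (a form of self-training) can be sketched as follows. The classifier is a deliberately trivial, hypothetical one (it counts context words already seen with each sense), and its "confidence" is simply the winning sense's share of the total score; instances not selected stay in the pool U′ for the next round.

```python
from collections import Counter


def train(labelled):
    # Hypothetical toy classifier: word counts observed for each sense.
    model = {}
    for words, sense in labelled:
        model.setdefault(sense, Counter()).update(words)
    return model


def predict(model, words):
    # Score = number of context words already seen with the sense;
    # confidence = the winner's share of the total score (an assumption).
    scores = {sense: sum(counts[w] > 0 for w in words)
              for sense, counts in model.items()}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, (scores[best] / total if total else 0.0)


def bootstrap(labelled, unlabelled, n=2, k=1):
    labelled, unlabelled, pool = list(labelled), list(unlabelled), []
    while unlabelled or pool:
        # 1. Move N examples from U into the pool U'.
        pool += unlabelled[:n]
        unlabelled = unlabelled[n:]
        # 2. Train c on L and label every example in U'.
        model = train(labelled)
        predicted = [predict(model, words) for words in pool]
        # 3. Move the K most confidently labelled examples from U' to L
        #    (sorted() is stable, so ties keep their pool order).
        order = sorted(range(len(pool)), key=lambda i: predicted[i][1], reverse=True)
        chosen = order[:k]
        if not chosen:
            break  # nothing can be selected (e.g. k < 1)
        labelled += [(pool[i], predicted[i][0]) for i in chosen]
        pool = [words for i, words in enumerate(pool) if i not in chosen]
    return labelled


seeds = [(("river", "water"), "bank/river"), (("money", "loan"), "bank/finance")]
unlabelled = [("water", "fishing"), ("loan", "interest"),
              ("fishing", "boat"), ("interest", "rate")]
for words, sense in bootstrap(seeds, unlabelled)[len(seeds):]:
    print(words, sense)
```

Note how ("interest", "rate") cannot be labelled confidently at first, but is resolved once ("loan", "interest") has been added to L.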
  • 22. Word Sense Induction
    Based on the idea that occurrences of a word in the same sense will have similar neighbouring words
    Follows the idea that the meaning of a word is given by its usage
    We induce word senses from input text by clustering word occurrences
  • 23. Clustering
    Unsupervised machine learning for grouping similar objects
    No a priori input (sense labels)
    Context clustering: each occurrence of a word is represented as a context vector, and the vectors are clustered into groups
    Word clustering: cluster words which are semantically similar and thus have a specific meaning
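Context clustering in miniature: each occurrence is a bag-of-words vector, and occurrences whose cosine similarity reaches a threshold end up in the same cluster (single-link clustering via union-find). The threshold value and the pre-tokenised contexts, with the target word already removed, are assumptions for this sketch; actual WSI systems use a variety of clustering algorithms.

```python
import math
from collections import Counter


def cosine(a, b):
    # Cosine similarity of two sparse bag-of-words vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_contexts(contexts, threshold=0.3):
    # One context vector per occurrence of the target word.
    vectors = [Counter(c.split()) for c in contexts]
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-link: merge any two occurrences that are similar enough.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


# Occurrences of "bank", reduced to content words (target removed).
contexts = ["money deposit account", "account money loan",
            "river water fishing", "fishing river boat"]
print(cluster_contexts(contexts))  # [[0, 1], [2, 3]]
```

Each resulting cluster is one induced sense; the number of senses is not fixed in advance but depends on the threshold.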
  • 24. Word Clustering
    Aims to cluster words which are semantically similar
    Lin (1998) proposes this method:
    1. Extract dependency triples from a text corpus, e.g. John eats a yummy kiwi → (eat subj John), (kiwi obj-of eat), (kiwi det a), ...
    2. Define a measure of similarity between two words
    3. Use the similarity scores to create a similarity tree: start with a root node, and recursively add children in descending order of similarity
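Step 2 can be illustrated with the information-theoretic similarity measure defined in Lin (1998), which weights each shared (relation, word) feature by its mutual information with the word. The triple counts below are invented, and the similarity tree of step 3 is not shown.

```python
import math
from collections import Counter

# Invented dependency-triple counts: (word, relation, word').
TRIPLES = Counter({
    ("kiwi", "obj-of", "eat"): 3,
    ("apple", "obj-of", "eat"): 2,
    ("apple", "obj-of", "buy"): 1,
    ("car", "obj-of", "drive"): 3,
    ("kiwi", "adj-mod", "yummy"): 1,
    ("apple", "adj-mod", "yummy"): 1,
    ("car", "adj-mod", "fast"): 2,
})


def information(triples):
    # I(w, r, w') = log(||w,r,w'|| * ||*,r,*|| / (||w,r,*|| * ||*,r,w'||))
    by_rel, by_word_rel, by_rel_word = Counter(), Counter(), Counter()
    for (w, r, w2), c in triples.items():
        by_rel[r] += c
        by_word_rel[w, r] += c
        by_rel_word[r, w2] += c
    return {(w, r, w2): math.log(c * by_rel[r] / (by_word_rel[w, r] * by_rel_word[r, w2]))
            for (w, r, w2), c in triples.items()}


def features(info, word):
    # T(w): (relation, word') pairs with positive information for `word`.
    return {(r, w2): i for (w, r, w2), i in info.items() if w == word and i > 0}


def lin_similarity(info, w1, w2):
    # Information of the shared features over the information of all features.
    f1, f2 = features(info, w1), features(info, w2)
    shared = sum(f1[f] + f2[f] for f in f1.keys() & f2.keys())
    total = sum(f1.values()) + sum(f2.values())
    return shared / total if total else 0.0


info = information(TRIPLES)
print(lin_similarity(info, "kiwi", "apple"))  # about 0.66
print(lin_similarity(info, "kiwi", "car"))    # 0.0
```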
  • 25. Lin’s approach: example [figure]
  • 26. WSI: pros and cons
    + Actually performs word sense disambiguation
    + Aims to divide the occurrences of a word into a number of classes
    - Makes objective evaluation more difficult if not domain-specific
  • 28. Disambiguation Evaluation
    Disambiguation is easy to evaluate – we have discrete sense inventories
    Evaluate with coverage (answers given / total instances), precision (correct answers / answers given) and recall (correct answers / total instances), combined into F1
    Accuracy – correct answers / total instances (equal to precision and recall at full coverage)
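The measures above, written out as a sketch. It assumes the gold standard maps every test instance to exactly one sense, and that an instance the system abstains on is simply missing from its answers.

```python
def wsd_scores(gold, predicted):
    # gold: instance id -> correct sense; predicted: instance id -> answer
    # (instances the system abstains on are simply missing).
    answered = [i for i in gold if i in predicted]
    correct = sum(predicted[i] == gold[i] for i in answered)
    coverage = len(answered) / len(gold)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"coverage": coverage, "precision": precision, "recall": recall, "f1": f1}


gold = {1: "bank/finance", 2: "bank/river", 3: "bank/finance", 4: "bank/river"}
predicted = {1: "bank/finance", 2: "bank/finance", 3: "bank/finance"}  # abstains on 4
print(wsd_scores(gold, predicted))
```

Here 2 of 3 answers are right, so precision is 2/3 while recall is 2/4; with full coverage the two (and accuracy) would coincide.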
  • 29. Disambiguation Baselines
    MFS – Most Frequent Sense
    Strong baseline – 50–60% accuracy on the lexical sample task
    Doesn’t take genre into account (e.g. star in astrophysics vs. newswire)
    Subject to the idiosyncrasies of the corpus
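A sketch of the MFS baseline, assuming a sense-tagged training corpus given as (lemma, sense) pairs; the sense labels are hypothetical. Training on newswire-like data and testing on astrophysics-like data reproduces the genre problem in miniature.

```python
from collections import Counter


def train_mfs(tagged_corpus):
    # Count senses per lemma and keep the most frequent one.
    counts = {}
    for lemma, sense in tagged_corpus:
        counts.setdefault(lemma, Counter())[sense] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


def accuracy(mfs, test_items):
    # Every test occurrence gets its lemma's most frequent training sense.
    correct = sum(mfs.get(lemma) == sense for lemma, sense in test_items)
    return correct / len(test_items)


newswire = [("star", "celebrity")] * 3 + [("star", "celestial_body")]
astrophysics = [("star", "celestial_body")] * 4
mfs = train_mfs(newswire)
print(mfs["star"])                  # celebrity
print(accuracy(mfs, newswire))      # 0.75
print(accuracy(mfs, astrophysics))  # 0.0
```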
  • 30. Evaluation with gold-standard clustering
    Given a gold-standard clustering, compare it with the output clustering
    Can evaluate with entropy and purity
    Also the Rand index (similar to the Jaccard coefficient) and F-score
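Purity, entropy and the Rand index can be computed from two dicts mapping each instance to its gold sense and to its output cluster. This sketch uses size-weighted, unnormalised entropy with natural logarithms; published evaluations sometimes normalise differently.

```python
import math
from collections import Counter
from itertools import combinations


def _clusters(gold, output):
    # Group the gold senses of the instances by output cluster.
    groups = {}
    for item, cluster in output.items():
        groups.setdefault(cluster, []).append(gold[item])
    return groups


def purity(gold, output):
    # Fraction of instances that carry their cluster's majority sense.
    groups = _clusters(gold, output)
    return sum(Counter(s).most_common(1)[0][1] for s in groups.values()) / len(gold)


def entropy(gold, output):
    # Size-weighted entropy of senses inside each cluster (0 = all clusters pure).
    total = 0.0
    for senses in _clusters(gold, output).values():
        size = len(senses)
        h = -sum((c / size) * math.log(c / size) for c in Counter(senses).values())
        total += size / len(gold) * h
    return total


def rand_index(gold, output):
    # Share of instance pairs on which gold and output agree
    # (both "same group" or both "different groups").
    pairs = list(combinations(gold, 2))
    agree = sum((gold[a] == gold[b]) == (output[a] == output[b]) for a, b in pairs)
    return agree / len(pairs)


gold = {1: "river", 2: "river", 3: "money", 4: "money"}
output = {1: "A", 2: "A", 3: "A", 4: "B"}
print(purity(gold, output), rand_index(gold, output))  # 0.75 0.5
print(round(entropy(gold, output), 3))
```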
  • 31. Discrimination Baselines
    All-in-one: group all occurrences of a word into one big cluster
    Random: produce a random set of clusters
  • 32. Pseudowords
    A discrimination evaluation method that generates new words with artificial ambiguity
    Select two or more monosemous terms from gold-standard data
    Replace all their occurrences in a corpus with a pseudoword formed by joining the terms
    Compare the automatic discrimination with the gold standard
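Building a pseudoword test set can be sketched with a regular expression. Joining the terms with an underscore is an arbitrary choice for this example; the term that was replaced at each occurrence is kept as that occurrence's gold "sense".

```python
import re


def make_pseudoword_corpus(sentences, terms):
    # Join the monosemous terms into one artificially ambiguous pseudoword.
    pseudoword = "_".join(terms)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    corpus, gold = [], []
    for sentence in sentences:
        # Record which term each occurrence really was (its gold "sense")...
        gold.extend(match.group(1) for match in pattern.finditer(sentence))
        # ...then hide it behind the pseudoword.
        corpus.append(pattern.sub(pseudoword, sentence))
    return pseudoword, corpus, gold


sentences = ["she peeled a banana", "he opened the door", "a ripe banana"]
pseudoword, corpus, gold = make_pseudoword_corpus(sentences, ["banana", "door"])
print(corpus)  # ['she peeled a banana_door', 'he opened the banana_door', 'a ripe banana_door']
print(gold)    # ['banana', 'door', 'banana']
```

A discrimination system is then run on the rewritten corpus, and its clusters are compared with the recorded terms.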
  • 33. SemEval-2007
    Lexical sample and all-words coarse-grained WSD
    Preposition disambiguation
    Evaluation of WSD on cross-language IR
    WSI, lexical substitution
    Top systems reach 88.7% accuracy on the lexical sample task and 82.5% on the all-words task
  • 34. SemEval-2010
    Fifth event of its kind
    Includes specific cross-lingual tasks
    Combined WSI/WSD task
    Domain-specific all-words task
  • 35. Issues
    Representation of word senses: enumerative vs. generative approach
    Knowledge acquisition bottleneck: not enough data!
    Benefits for AI/NLP applications
  • 36. Alleviating the Knowledge Acquisition Bottleneck
    Weakly supervised algorithms, incorporating bootstrapping or active learning
    Continuing manual efforts – WordNet, Open Mind Word Expert, OntoNotes
    Automatic enrichment of knowledge resources – collocation and relation triple extraction, BabelNet
  • 37. Future Challenges
    How can we mine even larger repositories of textual data (e.g. the whole web!) to create huge knowledge repositories?
    How can we design high-performance, scalable algorithms to use this data?
    We need to decide which kinds of word senses are needed for which applications
    We still need to develop a general representation of word senses
  • 39. Wikipedia as sense inventory
    Wikipedia articles provide an inventory of disambiguated word senses and entity references
    Task: use their occurrences in texts, i.e. the internal Wikipedia hyperlinks, as named entity and sense annotations
    The articles’ texts provide a sense-annotated corpus
  • 40. Mihalcea (2007)
    Mihalcea proposes a method for automatically generating sense-tagged data using Wikipedia
    Rhythm is the arrangement of sounds in time. Meter animates time in regular pulse groupings, called measures or [[bar (music)|bar]].
    The nightlife is particularly active around the beachfront promenades because of its many nightclubs and [[bar (establishment)|bars]].
    1. Extract all paragraphs in Wikipedia containing the word w
    2. Collect all possible labels l1, ..., ln for w
    3. Map each label li to its WordNet sense s
    4. Annotate each occurrence of [[li|w]] with its sense s
    A system trained on Wikipedia significantly outperforms the MFS and Lesk baselines
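Steps 1–4 can be sketched on the two example paragraphs. The label-to-sense dictionary and its sense names are hypothetical stand-ins for a real WordNet mapping, and plural anchors are matched naively by appending "s".

```python
import re

# Matches piped wiki links of the form [[label|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)\|([^\]]+)\]\]")

# Hypothetical hand-made mapping from Wikipedia labels to senses
# (illustrative names, not real WordNet identifiers).
LABEL_TO_SENSE = {
    "bar (music)": "bar#music",
    "bar (establishment)": "bar#establishment",
}


def sense_tag(paragraphs, word, label_to_sense):
    """Return (paragraph, anchor, sense) for every link whose anchor text is
    `word` (or its naive plural) and whose label has a known sense."""
    tagged = []
    for text in paragraphs:
        # Steps 1-2: find links in the paragraph and collect their labels.
        for label, anchor in LINK.findall(text):
            if anchor.lower() not in (word, word + "s"):
                continue
            # Step 3: map the label to a sense (unknown labels are skipped).
            sense = label_to_sense.get(label)
            if sense is not None:
                # Step 4: keep the paragraph as a sense-annotated example.
                tagged.append((text, anchor, sense))
    return tagged


paragraphs = [
    "Rhythm is the arrangement of sounds in time. Meter animates time in "
    "regular pulse groupings, called measures or [[bar (music)|bar]].",
    "The nightlife is particularly active around the beachfront promenades "
    "because of its many nightclubs and [[bar (establishment)|bars]].",
]
for _, anchor, sense in sense_tag(paragraphs, "bar", LABEL_TO_SENSE):
    print(anchor, sense)
```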
  • 41. Knowledge-rich WSD
    General aim is to relieve the knowledge acquisition bottleneck of NLP systems, with WSD as a case study
    Main ideas:
    - Extend WordNet with millions of semantic relations (using Wikipedia)
    - Apply knowledge-based WSD to exploit the extended WordNet
    Results: integrating a very large number of semantic relations into knowledge-based systems yields performance competitive with state-of-the-art supervised approaches
  • 42. Wikification
    The task of generating hyperlinks to disambiguated Wikipedia concepts
    Two sub-tasks: automatic keyword extraction, WSD
    Wikify!¹ performs keyword extraction by extracting candidates and then ranking them
    The system combines knowledge-based and data-driven WSD, filtering out annotations on which the two disagree
    Disambiguate links using relatedness, commonness (prior probability of a sense) and context quality (context terms)
    ¹ Csomai and Mihalcea (2008)
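Commonness, the prior probability of a sense, can be estimated from link counts: among all links with a given anchor text, the share that point to each article. The link counts below are invented, and the disambiguation helper uses commonness alone, ignoring relatedness and context quality.

```python
from collections import Counter


def commonness(links):
    """links: (anchor_text, target_article) pairs harvested from Wikipedia.
    Returns P(target | anchor) for every observed pair."""
    anchor_totals = Counter(anchor for anchor, _ in links)
    pair_counts = Counter(links)
    return {(anchor, target): count / anchor_totals[anchor]
            for (anchor, target), count in pair_counts.items()}


def most_common_target(prior, anchor):
    # Disambiguate by commonness alone (no relatedness or context).
    candidates = {t: p for (a, t), p in prior.items() if a == anchor}
    return max(candidates, key=candidates.get)


links = [("bar", "Bar (establishment)")] * 3 + [("bar", "Bar (music)")]
prior = commonness(links)
print(prior)                             # establishment 0.75, music 0.25
print(most_common_target(prior, "bar"))  # Bar (establishment)
```

Commonness on its own behaves like an MFS baseline over link targets, which is why it is combined with relatedness and context features.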
  • 44. Questions
    Thank you. Are there any questions?