Error analysis of Word Sense Disambiguation

CLIN-2015 presentation.

Word Sense Disambiguation is still an unsolved problem in Natural Language Processing. We claim that most approaches do not model the context correctly: they rely too much on the local context (the words surrounding the word in question) or on the most frequent sense of a word. To provide evidence for this claim, we conducted an in-depth analysis of the all-words tasks of the competitions that have been organized (Senseval 2&3, Semeval-2007, Semeval-2010, Semeval 2013). We focused on the average error rate per competition, and across competitions per part of speech, lemma, relative frequency class, and polysemy class. In addition, we inspected the “difficulty” of a token (word) by calculating the average polysemy of the words in the token’s sentence. Finally, we inspected to what extent systems always chose the most frequent sense.

The results from Senseval 2, which are representative of the other competitions, showed that the average error rate for monosemous words was 33.3%, due to part-of-speech errors. For multiwords and phrasal verbs this number was 71%. In addition, we observe that higher polysemy yields a higher error rate. Moreover, we do not observe a drop in the error rate when there are multiple occurrences of the same lemma, which might indicate that systems rely mostly on the sentence itself. Finally, for the 799 tokens whose correct sense was not the most frequent sense, systems still assigned the most frequent sense in 84% of the cases.

For future work, we plan to develop a strategy to determine in which contexts the predominant sense should be assigned and, more importantly, when it should not be assigned. One of the most important parts of this strategy would be to determine not only the meaning of a specific word, but also its referential meaning. For example, in the case of the lemma ‘winner’, we do not only want to know what ‘winner’ means, but also what this ‘winner’ won and who this ‘winner’ was.
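The per-category error rates mentioned in the abstract can be sketched as below. This is a minimal illustration, not the authors' code: the record layout (a gold sense plus one answer per participating system), the toy sense labels, and the function name `error_rate_by` are assumptions made for this example.

```python
from collections import defaultdict

def error_rate_by(records, key):
    """Average error rate per group (e.g. per PoS tag or per lemma).

    records: iterable of dicts holding a 'gold' sense, a list of 'answers'
             (one per participating system) and the grouping fields.
    key:     name of the field to group by, e.g. 'pos' or 'lemma'.
    """
    wrong = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        for answer in rec["answers"]:
            total[rec[key]] += 1
            if answer != rec["gold"]:
                wrong[rec[key]] += 1
    return {group: wrong[group] / total[group] for group in total}

# Toy data (hypothetical, for illustration only).
records = [
    {"lemma": "church", "pos": "n", "gold": "church%2", "answers": ["church%1", "church%2"]},
    {"lemma": "say", "pos": "v", "gold": "say%3", "answers": ["say%1", "say%1"]},
    {"lemma": "cell", "pos": "n", "gold": "cell%1", "answers": ["cell%1", "cell%1"]},
]
rates = error_rate_by(records, "pos")
```

Grouping by a different field (`"lemma"`, or a precomputed polysemy or frequency class) only requires changing the `key` argument.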

Error analysis of Word Sense Disambiguation
Ruben Izquierdo
Marten Postma
Piek Vossen
Izquierdo, Postma and Vossen
VU Amsterdam

Motivation
• Word Sense Disambiguation is still an unsolved problem

Error Analysis
• Perform error analysis on previous WSD evaluations to prove our hypothesis
  • Senseval-2: all-words task
  • Senseval-3: all-words task
  • Semeval2007: all-words task (#17)
  • Semeval2010: all-words on specific domain (#17)
  • Semeval2013: multilingual all-words WSD and entity linking (#12)

Motivation
• Some “propagated” errors
  • Errors on monosemous words
  • Errors because of PoS tags
  • Multiwords and phrasal verbs
• Little attention has been paid to the real problem
  • WSD is not 1 problem but N problems
• Our hypothesis
  • Context is not modeled properly in general
  • Systems rely too much on the most frequent sense

Monosemous errors
Competition            Monosemous     Wrong    Examples
Senseval2              499 (20.9%)    37.5%    gene.n (suppressor_gene.n), chance.a (chance.n), next.r (next.a)
Senseval3              334 (16.6%)    44.1%    Datum.n (data.n), making.n (make.v), out_of_sight (sight)
Semeval2007            25 (5.5%)      11.1%    get_stuck.v, lack.v, write_about.v
Semeval2010            31 (2.2%)      97.9%    Tidal_zone.n, pine_marten.n, roe_deer.n, cordgrass.n
Semeval2013 (lemmas)   348 (21.1%)    1.9%     Private_enterprise, developing_country, narrow_margin

Most Frequent Sense
• When the correct sense is NOT the most frequent sense (MFS)
  • Systems still mostly assign the MFS
  • Senseval2
    • For 799 tokens the correct sense is not the MFS
    • In 84% of these cases, systems still assign the MFS
• Most “failed” words are due to MFS bias
  • Senseval2, Senseval3
    • say.v, find.v, take.v, have.v, cell.n, church.n
  • Semeval2010
    • area.n, nature.n, connection.n, water.n, population.n
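One possible way to measure the MFS bias described above is sketched here. It is an illustration under stated assumptions, not the authors' implementation: the token layout (`gold`, `mfs`, `answers`), the toy sense labels, and the function name `mfs_bias` are hypothetical, and the rate is computed over (token, system) pairs, which is only one reading of the 84% figure.

```python
def mfs_bias(tokens):
    """Share of system answers equal to the MFS, restricted to tokens
    whose gold sense is NOT the most frequent sense.

    tokens: iterable of dicts with 'gold', 'mfs' and a list of 'answers'
            (one answer per participating system).
    """
    pairs = [
        (answer, tok["mfs"])
        for tok in tokens
        if tok["gold"] != tok["mfs"]      # keep only non-MFS tokens
        for answer in tok["answers"]
    ]
    if not pairs:
        return 0.0
    return sum(answer == mfs for answer, mfs in pairs) / len(pairs)

# Toy data (hypothetical sense labels, for illustration only).
tokens = [
    {"gold": "s2", "mfs": "s1", "answers": ["s1", "s1", "s2"]},
    {"gold": "s1", "mfs": "s1", "answers": ["s1", "s2"]},  # gold is the MFS: ignored
    {"gold": "s3", "mfs": "s1", "answers": ["s1"]},
]
bias = mfs_bias(tokens)
```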

Analysis per PoS-tag

Analysis per polysemy class
[Figure: error rate per polysemy class. Axis labels: 2, 6 and 15 senses; polysemy classes Low, Medium, High]

Analysis per frequency class

Most difficult words

Expected vs. Observed difficulties
• Calculate per sentence
  • The “expected” difficulty
    • Average polysemy, sentence length, average word length
  • The “observed” difficulty
    • From the real participant outputs, average error rate
• We should expect:
  • harder sentences → higher error rate
  • easier sentences → lower error rate
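The two per-sentence measures can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: the slides do not say how the three “expected” features are combined, so they are simply returned side by side; the function names, the input layout, the toy sense counts, and the choice to count lemmas missing from the lexicon as monosemous are all hypothetical.

```python
def expected_difficulty(sentence, polysemy):
    """'Expected' difficulty features of one sentence.

    sentence: list of lemmas.
    polysemy: dict mapping a lemma to its number of senses.
              Lemmas missing from the dict count as 1 sense (assumption).
    """
    senses = [polysemy.get(lemma, 1) for lemma in sentence]
    lengths = [len(lemma) for lemma in sentence]
    return {
        "avg_polysemy": sum(senses) / len(senses),
        "sentence_length": len(sentence),
        "avg_word_length": sum(lengths) / len(lengths),
    }

def observed_difficulty(gold, system_outputs):
    """'Observed' difficulty: average error rate over all systems.

    gold:           list of gold senses for the tokens of one sentence.
    system_outputs: list of answer lists, one per participating system,
                    aligned with gold.
    """
    errors = [
        answer != g
        for answers in system_outputs
        for answer, g in zip(answers, gold)
    ]
    return sum(errors) / len(errors)

# Toy data (hypothetical sense counts and labels, for illustration only).
features = expected_difficulty(["winner", "take", "cup"], {"winner": 2, "take": 10})
observed = observed_difficulty(
    ["a", "b", "c"],
    [["a", "b", "x"], ["a", "y", "x"]],
)
```

Comparing `features` with `observed` across sentences is then a matter of checking whether sentences with higher expected difficulty also show a higher average error rate.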

• The context is probably not exploited properly
• Expected “easy” sentences SHOULD show low error rates
• Occurrences of the same word in different contexts have similar error rates
• The difficulty of a word depends more on its polysemy than on the context where it appears

WSD Corpora
http://github.com/rubenIzquierdo/wsd_corpora

System Outputs
https://github.com/rubenIzquierdo/sval_systems

Error analysis of Word Sense Disambiguation
Ruben Izquierdo
Marten Postma
Piek Vossen
ruben.izquierdobevia@vu.nl
http://github.com/rubenIzquierdo/wsd_corpora
http://github.com/rubenIzquierdo/sval_systems


Editor's Notes

Relative frequency (Norvig method): < 0.01 → low; 0.01 to 0.05 → medium; > 0.05 → high
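The thresholds in this note map directly onto a small binning function. The sketch below is illustrative: the function name `frequency_class` is hypothetical, the boundary values 0.01 and 0.05 are assigned to the medium class as the note suggests, and how the relative frequency itself is obtained is left outside the example.

```python
def frequency_class(relative_frequency):
    """Map a relative frequency to the low / medium / high class.

    Thresholds follow the note above: below 0.01 is low,
    0.01 up to and including 0.05 is medium, above 0.05 is high.
    """
    if relative_frequency < 0.01:
        return "low"
    if relative_frequency <= 0.05:
        return "medium"
    return "high"

classes = [frequency_class(f) for f in (0.005, 0.01, 0.05, 0.2)]
```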
