SlideShare ist ein Scribd-Unternehmen logo
1 von 54
Word Sense Disambiguation CMSC 723: Computational Linguistics I ― Session #11 Jimmy Lin The iSchool University of Maryland Wednesday, November 11, 2009 Material drawn from slides by Saif Mohammad and Bonnie Dorr
Progression of the Course Words Finite-state morphology Part-of-speech tagging (TBL + HMM) Structure CFGs + parsing (CKY, Earley) N-gram language models Meaning!
Today’s Agenda Word sense disambiguation Beyond lexical semantics Semantic attachments to syntax Shallow semantics: PropBank
Word Sense Disambiguation
Recap: Word Sense From WordNet: Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)  {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)  {pipe, tube} (a hollow cylindrical shape)  {pipe} (a tubular wind instrument)  {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)  Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry)  {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt”
Word Sense Disambiguation Task: automatically select the correct sense of a word Lexical sample All-words Theoretically useful for many applications: Semantic similarity (remember from last time?) Information retrieval Machine translation … Solution in search of a problem? Why?
How big is the problem? Most words in English have only one sense 62% in Longman’s Dictionary of Contemporary English 79% in WordNet But the others tend to have several senses Average of 3.83 in LDOCE Average of 2.96 in WordNet Ambiguous words are more frequently used In the British National Corpus, 84% of instances have more than one sense Some senses are more frequent than others
Ground Truth Which sense inventory do we use? Issues there? Application specificity?
Corpora Lexical sample line-hard-serve corpus (4k sense-tagged examples) interest corpus (2,369 sense-tagged examples) …  All-words SemCor (234k words, subset of Brown Corpus) Senseval-3 (2081 tagged content words from 5k total words) … Observations about the size?
Evaluation Intrinsic Measure accuracy of sense selection wrt ground truth Extrinsic Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval Compare WSD
Baseline + Upper Bound Baseline: most frequent sense Equivalent to “take first sense” in WordNet Does surprisingly well! Upper bound: Fine-grained WordNet sense: 75-80% human agreement Coarser-grained inventories: 90% human agreement possible What does this mean? 62% accuracy in this case!
WSD Approaches Depending on use of manually created knowledge sources Knowledge-lean Knowledge-rich Depending on use of labeled data Supervised Semi- or minimally supervised Unsupervised
Lesk’s Algorithm Intuition: note word overlap between context and dictionary entries Unsupervised, but knowledge rich The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.   WordNet
Lesk’s Algorithm Simplest implementation: Count overlapping content words between glosses and context Lots of variants: Include the examples in dictionary definitions Include hypernyms and hyponyms Give more weight to larger overlaps (e.g., bigrams) Give extra weight to infrequent words (e.g., idf weighting) … Works reasonably well!
Supervised WSD: NLP meets ML WSD as a supervised classification task Train a separate classifier for each word Three components of a machine learning problem: Training data (corpora) Representations (features) Learning method (algorithm, model)
Supervised Classification Training Testing training data unlabeled  document ? label1 label2 label3 label4 Representation Function label1? label2? Classifier supervised machine  learning algorithm label3? label4?
Three Laws of Machine Learning Thou shalt not mingle training data with test data Thou shalt not mingle training data with test data Thou shalt not mingle training data with test data But what do you do if you need more test data?
Features Possible features POS and surface form of the word itself Surrounding words and POS tag Positional information of surrounding words and POS tags Same as above, but with n-grams Grammatical information … Richness of the features? Richer features = ML algorithm does less of the work More impoverished features = ML algorithm does more of the work
Classifiers Once we cast the WSD problem as supervised classification, many learning techniques are possible: Naïve Bayes (the thing to try first) Decision lists Decision trees MaxEnt Support vector machines Nearest neighbor methods …
Classifiers Tradeoffs Which classifier should I use? It depends: Number of features Types of features Number of possible values for a feature Noise … General advice: Start with Naïve Bayes Use decision trees/lists if you want to understand what the classifier is doing SVMs often give state of the art performance MaxEnt methods also work well
Naïve Bayes Pick the sense that is most probable given the context Context represented by feature vector By Bayes’ Theorem: Problem: data sparsity! We can ignore this term… why?
The “Naïve” Part Feature vectors are too sparse to estimate directly: So… assume features are conditionally independent given the word sense This is naïve because? Putting everything together:
Naïve Bayes: Training How do we estimate the probability distributions? Maximum-Likelihood Estimates (MLE): What else do we need to do? Well, how well does it work? (later…)
Decision List Ordered list of tests (equivalent to “case” statement): Example decision list, discriminating between bass (fish) and bass (music):
Building Decision Lists Simple algorithm: Compute how discriminative each feature is: Create ordered list of tests from these values Limitation? How do you build n-way classifiers from binary classifiers? One vs. rest (sequential vs. parallel) Another learning problem Well, how well does it work? (later…)
Decision Trees Instead of a list, imagine a tree… fish in k words no yes striped bass FISH no yes guitar in k words FISH no yes MUSIC …
Using Decision Trees Given an instance (= list of feature values) Start at the root At each interior node, check feature value Follow corresponding branch based on the test When a leaf node is reached, return its category Decision tree material drawn from slides by Ed Loper
Building Decision Trees Basic idea: build tree top down, recursively partitioning the training data at each step At each node, try to split the training data on a feature (could be binary or otherwise) What features should we split on? Small decision tree desired Pick the feature that gives the most information about the category Example: 20 questions I’m thinking of a number from 1 to 1,000 You can ask any yes no question What question would you ask?
Evaluating Splits via Entropy Entropy of a set of events E: Where P(c) is the probability that an event in E has category c How much information does a feature give us about the category (sense)? H(E) = entropy of event set E H(E|f) = expected entropy of event set E once we know the value of feature f Information Gain: G(E, f) = H(E) – H(E|f) = amount of new information provided by feature f Split on feature that maximizes information gain Well, how well does it work? (later…)
WSD Accuracy Generally: Naïve Bayes provides a reasonable baseline: ~70% Decision lists and decision trees slightly lower State of the art is slightly higher However: Accuracy depends on actual word, sense inventory, amount of training data, number of features, etc. Remember caveat about baseline and upper bound
Minimally Supervised WSD But annotations are expensive! “Bootstrapping” or co-training (Yarowsky 1995) Start with (small) seed, learn decision list Use decision list to label rest of corpus Retain “confident” labels, treat as annotated data to learn new decision list Repeat… Heuristics (derived from observation): One sense per discourse One sense per collocation
One Sense per Discourse A word tends to preserve its meaning across all its occurrences in a  given discourse Evaluation:  8 words with two-way ambiguity, e.g. plant, crane, etc. 98% of the two-word occurrences in the same discourse carry the same meaning The grain of salt: accuracy depends on granularity Performance of “one sense per discourse” measured on SemCor is approximately 70% Slide by Mihalcea and Pedersen
One Sense per Collocation A word tends to preserve its meaning when used in the same collocation Strong for adjacent collocations Weaker as the distance between words increases Evaluation: 97% precision on words with two-way ambiguity Again, accuracy depends on granularity: 70% precision on SemCor words Slide by Mihalcea and Pedersen
Yarowsky’s Method: Example Disambiguating plant (industrial sense) vs. plant (living thing sense) Think of seed features for each sense Industrial sense: co-occurring with “manufacturing” Living thing sense: co-occurring with “life” Use “one sense per collocation” to build initial decision list classifier Treat results as annotated data, train new decision list classifier, iterate…
Yarowsky’s Method: Stopping Stop when:  Error on training data is less than a threshold No more training data is covered Use final decision list for WSD
Yarowsky’s Method: Discussion Advantages:  Accuracy is about as good as a supervised algorithm Bootstrapping: far less manual effort Disadvantages:  Seeds may be tricky to construct Works only for coarse-grained sense distinctions Snowballing error with co-training Recent extension: now apply this to the web!
WSD with Parallel Text But annotations are expensive! What’s the “proper” sense inventory? How fine or coarse grained? Application specific? Observation: multiple senses translate to different words in other languages! A “bill” in English may be a “pico” (bird jaw) in or a “cuenta” (invoice) in Spanish Use the foreign language as the sense inventory! Added bonus: annotations for free! (Byproduct of word-alignment process in machine translation)
Beyond Lexical Semantics
Syntax-Semantics Pipeline Example: FOPL
Semantic Attachments Basic idea: Associate -expressions with lexical items At branching node, apply semantics of one child to another (based on synctatic rule) Refresher in -calculus…
Augmenting Syntactic Rules
Semantic Analysis: Example
Complexities Oh, there are many… Classic problem: quantifier scoping Every restaurant has a menu Issues with this style of semantic analysis?
Semantics in NLP Today Can be characterized as “shallow semantics” Verbs denote events Represent as “frames” Nouns (in general) participate in events Types of event participants = “slots” or “roles” Event participants themselves = “slot fillers” Depending on the linguistic theory, roles may have special names: agent, theme, etc. Semantic analysis: semantic role labeling Automatically identify the event type (i.e., frame) Automatically identify event participants and the role that each plays (i.e., label the semantic role)
What works in NLP? POS-annotated corpora Tree-annotated corpora: Penn Treebank Role-annotated corpora: Proposition Bank (PropBank)
PropBank: Two Examples agree.01 Arg0: Agreer Arg1: Proposition Arg2: Other entity agreeing Example: [Arg0 John] agrees [Arg2 with Mary] [Arg1 on everything] fall.01 Arg1: Logical subject, patient, thing falling Arg2: Extent, amount fallen Arg3: Start point Arg4: End point Example: [Arg1 Sales] fell [Arg4 to $251.2 million] [Arg3 from $278.7 million]
How do we do it? Short answer: supervised machine learning One approach: classification of each tree constituent Features can be words, phrase type, linear position, tree position, etc. Apply standard machine learning algorithms
Recap of Today’s Topics Word sense disambiguation Beyond lexical semantics Semantic attachments to syntax Shallow semantics: PropBank
The Complete Picture Phonology Morphology Syntax Semantics Reasoning Speech Recognition Morphological Analysis Parsing Semantic Analysis Reasoning, Planning Speech Synthesis Morphological Realization Syntactic Realization Utterance Planning
The Home Stretch Next week: MapReduce and large-data processing No classes Thanksgiving week! December: two guest lectures by Ken Church

Weitere ähnliche Inhalte

Was ist angesagt?

Towards a theory of semantic communication
Towards a theory of semantic communicationTowards a theory of semantic communication
Towards a theory of semantic communication
Jie Bao
 
Semantic information theory in 20 minutes
Semantic information theory in 20 minutesSemantic information theory in 20 minutes
Semantic information theory in 20 minutes
Jie Bao
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
Viral Gupta
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
Bhaskar Mitra
 
Document Classification Using KNN with Fuzzy Bags of Word Representation
Document Classification Using KNN with Fuzzy Bags of Word RepresentationDocument Classification Using KNN with Fuzzy Bags of Word Representation
Document Classification Using KNN with Fuzzy Bags of Word Representation
suthi
 

Was ist angesagt? (20)

Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
The Duet model
The Duet modelThe Duet model
The Duet model
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
Word Embedding In IR
Word Embedding In IRWord Embedding In IR
Word Embedding In IR
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
Towards a theory of semantic communication
Towards a theory of semantic communicationTowards a theory of semantic communication
Towards a theory of semantic communication
 
Semantic information theory in 20 minutes
Semantic information theory in 20 minutesSemantic information theory in 20 minutes
Semantic information theory in 20 minutes
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
Abigail See - 2017 - Get To The Point: Summarization with Pointer-Generator N...
 
Document Classification Using KNN with Fuzzy Bags of Word Representation
Document Classification Using KNN with Fuzzy Bags of Word RepresentationDocument Classification Using KNN with Fuzzy Bags of Word Representation
Document Classification Using KNN with Fuzzy Bags of Word Representation
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
 
Steganography
SteganographySteganography
Steganography
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
Cns 1
Cns 1Cns 1
Cns 1
 
Abstractive Text Summarization
Abstractive Text SummarizationAbstractive Text Summarization
Abstractive Text Summarization
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 

Andere mochten auch

An Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense DisambiguationAn Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense Disambiguation
Surabhi Verma
 
BibleTech2011
BibleTech2011BibleTech2011
BibleTech2011
Andi Wu
 
A word sense disambiguation technique for sinhala
A word sense disambiguation technique  for sinhalaA word sense disambiguation technique  for sinhala
A word sense disambiguation technique for sinhala
Vijayindu Gamage
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
Seid Hassen
 
PhD defense Koen Deschacht
PhD defense Koen DeschachtPhD defense Koen Deschacht
PhD defense Koen Deschacht
guest1add48f
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
vini89
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense Disambiguation
Rubén Izquierdo Beviá
 
Historical linguistics
Historical linguisticsHistorical linguistics
Historical linguistics
humelt
 

Andere mochten auch (20)

An Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense DisambiguationAn Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense Disambiguation
 
Word sense dissambiguation
Word sense dissambiguationWord sense dissambiguation
Word sense dissambiguation
 
Draft programme 15 09-2015
Draft programme 15 09-2015Draft programme 15 09-2015
Draft programme 15 09-2015
 
BibleTech2011
BibleTech2011BibleTech2011
BibleTech2011
 
A word sense disambiguation technique for sinhala
A word sense disambiguation technique  for sinhalaA word sense disambiguation technique  for sinhala
A word sense disambiguation technique for sinhala
 
Thesis
ThesisThesis
Thesis
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Technique
 
Introduction to Kernel Functions
Introduction to Kernel FunctionsIntroduction to Kernel Functions
Introduction to Kernel Functions
 
Viva
VivaViva
Viva
 
Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005
 
COMPUTATIONAL LINGUISTICS
COMPUTATIONAL LINGUISTICSCOMPUTATIONAL LINGUISTICS
COMPUTATIONAL LINGUISTICS
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
 
PhD defense Koen Deschacht
PhD defense Koen DeschachtPhD defense Koen Deschacht
PhD defense Koen Deschacht
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machines
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense Disambiguation
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Historical linguistics
Historical linguisticsHistorical linguistics
Historical linguistics
 

Ähnlich wie CMSC 723: Computational Linguistics I

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
Introduction to Machine Learning.
Introduction to Machine Learning.Introduction to Machine Learning.
Introduction to Machine Learning.
butest
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
fridolin.wild
 

Ähnlich wie CMSC 723: Computational Linguistics I (20)

Class14
Class14Class14
Class14
 
Eurolan 2005 Pedersen
Eurolan 2005 PedersenEurolan 2005 Pedersen
Eurolan 2005 Pedersen
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and Concepts
 
Icon 2007 Pedersen
Icon 2007 PedersenIcon 2007 Pedersen
Icon 2007 Pedersen
 
Ijcai 2007 Pedersen
Ijcai 2007 PedersenIjcai 2007 Pedersen
Ijcai 2007 Pedersen
 
Eacl 2006 Pedersen
Eacl 2006 PedersenEacl 2006 Pedersen
Eacl 2006 Pedersen
 
Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic search
 
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffnL6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and Concepts
 
Introduction to Machine Learning.
Introduction to Machine Learning.Introduction to Machine Learning.
Introduction to Machine Learning.
 
Fusing semantic data
Fusing semantic dataFusing semantic data
Fusing semantic data
 
Tg noh jeju_workshop
Tg noh jeju_workshopTg noh jeju_workshop
Tg noh jeju_workshop
 
The Role Of Ontology In Modern Expert Systems Dallas 2008
The Role Of Ontology In Modern Expert Systems   Dallas   2008The Role Of Ontology In Modern Expert Systems   Dallas   2008
The Role Of Ontology In Modern Expert Systems Dallas 2008
 
The Semantic Quilt
The Semantic QuiltThe Semantic Quilt
The Semantic Quilt
 
Aaai 2006 Pedersen
Aaai 2006 PedersenAaai 2006 Pedersen
Aaai 2006 Pedersen
 
Semeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic SimilaritySemeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic Similarity
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
 
sr.ppt
sr.pptsr.ppt
sr.ppt
 
Voice recognitionr.ppt
Voice recognitionr.pptVoice recognitionr.ppt
Voice recognitionr.ppt
 

Mehr von butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Mehr von butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

CMSC 723: Computational Linguistics I

  • 1. Word Sense Disambiguation CMSC 723: Computational Linguistics I ― Session #11 Jimmy Lin The iSchool University of Maryland Wednesday, November 11, 2009 Material drawn from slides by Saif Mohammad and Bonnie Dorr
  • 2. Progression of the Course Words Finite-state morphology Part-of-speech tagging (TBL + HMM) Structure CFGs + parsing (CKY, Earley) N-gram language models Meaning!
  • 3. Today’s Agenda Word sense disambiguation Beyond lexical semantics Semantic attachments to syntax Shallow semantics: PropBank
  • 5. Recap: Word Sense From WordNet: Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt”
  • 6. Word Sense Disambiguation Task: automatically select the correct sense of a word Lexical sample All-words Theoretically useful for many applications: Semantic similarity (remember from last time?) Information retrieval Machine translation … Solution in search of a problem? Why?
  • 7. How big is the problem? Most words in English have only one sense 62% in Longman’s Dictionary of Contemporary English 79% in WordNet But the others tend to have several senses Average of 3.83 in LDOCE Average of 2.96 in WordNet Ambiguous words are more frequently used In the British National Corpus, 84% of instances have more than one sense Some senses are more frequent than others
  • 8. Ground Truth Which sense inventory do we use? Issues there? Application specificity?
  • 9. Corpora Lexical sample line-hard-serve corpus (4k sense-tagged examples) interest corpus (2,369 sense-tagged examples) … All-words SemCor (234k words, subset of Brown Corpus) Senseval-3 (2081 tagged content words from 5k total words) … Observations about the size?
  • 10. Evaluation Intrinsic Measure accuracy of sense selection wrt ground truth Extrinsic Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval Compare WSD
  • 11. Baseline + Upper Bound Baseline: most frequent sense Equivalent to “take first sense” in WordNet Does surprisingly well! Upper bound: Fine-grained WordNet sense: 75-80% human agreement Coarser-grained inventories: 90% human agreement possible What does this mean? 62% accuracy in this case!
  • 12. WSD Approaches Depending on use of manually created knowledge sources Knowledge-lean Knowledge-rich Depending on use of labeled data Supervised Semi- or minimally supervised Unsupervised
  • 13. Lesk’s Algorithm Intuition: note word overlap between context and dictionary entries Unsupervised, but knowledge rich The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities. WordNet
  • 14. Lesk’s Algorithm Simplest implementation: Count overlapping content words between glosses and context Lots of variants: Include the examples in dictionary definitions Include hypernyms and hyponyms Give more weight to larger overlaps (e.g., bigrams) Give extra weight to infrequent words (e.g., idf weighting) … Works reasonably well!
  • 15. Supervised WSD: NLP meets ML WSD as a supervised classification task Train a separate classifier for each word Three components of a machine learning problem: Training data (corpora) Representations (features) Learning method (algorithm, model)
  • 16. Supervised Classification Training Testing training data unlabeled document ? label1 label2 label3 label4 Representation Function label1? label2? Classifier supervised machine learning algorithm label3? label4?
  • 17. Three Laws of Machine Learning Thou shalt not mingle training data with test data Thou shalt not mingle training data with test data Thou shalt not mingle training data with test data But what do you do if you need more test data?
  • 18. Features Possible features POS and surface form of the word itself Surrounding words and POS tag Positional information of surrounding words and POS tags Same as above, but with n-grams Grammatical information … Richness of the features? Richer features = ML algorithm does less of the work More impoverished features = ML algorithm does more of the work
  • 19. Classifiers Once we cast the WSD problem as supervised classification, many learning techniques are possible: Naïve Bayes (the thing to try first) Decision lists Decision trees MaxEnt Support vector machines Nearest neighbor methods …
  • 20. Classifiers Tradeoffs Which classifier should I use? It depends: Number of features Types of features Number of possible values for a feature Noise … General advice: Start with Naïve Bayes Use decision trees/lists if you want to understand what the classifier is doing SVMs often give state of the art performance MaxEnt methods also work well
  • 21. Naïve Bayes Pick the sense that is most probable given the context Context represented by feature vector By Bayes’ Theorem: Problem: data sparsity! We can ignore this term… why?
  • 22. The “Naïve” Part Feature vectors are too sparse to estimate directly: So… assume features are conditionally independent given the word sense This is naïve because? Putting everything together:
  • 23. Naïve Bayes: Training How do we estimate the probability distributions? Maximum-Likelihood Estimates (MLE): What else do we need to do? Well, how well does it work? (later…)
  • 24. Decision List Ordered list of tests (equivalent to “case” statement): Example decision list, discriminating between bass (fish) and bass (music):
  • 25. Building Decision Lists Simple algorithm: Compute how discriminative each feature is: Create ordered list of tests from these values Limitation? How do you build n-way classifiers from binary classifiers? One vs. rest (sequential vs. parallel) Another learning problem Well, how well does it work? (later…)
  • 26. Decision Trees Instead of a list, imagine a tree… fish in k words no yes striped bass FISH no yes guitar in k words FISH no yes MUSIC …
  • 27. Using Decision Trees Given an instance (= list of feature values) Start at the root At each interior node, check feature value Follow corresponding branch based on the test When a leaf node is reached, return its category Decision tree material drawn from slides by Ed Loper
  • 28. Building Decision Trees Basic idea: build tree top down, recursively partitioning the training data at each step At each node, try to split the training data on a feature (could be binary or otherwise) What features should we split on? Small decision tree desired Pick the feature that gives the most information about the category Example: 20 questions I’m thinking of a number from 1 to 1,000 You can ask any yes no question What question would you ask?
  • 29. Evaluating Splits via Entropy Entropy of a set of events E: Where P(c) is the probability that an event in E has category c How much information does a feature give us about the category (sense)? H(E) = entropy of event set E H(E|f) = expected entropy of event set E once we know the value of feature f Information Gain: G(E, f) = H(E) – H(E|f) = amount of new information provided by feature f Split on feature that maximizes information gain Well, how well does it work? (later…)
  • 30. WSD Accuracy Generally: Naïve Bayes provides a reasonable baseline: ~70% Decision lists and decision trees slightly lower State of the art is slightly higher However: Accuracy depends on actual word, sense inventory, amount of training data, number of features, etc. Remember caveat about baseline and upper bound
  • 31. Minimally Supervised WSD But annotations are expensive! “Bootstrapping” or co-training (Yarowsky 1995) Start with (small) seed, learn decision list Use decision list to label rest of corpus Retain “confident” labels, treat as annotated data to learn new decision list Repeat… Heuristics (derived from observation): One sense per discourse One sense per collocation
  • 32. One Sense per Discourse A word tends to preserve its meaning across all its occurrences in a given discourse Evaluation: 8 words with two-way ambiguity, e.g. plant, crane, etc. 98% of the two-word occurrences in the same discourse carry the same meaning The grain of salt: accuracy depends on granularity Performance of “one sense per discourse” measured on SemCor is approximately 70% Slide by Mihalcea and Pedersen
  • 33. One Sense per Collocation A word tends to preserve its meaning when used in the same collocation Strong for adjacent collocations Weaker as the distance between words increases Evaluation: 97% precision on words with two-way ambiguity Again, accuracy depends on granularity: 70% precision on SemCor words Slide by Mihalcea and Pedersen
  • 34. Yarowsky’s Method: Example Disambiguating plant (industrial sense) vs. plant (living thing sense) Think of seed features for each sense Industrial sense: co-occurring with “manufacturing” Living thing sense: co-occurring with “life” Use “one sense per collocation” to build initial decision list classifier Treat results as annotated data, train new decision list classifier, iterate…
  • 35.
  • 36.
  • 37.
  • 38.
  • 39. Yarowsky’s Method: Stopping Stop when: Error on training data is less than a threshold No more training data is covered Use final decision list for WSD
  • 40. Yarowsky’s Method: Discussion Advantages: Accuracy is about as good as a supervised algorithm Bootstrapping: far less manual effort Disadvantages: Seeds may be tricky to construct Works only for coarse-grained sense distinctions Snowballing error with co-training Recent extension: now apply this to the web!
  • 41. WSD with Parallel Text But annotations are expensive! What’s the “proper” sense inventory? How fine or coarse grained? Application specific? Observation: multiple senses translate to different words in other languages! A “bill” in English may be a “pico” (bird jaw) in or a “cuenta” (invoice) in Spanish Use the foreign language as the sense inventory! Added bonus: annotations for free! (Byproduct of word-alignment process in machine translation)
  • 44. Semantic Attachments Basic idea: Associate -expressions with lexical items At branching node, apply semantics of one child to another (based on synctatic rule) Refresher in -calculus…
  • 47. Complexities Oh, there are many… Classic problem: quantifier scoping Every restaurant has a menu Issues with this style of semantic analysis?
  • 48. Semantics in NLP Today Can be characterized as “shallow semantics” Verbs denote events Represent as “frames” Nouns (in general) participate in events Types of event participants = “slots” or “roles” Event participants themselves = “slot fillers” Depending on the linguistic theory, roles may have special names: agent, theme, etc. Semantic analysis: semantic role labeling Automatically identify the event type (i.e., frame) Automatically identify event participants and the role that each plays (i.e., label the semantic role)
  • 49. What works in NLP? POS-annotated corpora Tree-annotated corpora: Penn Treebank Role-annotated corpora: Proposition Bank (PropBank)
  • 50. PropBank: Two Examples agree.01 Arg0: Agreer Arg1: Proposition Arg2: Other entity agreeing Example: [Arg0 John] agrees [Arg2 with Mary] [Arg1 on everything] fall.01 Arg1: Logical subject, patient, thing falling Arg2: Extent, amount fallen Arg3: Start point Arg4: End point Example: [Arg1 Sales] fell [Arg4 to $251.2 million] [Arg3 from $278.7 million]
  • 51. How do we do it? Short answer: supervised machine learning One approach: classification of each tree constituent Features can be words, phrase type, linear position, tree position, etc. Apply standard machine learning algorithms
  • 52. Recap of Today’s Topics Word sense disambiguation Beyond lexical semantics Semantic attachments to syntax Shallow semantics: PropBank
  • 53. The Complete Picture Phonology Morphology Syntax Semantics Reasoning Speech Recognition Morphological Analysis Parsing Semantic Analysis Reasoning, Planning Speech Synthesis Morphological Realization Syntactic Realization Utterance Planning
  • 54. The Home Stretch Next week: MapReduce and large-data processing No classes Thanksgiving week! December: two guest lectures by Ken Church