DLO8012: Natural Language
Processing
Subject Teacher:
Prof. Vikas Dubey
RIZVI COLLEGE OF ENGINEERING
BANDRA(W),MUMBAI
Module-3
Syntax Analysis
CO-3 [10hrs]
CO-3: Be able to model linguistic phenomena with formal grammars.
Conditional Probability and Tags
• P(Verb) is the probability of a randomly selected word being a verb.
• P(Verb|race) is "what's the probability of a word being a verb, given that it's
the word 'race'?"
• "Race" can be a noun or a verb.
• It's more likely to be a noun.
• P(Verb|race) can be estimated by looking at some corpus and asking: "out of
all the times we saw 'race', how many were verbs?"
• In the Brown corpus, "race" is tagged as a noun 96 times out of 98, so
P(Noun|race) = 96/98 ≈ .98 and P(Verb|race) = 2/98 ≈ .02.
• How do we calculate the probability of a tag sequence, say P(NN|DT)?

P(V | race) = Count(race as a verb) / Count(race)
Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22
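The relative-frequency estimate above can be sketched in a few lines of Python. The 96/2 counts mirror the slide's Brown-corpus figures for "race", but the tiny corpus and function names here are purely illustrative:

```python
from collections import Counter

# Toy word/tag observations mirroring the slide's counts for "race"
# (96 noun occurrences, 2 verb occurrences); illustrative, not real data.
tagged = [("race", "NN")] * 96 + [("race", "VB")] * 2

word_tag_counts = Counter(tagged)
word_counts = Counter(w for w, _ in tagged)

def p_tag_given_word(tag, word):
    # Relative-frequency estimate: Count(word tagged as tag) / Count(word)
    return word_tag_counts[(word, tag)] / word_counts[word]

print(round(p_tag_given_word("NN", "race"), 2))  # 0.98
print(round(p_tag_given_word("VB", "race"), 2))  # 0.02
```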
Stochastic Tagging
• Stochastic taggers generally resolve tagging ambiguities by using a
training corpus to compute the probability of a given word having a
given tag in a given context.
• A stochastic tagger is also called an HMM tagger, a Maximum
Likelihood tagger, or a Markov model tagger, since it is
based on the Hidden Markov Model.
• For a given word sequence, Hidden Markov Model (HMM) Taggers
choose the tag sequence that maximizes,
P(word | tag) * P(tag | previous-n-tags)
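The decision rule P(word | tag) * P(tag | previous tag) can be sketched for a single ambiguous word. The emission and transition numbers below follow the classic "to race" illustration from the tagging literature; treat them, and the helper name, as assumed toy values:

```python
# Toy probability tables (assumptions, not computed from a real corpus):
# emission[(w, t)] = P(w | t), transition[(prev, t)] = P(t | prev)
emission = {("race", "NN"): 0.00057, ("race", "VB"): 0.00012}
transition = {("TO", "NN"): 0.00047, ("TO", "VB"): 0.83}

def best_tag(word, prev_tag, tags=("NN", "VB")):
    # Pick the tag t maximizing P(word | t) * P(t | prev_tag)
    return max(tags, key=lambda t: emission.get((word, t), 0.0)
               * transition.get((prev_tag, t), 0.0))

print(best_tag("race", "TO"))  # VB: after "to", the verb reading wins
```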
Stochastic Tagging
• A bigram HMM tagger chooses the tag ti for word wi that is most
probable given the previous tag, ti-1
ti = argmaxj P(tj | ti-1, wi)
• From the chain rule for probability factorization,
• Some approximations are introduced to simplify the model, such as
Stochastic Tagging
• The word probability depends only on the tag
• The dependence of a tag on the preceding tag history is limited in
time, i.e. a tag depends only on the two preceding ones,
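The equations for these two approximations appeared as images on the original slides; a reconstruction consistent with the bullets above (standard HMM-tagging notation) is:

```latex
% Objective, via Bayes' rule (P(w_1^n) is constant and dropped):
\hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)
            = \arg\max_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)

% Approximation 1: each word depends only on its own tag
P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)

% Approximation 2: each tag depends only on the two preceding tags (trigram)
P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2})
```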
Statistical POS Tagging (Allen95)
• Let’s step back a minute and remember some probability theory and its use in
POS tagging.
• Suppose, with no context, we just want to know given the word “flies” whether
it should be tagged as a noun or as a verb.
• We use conditional probability for this: we want to know which is greater
PROB(N | flies) or PROB(V | flies)
• Note the definition of conditional probability:
PROB(a | b) = PROB(a & b) / PROB(b)
– Where PROB(a & b) is the probability of the two events a & b occurring
simultaneously
Calculating POS for “flies”
We need to know which is greater:
• PROB(N | flies) = PROB(flies & N) / PROB(flies)
• PROB(V | flies) = PROB(flies & V) / PROB(flies)
• Count these on a corpus
Stochastic Tagging
• The simplest stochastic taggers apply the following approaches to POS
tagging:
Approach 1: Word Frequency Approach
• In this approach, the stochastic taggers disambiguate the words based on
the probability that a word occurs with a particular tag.
• We can also say that the tag encountered most frequently with the word in
the training set is the one assigned to an ambiguous instance of that word.
• The main issue with this approach is that it may yield inadmissible
sequences of tags.
Stochastic Tagging
• Assign each word its most likely POS tag
– If w has tags t1, …, tk, then we can use
– P(ti | w) = c(w,ti )/(c(w,t1) + … + c(w,tk)), where
– c(w, ti ) = number of times w/ti appears in the corpus
– Success: 91% for English
Example heat :: noun/89, verb/5
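The baseline above can be sketched directly. The counts for "heat" come from the slide's example (noun 89, verb 5); the dictionary layout and function names are our own:

```python
# Per-word tag counts from a (hypothetical) training corpus.
counts = {"heat": {"NN": 89, "VB": 5}}

def p_tag(word, tag):
    # P(tag | word) = c(word, tag) / (c(word, t1) + ... + c(word, tk))
    c = counts[word]
    return c.get(tag, 0) / sum(c.values())

def most_likely_tag(word):
    # Assign the tag seen most often with this word in training.
    return max(counts[word], key=counts[word].get)

print(most_likely_tag("heat"))        # NN
print(round(p_tag("heat", "NN"), 3))  # 0.947
```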
Stochastic Tagging
Approach 2: Tag Sequence Probabilities
• This is another approach to stochastic tagging, in which the tagger
calculates the probability of a given sequence of tags occurring.
• It is also called the n-gram approach, because the best tag for a
given word is determined by the probability with which it occurs with
the n previous tags.
Stochastic Tagging
• Given: a sequence of words W
– W = w1, w2, …, wn (a sentence)
– e.g., W = heat water in a large vessel
• Assign a sequence of tags T:
– T = t1, t2, …, tn
• Find the T that maximizes P(T | W)
Stochastic Tagging
• But P(ti|wi) is difficult to compute directly, so Bayes' rule is used:
P(x|y) = P(x) P(y|x) / P(y)
• Applied to the sequence of words, the most probable tag sequence maximizes
P(ti|wi) = P(ti) P(wi|ti) / P(wi)
• where P(wi) does not change across candidate tag sequences, and thus
need not be calculated
• Thus, the most probable tag sequence is the product of two probabilities for
each possible sequence:
– The prior probability of the tag sequence (the context): P(ti)
– The likelihood of the sequence of words given a sequence of (hidden)
tags: P(wi|ti)
Stochastic Tagging
• Two simplifications for computing the most probable sequence of tags:
– The prior probability of a word's part-of-speech tag depends only on
the tag of the previous word (bigrams; the context is reduced to the
previous tag). This facilitates the computation of P(ti)
– Ex. the probability of a noun after a determiner
– The probability of a word depends only on its part-of-speech tag
(independent of the other words in the context). This facilitates the
computation of P(wi|ti), the likelihood probability.
• Ex. given the tag noun, the probability of the word dog
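Under these two simplifications, the most probable tag sequence can be found efficiently with the Viterbi algorithm. Below is a minimal bigram sketch; the probability tables, the `<s>` start symbol, and the sentence are toy assumptions, not figures from the slides:

```python
def viterbi(words, tags, trans, emit, start="<s>"):
    """Best tag path maximizing prod_i P(t_i | t_{i-1}) * P(w_i | t_i)."""
    # best[t] = probability of the best partial tag path ending in tag t
    best = {t: trans.get((start, t), 0.0) * emit.get((words[0], t), 0.0)
            for t in tags}
    back = []
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            p, arg = max(((prev[s] * trans.get((s, t), 0.0)
                           * emit.get((w, t), 0.0)), s) for s in tags)
            best[t], ptr[t] = p, arg
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    t = max(best, key=best.get)
    path = [t]
    for ptr in reversed(back):
        t = ptr[t]
        path.append(t)
    return list(reversed(path))

# Toy tables (assumptions): imperative "heat water" should come out VB NN.
tags = ("NN", "VB")
trans = {("<s>", "NN"): 0.5, ("<s>", "VB"): 0.5,
         ("NN", "NN"): 0.3, ("NN", "VB"): 0.2,
         ("VB", "NN"): 0.7, ("VB", "VB"): 0.1}
emit = {("heat", "NN"): 0.02, ("heat", "VB"): 0.05,
        ("water", "NN"): 0.03, ("water", "VB"): 0.005}
print(viterbi(["heat", "water"], tags, trans, emit))  # ['VB', 'NN']
```

Dynamic programming keeps this linear in sentence length (times the square of the tag-set size), instead of enumerating all k^n tag sequences.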
Stochastic Tagging
• Based on the probability of a certain tag occurring given
various possibilities
• Requires a training corpus
• No probabilities for words not in the corpus
• The training corpus may be too different from the test corpus
Stochastic Tagging (cont.)
Simple method: choose the most frequent tag in the training text
for each word!
– Result: 90% accuracy
– Why? This is only a baseline: other methods will do better
– The HMM tagger is one example
Thank You…

Weitere ähnliche Inhalte

Was ist angesagt?

Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesankit_ppt
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
CS571: Coreference Resolution
CS571: Coreference ResolutionCS571: Coreference Resolution
CS571: Coreference ResolutionJinho Choi
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 
Active learning
Active learningActive learning
Active learningAli Abbasi
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopiwan_rg
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IMachine Learning Valencia
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational SemanticsMarina Santini
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuningtaeseon ryu
 
Language Model (N-Gram).pptx
Language Model (N-Gram).pptxLanguage Model (N-Gram).pptx
Language Model (N-Gram).pptxHeneWijaya
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysismengistu23
 
Text clustering
Text clusteringText clustering
Text clusteringKU Leuven
 
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...Martin Körner
 
XLnet RoBERTa Reformer
XLnet RoBERTa ReformerXLnet RoBERTa Reformer
XLnet RoBERTa ReformerSan Kim
 
Solving Quadratic Assignment Problems (QAP) using Ant Colony System
Solving Quadratic Assignment Problems (QAP) using Ant Colony SystemSolving Quadratic Assignment Problems (QAP) using Ant Colony System
Solving Quadratic Assignment Problems (QAP) using Ant Colony SystemAjay Bidyarthy
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 

Was ist angesagt? (20)

Parse Tree
Parse TreeParse Tree
Parse Tree
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
 
Understanding GloVe
Understanding GloVeUnderstanding GloVe
Understanding GloVe
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
CS571: Coreference Resolution
CS571: Coreference ResolutionCS571: Coreference Resolution
CS571: Coreference Resolution
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Active learning
Active learningActive learning
Active learning
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
Language Model (N-Gram).pptx
Language Model (N-Gram).pptxLanguage Model (N-Gram).pptx
Language Model (N-Gram).pptx
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysis
 
Text clustering
Text clusteringText clustering
Text clustering
 
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models fo...
 
XLnet RoBERTa Reformer
XLnet RoBERTa ReformerXLnet RoBERTa Reformer
XLnet RoBERTa Reformer
 
Solving Quadratic Assignment Problems (QAP) using Ant Colony System
Solving Quadratic Assignment Problems (QAP) using Ant Colony SystemSolving Quadratic Assignment Problems (QAP) using Ant Colony System
Solving Quadratic Assignment Problems (QAP) using Ant Colony System
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 

Ähnlich wie Lecture-18(11-02-22)Stochastics POS Tagging.pdf

Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Dev Sahu
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.pptmilkesa13
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationSeonghyun Kim
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelHemantha Kulathilake
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESA REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESIJCSES Journal
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Lviv Data Science Summer School
 
Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog...
Tiancheng Zhao - 2017 -  Learning Discourse-level Diversity for Neural Dialog...Tiancheng Zhao - 2017 -  Learning Discourse-level Diversity for Neural Dialog...
Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog...Association for Computational Linguistics
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationasimuop
 
Lecture 3: Semantic Role Labelling
Lecture 3: Semantic Role LabellingLecture 3: Semantic Role Labelling
Lecture 3: Semantic Role LabellingMarina Santini
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATIONAN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATIONcscpconf
 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Toru Fujino
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Jinpyo Lee
 
Info 2402 irt-chapter_4
Info 2402 irt-chapter_4Info 2402 irt-chapter_4
Info 2402 irt-chapter_4Shahriar Rafee
 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrasesJAEMINJEONG5
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH WarNik Chow
 

Ähnlich wie Lecture-18(11-02-22)Stochastics POS Tagging.pdf (20)

Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt2-Chapter Two-N-gram Language Models.ppt
2-Chapter Two-N-gram Language Models.ppt
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIESA REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
A REVIEW ON PARTS-OF-SPEECH TECHNOLOGIES
 
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
Master defence 2020 - Anastasiia Khaburska - Statistical and Neural Language ...
 
Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog...
Tiancheng Zhao - 2017 -  Learning Discourse-level Diversity for Neural Dialog...Tiancheng Zhao - 2017 -  Learning Discourse-level Diversity for Neural Dialog...
Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog...
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Lecture 3: Semantic Role Labelling
Lecture 3: Semantic Role LabellingLecture 3: Semantic Role Labelling
Lecture 3: Semantic Role Labelling
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
Barreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentationBarreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentation
 
Poscat seminar 8
Poscat seminar 8Poscat seminar 8
Poscat seminar 8
 
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATIONAN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)
 
Info 2402 irt-chapter_4
Info 2402 irt-chapter_4Info 2402 irt-chapter_4
Info 2402 irt-chapter_4
 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 

Kürzlich hochgeladen

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Kürzlich hochgeladen (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Lecture-18(11-02-22)Stochastics POS Tagging.pdf

  • 1. DLO8012: Natural Language Processing Subject Teacher: Prof. Vikas Dubey RIZVI COLLEGE OF ENGINEERING BANDRA(W),MUMBAI 1
  • 2. Module-3 Syntax Analysis CO-3 [10hrs] CO-3: Be able to model linguistic phenomena with formal grammars. 2
  • 3. 3 Conditional Probability and Tags • P(Verb) is the probability of a randomly selected word being a verb. • P(Verb|race) is “what’s the probability of a word being a verb given that it’s the word “race”? • Race can be a noun or a verb. • It’s more likely to be a noun. • P(Verb|race) can be estimated by looking at some corpus and saying “out of all the times we saw ‘race’, how many were verbs? • In Brown corpus, P(Verb|race) = 96/98 = .98 • How to calculate for a tag sequence, say P(NN|DT)?  P(V | race) = Count(race is verb) total Count(race) Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22
  • 4. Stochastic Tagging • Stochastic taggers generally resolve tagging ambiguities by using a training corpus to compute the probability of a given word having a given tag in a given context. • Stochastic tagger called also HMM tagger or a Maximum Likelihood Tagger, or a Markov model HMM TAGGER tagger, based on the Hidden Markov Model. • For a given word sequence, Hidden Markov Model (HMM) Taggers choose the tag sequence that maximizes, P(word | tag) * P(tag | previous-n-tags) Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22 4
  • 5. Stochastic Tagging • A bigram HMM tagger chooses the tag ti for word wi that is most probable given the previous tag, ti-1 ti = argmaxj P(tj | ti-1, wi) • From the chain rule for probability factorization, • Some approximation are introduced to simplify the model, such as Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22 5
  • 6. Stochastic Tagging • The word probability depends only on the tag • The dependence of a tag from the preceding tag history is limited in time, e.i. a tag depends only on the two preceding ones, Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22 6
  • 7. 7 Statistical POS Tagging (Allen95) • Let’s step back a minute and remember some probability theory and its use in POS tagging. • Suppose, with no context, we just want to know given the word “flies” whether it should be tagged as a noun or as a verb. • We use conditional probability for this: we want to know which is greater PROB(N | flies) or PROB(V | flies) • Note definition of conditional probability PROB(a | b) = PROB(a & b) / PROB(b) – Where PROB(a & b) is the probability of the two events a & b occurring simultaneously Prof. Vikas Dubey | RCOE | COMP | NLP | BE 2021-22
Calculating POS for "flies"
We need to know which is greater:
• PROB(N | flies) = PROB(flies & N) / PROB(flies)
• PROB(V | flies) = PROB(flies & V) / PROB(flies)
• Count on a corpus.
Stochastic Tagging
• The simplest stochastic tagger applies the following approaches for POS tagging.
– Approach 1: Word Frequency Approach
• In this approach, the stochastic tagger disambiguates words based on the probability that a word occurs with a particular tag.
• In other words, the tag encountered most frequently with the word in the training set is the one assigned to an ambiguous instance of that word.
• The main issue with this approach is that it may yield an inadmissible sequence of tags.
Stochastic Tagging
• Assign each word its most likely POS tag.
– If w has tags t1, …, tk, then we can use
  P(ti | w) = c(w, ti) / (c(w, t1) + … + c(w, tk)), where
– c(w, ti) = number of times w/ti appears in the corpus
– Success: 91% for English
Example: heat :: noun/89, verb/5
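A minimal sketch of this most-frequent-tag baseline, using a tiny made-up training set whose "heat" counts echo the noun/89, verb/5 example on the slide:

```python
from collections import Counter, defaultdict

# Toy training data: (word, tag) pairs mimicking heat :: noun/89, verb/5.
training = [("heat", "NN")] * 89 + [("heat", "VB")] * 5 + [("water", "NN")] * 3

counts = defaultdict(Counter)
for word, tag in training:
    counts[word][tag] += 1

def most_likely_tag(word):
    """argmax over P(ti | w) = c(w, ti) / (c(w, t1) + ... + c(w, tk))."""
    return counts[word].most_common(1)[0][0]

print(most_likely_tag("heat"))  # → NN
```

Note that this baseline tags every occurrence of "heat" as a noun, including the roughly 5% of cases where it is actually a verb; that residual error is exactly what context-sensitive taggers try to recover.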
Stochastic Tagging
Approach 2: Tag Sequence Probabilities
• In this approach to stochastic tagging, the tagger calculates the probability of a given sequence of tags occurring.
• It is also called the n-gram approach, because the best tag for a given word is determined by the probability with which it occurs with the n previous tags.
Stochastic Tagging
• Given: a sequence of words W
  – W = w1, w2, …, wn (a sentence)
  – e.g., W = heat water in a large vessel
• Assign a sequence of tags T:
  T = t1, t2, …, tn
• Find the T that maximizes P(T | W).
Stochastic Tagging
• But P(ti | wi) is difficult to compute directly, so the Bayesian classification rule is used:
  P(x | y) = P(x) P(y | x) / P(y)
• Applied to the sequence of words, the most probable tag sequence is given by
  P(ti | wi) = P(ti) P(wi | ti) / P(wi)
• where P(wi) does not change across tag sequences and thus does not need to be calculated.
• Thus, the most probable tag sequence is the product of two probabilities for each possible sequence:
  – the prior probability of the tag sequence (context), P(ti)
  – the likelihood of the sequence of words given a sequence of (hidden) tags, P(wi | ti)
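A quick numeric check of the Bayes rearrangement, with made-up probabilities (none of these values come from the slides):

```python
# P(t | w) = P(t) * P(w | t) / P(w), with illustrative numbers.
p_t = 0.13           # prior P(tag), illustrative
p_w_given_t = 0.004  # likelihood P(word | tag), illustrative
p_w = 0.001          # P(word), illustrative

p_t_given_w = p_t * p_w_given_t / p_w
print(round(p_t_given_w, 2))  # → 0.52
```

Because P(wi) is identical for every candidate tag of the same word, ranking tags by the numerator P(ti) * P(wi | ti) alone yields the same argmax, which is why the denominator can be dropped.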
Stochastic Tagging
• Two simplifications for computing the most probable sequence of tags:
  – The prior probability of the part-of-speech tag of a word depends only on the tag of the previous word (bigrams; the context is reduced to the previous tag). This facilitates the computation of P(ti).
    • Example: the probability of a noun after a determiner.
  – The probability of a word depends only on its part-of-speech tag (it is independent of the other words in the context). This facilitates the computation of the likelihood P(wi | ti).
    • Example: given the tag noun, the probability of the word dog.
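Under these two simplifications, the most probable tag sequence can be found efficiently with the Viterbi algorithm. The sketch below assumes a bigram model and entirely made-up toy probabilities; it is an illustration of the technique, not the slides' own implementation:

```python
import math

def viterbi(words, tags, start, trans, emit, unk=1e-8):
    """Return the tag sequence maximizing P(T) * P(W | T), under
    bigram transitions P(ti | t_{i-1}) and emissions P(wi | ti)."""
    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {t: (math.log(start[t]) + math.log(emit[t].get(words[0], unk)), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (lp + math.log(trans[p][t]) + math.log(emit[t].get(w, unk)), pth)
                for p, (lp, pth) in best.items()
            )
            new[t] = (score, path + [t])
        best = new
    return max(best.values())[1]

# Toy model for the two-word sentence "heat water" (all values invented).
tags = ("NN", "VB")
start = {"NN": 0.6, "VB": 0.4}
trans = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
emit = {"NN": {"heat": 0.2, "water": 0.7}, "VB": {"heat": 0.4, "water": 0.1}}
print(viterbi(["heat", "water"], tags, start, trans, emit))  # → ['VB', 'NN']
```

Log probabilities are used so that long sentences do not underflow; the `unk` floor stands in for a proper smoothing scheme for unseen words.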
Stochastic Tagging
• Based on the probability of a certain tag occurring, given the various possibilities.
• Necessitates a training corpus.
• No probabilities are available for words not in the corpus.
• The training corpus may be too different from the test corpus.
Stochastic Tagging (cont.)
Simple method: choose the most frequent tag in the training text for each word!
– Result: 90% accuracy
– Why use it? It serves as a baseline: other methods will do better.
– HMM is an example.