CS571: Language Models
Natural Language Processing
Emory University
Jinho D. Choi
Probability (slide 2)
Probability of tomorrow being cloudy?
Observed history: 5 sunny days, 3 cloudy days, 2 snowy days.
P(cloudy) = C(cloudy) / (C(sunny) + C(cloudy) + C(snowy)) = 3/10
Conditional Probability (slide 3)
Probability of tomorrow being cloudy if today is snowy?
P(cloudy|snowy) = C(snowy, cloudy) / C(snowy) = 1/2
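To make the counting above concrete, here is a minimal Python sketch (not from the slides); the 10-day history below is hypothetical, chosen only so that its counts match the numbers on these slides.

```python
from collections import Counter

# Hypothetical 10-day weather history: 5 sunny, 3 cloudy, 2 snowy,
# with one snowy -> snowy -> cloudy run so the conditional counts work out.
days = ["sunny", "sunny", "cloudy", "sunny", "sunny",
        "snowy", "snowy", "cloudy", "sunny", "cloudy"]

uni = Counter(days)                       # C(w)
bi = Counter(zip(days, days[1:]))         # C(w_today, w_tomorrow)

p_cloudy = uni["cloudy"] / len(days)                            # 3/10
p_cloudy_given_snowy = bi[("snowy", "cloudy")] / uni["snowy"]   # 1/2
print(p_cloudy, p_cloudy_given_snowy)
```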
Conditional Probability (slide 4)
Probability of tomorrow being cloudy if today and yesterday are snowy?
P(cloudy|snowy, snowy) = C(snowy, snowy, cloudy) / C(snowy, snowy) = 1/1 = 1
Joint Probability (slide 5)
Probability of next 2 days being cloudy, sunny?
P(cloudy, sunny) = P(cloudy) · P(sunny|cloudy)
Joint Probability (slide 6)
Probability of next 3 days being snowy, cloudy, sunny?
P(snowy, cloudy, sunny) = P(snowy) · P(cloudy|snowy) · P(sunny|snowy, cloudy)
N-gram Models (slide 7)
1-gram (Unigram): P(w_i) = C(w_i) / Σ_∀k C(w_k) = C(w_i) / N, where N = # of tokens
2-gram (Bigram): P(w_{i+1}|w_i) = C(w_i, w_{i+1}) / Σ_∀k C(w_i, w_k) = C(w_i, w_{i+1}) / C(w_i)
token vs. type?
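As a concrete reference for these formulas, here is a minimal Python sketch (my own, not from the slides) that estimates unigram and bigram MLE probabilities from a whitespace-tokenized corpus; the function name and toy corpus are illustrative.

```python
from collections import Counter

def ngram_mle(tokens):
    """Unigram and bigram MLE probabilities from a list of tokens."""
    uni = Counter(tokens)                                  # C(w_i)
    bi = Counter(zip(tokens, tokens[1:]))                  # C(w_i, w_{i+1})
    n = len(tokens)                                        # N = # of tokens
    p_uni = {w: c / n for w, c in uni.items()}             # P(w_i) = C(w_i) / N
    p_bi = {(w1, w2): c / uni[w1]                          # P(w_{i+1}|w_i) = C(w_i, w_{i+1}) / C(w_i)
            for (w1, w2), c in bi.items()}
    return p_uni, p_bi

tokens = "you know , I know you know that you do .".split()   # toy corpus
p_uni, p_bi = ngram_mle(tokens)
print(p_uni["you"], p_bi[("you", "know")])   # 3/11, 2/3
```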
N-gram Models (slide 8)
• Unigram model
- Given any word w, it shows how likely w appears in context.
- This is known as the likelihood (probability) of w, written as P(w).
- How likely does the word “Emory” appear in context?
  Emory University was founded as Emory College by John Emory.
  Emory University is 16th among the colleges and universities in US.
  P(Emory) = 4/23 ≈ 0.1739
- Does this mean “Emory” appears 17.39% of the time in any context?
- How can we measure more accurate likelihoods?
N-gram Models (slide 9)
• Bigram model
- Given any words w_i and w_j in sequence, it shows the likelihood of w_j following w_i in context.
- This can be represented as the conditional probability P(w_j|w_i).
- What is the most likely word following “Emory”?  arg max_k P(w_k|Emory)
  Emory University was founded as Emory College by John Emory.
  Emory University is the 20th among the national universities in US.
  P(University|Emory) = 2/4 = 0.5
  P(College|Emory) = 1/4 = 0.25
  P(.|Emory) = 1/4 = 0.25
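A small sketch of the arg max query above, using the two example sentences as the corpus (assumption: the final period counts as a token, as in the slide's counts).

```python
from collections import Counter

sentences = [
    "Emory University was founded as Emory College by John Emory .",
    "Emory University is the 20th among the national universities in US .",
]
bigrams = Counter()
for s in sentences:
    toks = s.split()
    bigrams.update(zip(toks, toks[1:]))    # C(w_i, w_{i+1}), counted per sentence

def most_likely_next(word):
    # arg max_k P(w_k | word); the denominator C(word) is the same for all k,
    # so comparing bigram counts is enough.
    followers = {w2: c for (w1, w2), c in bigrams.items() if w1 == word}
    return max(followers, key=followers.get) if followers else None

print(most_likely_next("Emory"))   # 'University' (2 of the 4 occurrences)
```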
Maximum Likelihood (slide 10)
x_1^n = x_1, . . . , x_n
Chain rule: P(x_1^n) = P(x_1) · P(x_2|x_1) · P(x_3|x_1^2) · · · P(x_n|x_1^{n-1})
Any practical issue?  (x_1, …, x_k) can be very sparse.
Markov assumption: P(x_k|x_1^{k-1}) ≈ P(x_k|x_{k-1})
P(x_1^n) ≈ P(x_1) · P(x_2|x_1) · P(x_3|x_2) · · · P(x_n|x_{n-1})
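The sketch below (again illustrative, not the slides' code) applies the Markov approximation: it scores a sequence with P(x_1) times the product of bigram probabilities, all estimated by MLE from a toy corpus.

```python
from collections import Counter

def sentence_prob(sequence, corpus_tokens):
    """P(x_1..x_n) ~ P(x_1) * prod_k P(x_k | x_{k-1}) under the Markov assumption."""
    uni = Counter(corpus_tokens)
    bi = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    n = len(corpus_tokens)
    p = uni[sequence[0]] / n                       # P(x_1)
    for prev, cur in zip(sequence, sequence[1:]):
        if uni[prev] == 0:                         # unseen history -> probability 0
            return 0.0
        p *= bi[(prev, cur)] / uni[prev]           # P(x_k | x_{k-1})
    return p

corpus = "you know , I know you know that you do .".split()
print(sentence_prob(["you", "know"], corpus))      # (3/11) * (2/3) = 2/11
```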
Maximum Likelihood (slide 11)
• Maximum likelihood
- Given any word sequence w_1, …, w_n, it shows how likely this sequence appears in context.
- This can be represented as the joint probability P(w_1, …, w_n).
- How likely does the sequence “you know” appear in context?
  you know , I know you know that you do .
  P(you, know) = 2/11   (why 11, not 10?)
  Chain rule: P(you) · P(know|you) = 3/11 · 2/3 = 2/11
Word Segmentation (slide 12)
• Word segmentation
- Segment a string into a sequence of words.
- Is there more than one possible sequence?
- Choose the sequence that most likely appears in context.
  youknow
  P(you) · P(know|you) > P(yo) · P(uk|yo) · P(now|uk)
  log(P(you) · P(know|you)) > log(P(yo) · P(uk|yo) · P(now|uk))
  log(P(you)) + log(P(know|you)) > log(P(yo)) + log(P(uk|yo)) + log(P(now|uk))
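Below is a minimal segmentation sketch under stated simplifications: it scores candidate splits with unigram log-probabilities only (the slide's inequality uses a bigram chain), and the word list and probabilities are illustrative toy values.

```python
import math
from functools import lru_cache

def segment(text, p_word):
    """Return the split of `text` with the highest sum of log unigram probabilities."""
    @lru_cache(maxsize=None)
    def best(s):
        if not s:
            return 0.0, ()
        candidates = []
        for i in range(1, len(s) + 1):
            word, rest = s[:i], s[i:]
            if word in p_word:
                score, words = best(rest)
                candidates.append((math.log(p_word[word]) + score, (word,) + words))
        return max(candidates) if candidates else (float("-inf"), (s,))
    return list(best(text)[1])

p_word = {"you": 3/11, "know": 2/11, "yo": 1e-3, "uk": 1e-3, "now": 1e-3}
print(segment("youknow", p_word))   # ['you', 'know'] beats ['yo', 'uk', 'now']
```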
Laplace Smoothing (slide 13)
P(x_1^n) ≈ P(x_1) · P(x_2|x_1) · P(x_3|x_2) · · · P(x_n|x_{n-1})
What if P(x_1) = 0?  Then P(x_1^n) ≈ 0 ← BAD!!
MLE: P(x_i) = C(x_i) / Σ_k C(x_k)
Laplace smoothing: P_l(x_i) = (C(x_i) + α) / Σ_k (C(x_k) + α) = (C(x_i) + α) / (Σ_k C(x_k) + α|X|) = (C(x_i) + α) / (N + α|X|)
Unseen: P_l(x_?) = (C(x_?) + α) / (Σ_k C(x_k) + α|X|) = α / (N + α|X|)
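A minimal sketch of the unigram Laplace estimate above; adding a single <unk> symbol to the vocabulary X is my assumption, not the slides'.

```python
from collections import Counter

def laplace_unigram(tokens, vocab, alpha=1.0):
    """P_l(x) = (C(x) + alpha) / (N + alpha * |X|)."""
    counts = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return lambda x: (counts[x] + alpha) / denom

corpus = "you know , I know you know that you do .".split()
vocab = set(corpus) | {"<unk>"}     # 7 seen types + an unseen placeholder
p = laplace_unigram(corpus, vocab)
print(p("you"))      # (3 + 1) / (11 + 8)
print(p("<unk>"))    # (0 + 1) / (11 + 8)  -- never zero
```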
Laplace Smoothing (slide 14)
MLE: P(x_j|x_i) = C(x_i, x_j) / Σ_k C(x_i, x_k) = C(x_i, x_j) / C(x_i)
Laplace smoothing: P_l(x_j|x_i) = (C(x_i, x_j) + α) / Σ_k (C(x_i, x_k) + α) = (C(x_i, x_j) + α) / (Σ_k C(x_i, x_k) + α|X_{i,*}|) = (C(x_i, x_j) + α) / (C(x_i) + α|X_{i,*}|)
Unseen: P_l(x_?|x_i) = α / (C(x_i) + α|X_{i,*}|)
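And the conditional (bigram) version as a sketch; here |X_{i,*}| is approximated by the full vocabulary size, which is a simplification of the slide's notation.

```python
from collections import Counter

def laplace_bigram(tokens, vocab, alpha=1.0):
    """P_l(x_j|x_i) = (C(x_i, x_j) + alpha) / (C(x_i) + alpha * |X_i,*|),
    with |X_i,*| taken to be |vocab| in this sketch."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    v = len(vocab)
    return lambda xi, xj: (bi[(xi, xj)] + alpha) / (uni[xi] + alpha * v)

corpus = "you know , I know you know that you do .".split()
p = laplace_bigram(corpus, set(corpus))
print(p("you", "know"))   # (2 + 1) / (3 + 7)
print(p("you", "I"))      # unseen bigram: (0 + 1) / (3 + 7)
```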
Discount Smoothing (slide 15)
• Issues with Laplace smoothing
- Unfair discounts (e.g., α = 1, N = 100, |X| = 10):
    10/100 = 0.1  →  (10 + 1)/(100 + 10) = 0.1     (±0)
    50/100 = 0.5  →  (50 + 1)/(100 + 10) ≈ 0.46    (−0.04)
    1/100 = 0.01  →  (1 + 1)/(100 + 10) ≈ 0.018    (+0.008)
- Unseen likelihood may get penalized too harshly when the minimum count is much greater than α.
- How to reduce the gap between the minimum count and the unseen count?
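A quick arithmetic check of the distortion above, assuming α = 1, N = 100, and |X| = 10 (these parameters are inferred from the numbers on the slide).

```python
alpha, n, x = 1, 100, 10
for c in (50, 10, 1):
    mle = c / n
    lap = (c + alpha) / (n + alpha * x)
    print(c, round(mle, 3), round(lap, 3), round(lap - mle, 3))
# 50 0.5  0.464 -0.036   (frequent items lose mass)
# 10 0.1  0.1    0.0
#  1 0.01 0.018  0.008   (rare items gain mass)
```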
Discount Smoothing (slide 16)
Laplace:
  P_l(x_i) = (C(x_i) + α) / (N + α|X|)
  P_l(x_?) = α / (N + α|X|)
  P_l(x_j|x_i) = (C(x_i, x_j) + α) / (C(x_i) + α|X_{i,*}|)
  P_l(x_?|x_i) = α / (C(x_i) + α|X_{i,*}|)
Discount:
  P_d(x_?) = α · min_k P(x_k)
  P_d(x_i) = (C(x_i) − P_d(x_?)) / N
  P_d(x_?|x_i) = α · min_k P(x_k|x_i)
  P_d(x_j|x_i) = (C(x_i, x_j) − P_d(x_?|x_i)) / C(x_i)
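A direct transcription of the discount formulas above into Python (unigram case only); it follows the slide's formulas as written, as a sketch rather than a recommendation for practical use.

```python
from collections import Counter

def discount_unigram(tokens, alpha=0.5):
    """P_d(x_?) = alpha * min_k P(x_k);  P_d(x) = (C(x) - P_d(x_?)) / N."""
    counts = Counter(tokens)
    n = len(tokens)
    p_unseen = alpha * min(counts.values()) / n        # alpha * min_k P(x_k)
    p_seen = {x: (c - p_unseen) / n for x, c in counts.items()}
    return p_seen, p_unseen

corpus = "you know , I know you know that you do .".split()
p_seen, p_unseen = discount_unigram(corpus)
print(round(p_unseen, 3))          # 0.5 * (1/11) ~= 0.045
print(round(p_seen["you"], 3))     # (3 - 0.045) / 11 ~= 0.269
```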
Good-Turing Smoothing (slide 17)
N_c = the count of n-grams that appear c times;  here N_1 = 3, N_2 = 1, N_3 = 1, N_4 = 1
C*(x_i) = (C(x_i) + 1) · N_{c+1} / N_c
P(x_?) = N_1 / N
C*(eel) = (C(eel) + 1) · N_2 / N_1 = 2/3

Type       C    MLE    C*
carp       4    0.33   5.00
perch      3    0.25   4.00
whitefish  2    0.17   3.00
trout      1    0.08   0.67
salmon     1    0.08   0.67
eel        1    0.08   0.67
?          0    0.00   0.17
Total     12    1.00

P(carp) = ?    P(eel) = (2/3) / 12
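A sketch of the adjusted-count computation on the fish example; falling back to the raw count when N_{c+1} is zero is my assumption (the slide does not say how it fills those cells).

```python
from collections import Counter

def good_turing(counts):
    """C*(x) = (C(x) + 1) * N_{c+1} / N_c, where N_c = # of types seen c times.
    Falls back to the raw count when N_{c+1} is zero (an assumption)."""
    n_c = Counter(counts.values())
    return {x: (c + 1) * n_c[c + 1] / n_c[c] if n_c[c + 1] else c
            for x, c in counts.items()}

fish = {"carp": 4, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1}
n = sum(fish.values())                     # 12
c_star = good_turing(fish)
print(round(c_star["eel"], 2))             # (1 + 1) * N_2 / N_1 = 2 * 1/3 = 0.67
print(round(c_star["eel"] / n, 3))         # P(eel) = (2/3) / 12
print(3 / n)                               # P(x_?) = N_1 / N = 3/12 = 0.25
```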
Backoff (slide 18)
• Backoff
- Bigrams are more accurate than unigrams.
- Bigrams are sparser than unigrams.
- Use bigrams in general, and back off to unigrams only when the bigram does not exist.
  P_b(x_j|x_i) = P_{l|d}(x_j|x_i)   if P(x_j|x_i) > 0
               = λ · P_{l|d}(x_j)   otherwise
  How to measure λ?   λ = α · ⟨P(x_j|x_i)⟩_{i,j} / ⟨P(x_j)⟩_j
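A sketch of the backoff rule; the weight lam below is an arbitrary illustrative constant rather than the slide's α-scaled ratio of averages, and the probability tables are toy values.

```python
def backoff(xj, xi, p_bi, p_uni, lam=0.4):
    """Use the (smoothed) bigram estimate when the bigram was seen,
    otherwise back off to lam * unigram estimate."""
    p = p_bi.get((xi, xj), 0.0)
    return p if p > 0 else lam * p_uni.get(xj, 0.0)

p_uni = {"you": 3/11, "know": 2/11, "I": 1/11}     # toy unigram probabilities
p_bi = {("you", "know"): 2/3}                      # toy bigram probabilities
print(backoff("know", "you", p_bi, p_uni))         # bigram exists: 2/3
print(backoff("I", "you", p_bi, p_uni))            # backoff: 0.4 * 1/11
```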
Interpolation (slide 19)
• Interpolation
- Unigrams and bigrams provide different but useful information.
- Use them both with different weights.
  P̂(x_j|x_i) = λ_1 · P_{l|d}(x_j) + λ_2 · P_{l|d}(x_j|x_i),  where Σ_k λ_k = 1
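And the interpolation rule as a sketch; the weights (0.3, 0.7) are illustrative, and in practice such weights would be tuned, e.g., on held-out data.

```python
def interpolate(xj, xi, p_bi, p_uni, lambdas=(0.3, 0.7)):
    """P_hat(x_j|x_i) = lambda_1 * P(x_j) + lambda_2 * P(x_j|x_i), sum(lambdas) = 1."""
    l1, l2 = lambdas
    return l1 * p_uni.get(xj, 0.0) + l2 * p_bi.get((xi, xj), 0.0)

p_uni = {"you": 3/11, "know": 2/11}
p_bi = {("you", "know"): 2/3}
print(interpolate("know", "you", p_bi, p_uni))     # 0.3 * 2/11 + 0.7 * 2/3
```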
