Web Science & Technologies
University of Koblenz ▪ Landau, Germany

Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models for Next Word Prediction

Martin Körner
Oberseminar, 25.07.2013
Content
• Introduction
• Language Models
• Generalized Language Models
• Smoothing
• Progress
• Summary
Introduction: Motivation
• Next word prediction: What is the next word a user will type?
• Use cases for next word prediction:
  – Augmentative and Alternative Communication (AAC)
  – Small keyboards (smartphones)
Introduction to next word prediction
• How do we predict words?
  1. Rationalist approach
     • Manually encoding information about language
     • “Toy” problems only
  2. Empiricist approach
     • Statistical, pattern recognition, and machine learning methods applied to corpora
     • Result: Language models
Language models in general
• Language model: How likely is a sentence $s$?
• Probability distribution: $P(s)$
• Calculate $P(s)$ by multiplying conditional probabilities
• Example:
$P(\text{If you're going to San Francisco , be sure} \ldots)$
$= P(\text{you're} \mid \text{If}) \cdot P(\text{going} \mid \text{If you're}) \cdot P(\text{to} \mid \text{If you're going}) \cdot P(\text{San} \mid \text{If you're going to}) \cdot P(\text{Francisco} \mid \text{If you're going to San}) \cdots$
• Estimating these probabilities directly from full histories would fail: long histories are far too sparse (the decomposition itself is sketched below)
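As a minimal sketch of the chain-rule decomposition above, assuming a hypothetical callable `cond_prob` that stands in for whatever model supplies $P(w_i \mid w_1 \ldots w_{i-1})$:

```python
def sentence_probability(words, cond_prob):
    """P(s) as the product of per-word conditional probabilities."""
    prob = 1.0
    for i in range(1, len(words)):
        prob *= cond_prob(words[i], words[:i])  # P(w_i | w_1 ... w_{i-1})
    return prob

# e.g. sentence_probability("If you're going to San Francisco".split(), model)
```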
Conditional probabilities simplified
• Markov assumption [JM80]:
  – Only the last $n-1$ words are relevant for a prediction
  – Example with $n=5$:
$P(\text{sure} \mid \text{If you're going to San Francisco , be}) \approx P(\text{sure} \mid \text{San Francisco , be})$
(the comma counts as a word)
Definitions and Markov assumption
• n-gram: a sequence of length $n$ together with its count
  – E.g., the 5-gram (If you're going to San) with count 4
• Sequence naming: $w_1^{i-1} := w_1 w_2 \ldots w_{i-1}$
• Markov assumption formalized:
$P(w_i \mid w_1^{i-1}) \approx P(w_i \mid w_{i-n+1}^{i-1})$
where $w_{i-n+1}^{i-1}$ are the last $n-1$ words
Formalizing next word prediction
• Instead of $P(s)$:
  – Only one conditional probability $P(w_i \mid w_{i-n+1}^{i-1})$ (the conditional probability with the Markov assumption)
  – Simplify $P(w_i \mid w_{i-n+1}^{i-1})$ to $P(w_n \mid w_1^{n-1})$, where $w_1^{n-1}$ are the $n-1$ preceding words
$\mathrm{NWP}(w_1^{n-1}) = \arg\max_{w_n \in W} P(w_n \mid w_1^{n-1})$
where $W$ is the set of all words in the corpus
• How to calculate the probability $P(w_n \mid w_1^{n-1})$?
How to calculate $P(w_n \mid w_1^{n-1})$
• The easiest way:
  – Maximum likelihood:
$P_{\mathrm{ML}}(w_n \mid w_1^{n-1}) = \frac{c(w_1^n)}{c(w_1^{n-1})}$
  – Example:
$P(\text{San} \mid \text{If you're going to}) = \frac{c(\text{If you're going to San})}{c(\text{If you're going to})}$
(a sketch of this estimator follows below)
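A minimal sketch of how these counts turn into a predictor; the token list, the vocabulary `W`, and all names are illustrative assumptions:

```python
from collections import Counter

def train_counts(tokens, n):
    """Count all n-grams and (n-1)-gram histories in a token list."""
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    histories = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
    return ngrams, histories

def p_ml(word, context, ngrams, histories):
    """P_ML(w_n | w_1^{n-1}) = c(w_1^n) / c(w_1^{n-1})."""
    denom = histories[tuple(context)]
    return ngrams[tuple(context) + (word,)] / denom if denom else 0.0

def next_word(context, W, ngrams, histories):
    """NWP(w_1^{n-1}) = arg max over w_n in W of P_ML(w_n | w_1^{n-1})."""
    return max(W, key=lambda w: p_ml(w, context, ngrams, histories))
```

The zero probabilities this estimate assigns to unseen sequences are exactly what the smoothing section later addresses.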
Intro Generalized Language Models (GLMs)
• Main idea:
  – Insert wildcard words (∗) into sequences
• Example:
  – Instead of $P(\text{San} \mid \text{If you're going to})$:
     • $P(\text{San} \mid \text{If} \ast \ast \ast)$
     • $P(\text{San} \mid \text{If} \ast \ast \text{to})$
     • $P(\text{San} \mid \text{If} \ast \text{going} \ast)$ (length: 5, wildcard words: 2)
     • $P(\text{San} \mid \text{If} \ast \text{going to})$
     • $P(\text{San} \mid \text{If you're} \ast \ast)$
     • …
• Separate different types of GLMs based on:
  1. Sequence length
  2. Number of wildcard words
• Aggregate results (a sketch of the pattern generation follows below)
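A minimal sketch of the wildcard-pattern generation, assuming, as in the slide's examples, that the first word of the history always stays fixed; all names are illustrative:

```python
from itertools import combinations

def glm_patterns(history):
    """All variants of a history with at least one non-initial word
    replaced by the wildcard '*'."""
    rest = history[1:]
    patterns = []
    for k in range(1, len(rest) + 1):
        for skipped in combinations(range(len(rest)), k):
            patterns.append(tuple([history[0]] +
                                  ["*" if i in skipped else w
                                   for i, w in enumerate(rest)]))
    return patterns

# glm_patterns(("If", "you're", "going", "to")) yields the 7 wildcard
# variants, from ("If", "*", "going", "to") down to ("If", "*", "*", "*")
```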
Why Generalized Language Models?
• Data sparsity of n-grams:
  – “If you're going to San” is seen less often than, for example, “If ∗ ∗ to San”
• Question: Does that really improve the prediction?
• Result of evaluation: Yes
… but we should use smoothing for language models
Smoothing
• Problem: Unseen sequences
  – Try to estimate probabilities of unseen sequences
  – Probabilities of seen sequences need to be reduced
• Two approaches:
  1. Backoff smoothing
  2. Interpolation smoothing
Backoff smoothing
• If a sequence is unseen: use a shorter sequence
  – E.g.: if $P(\text{San} \mid \text{going to}) = 0$, use $P(\text{San} \mid \text{to})$

$P_{\mathrm{back}}(w_n \mid w_i^{n-1}) =
\begin{cases}
\tau(w_n \mid w_i^{n-1}) & \text{if } c(w_i^n) > 0 \\
\gamma \cdot P_{\mathrm{back}}(w_n \mid w_{i+1}^{n-1}) & \text{if } c(w_i^n) = 0
\end{cases}$

Here $\tau(w_n \mid w_i^{n-1})$ is the higher-order probability, $\gamma$ is a weight, and $P_{\mathrm{back}}(w_n \mid w_{i+1}^{n-1})$ is the recursive lower-order probability.
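A minimal sketch of this recursion; `tau`, `gamma`, and `count` are hypothetical callables for the higher-order probability, the backoff weight (which in practice depends on the history), and the n-gram count:

```python
def p_backoff(word, history, tau, gamma, count):
    if not history:
        return tau(word, history)  # base case: unigram estimate
    if count(tuple(history) + (word,)) > 0:
        return tau(word, history)  # sequence seen: higher-order probability
    # sequence unseen: back off to the shortened history
    return gamma(history) * p_backoff(word, history[1:], tau, gamma, count)
```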
Interpolated Smoothing
• Always use the shorter sequence in the calculation:

$P_{\mathrm{inter}}(w_n \mid w_i^{n-1}) = \tau(w_n \mid w_i^{n-1}) + \gamma \cdot P_{\mathrm{inter}}(w_n \mid w_{i+1}^{n-1})$

As before, $\tau(w_n \mid w_i^{n-1})$ is the higher-order probability and $\gamma$ weights the recursive lower-order probability $P_{\mathrm{inter}}(w_n \mid w_{i+1}^{n-1})$.
• Seems to work better than backoff smoothing (a sketch follows below)
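The same sketch adapted to interpolation, with the same hypothetical `tau` and `gamma`; note that the lower-order estimate is always mixed in, not only for unseen sequences:

```python
def p_interpolated(word, history, tau, gamma):
    if not history:
        return tau(word, history)  # base case: unigram estimate
    return (tau(word, history)
            + gamma(history) * p_interpolated(word, history[1:], tau, gamma))
```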
Kneser-Ney smoothing [KN95] intro
• Interpolated smoothing
• Idea: improve the lower-order calculation
• Example: the word “visiting” is unseen in the corpus
$P(\text{Francisco} \mid \text{visiting}) = 0$
  – Normal interpolation: $0 + \gamma \cdot P(\text{Francisco})$
$P(\text{San} \mid \text{visiting}) = 0$
  – Normal interpolation: $0 + \gamma \cdot P(\text{San})$
Result: “Francisco” is as likely as “San” at that position. Is that correct?
• What is the difference between “Francisco” and “San”?
Answer: the number of different contexts they appear in
Kneser-Ney smoothing idea
• For the lower-order calculation:
  – Don't use the count $c(w_n)$
  – Instead: the number of different bigrams the word completes:
$N_{1+}(\bullet\, w_n) := |\{w_{n-1} : c(w_{n-1}^n) > 0\}|$
  – Or in general:
$N_{1+}(\bullet\, w_{i+1}^n) := |\{w_i : c(w_i^n) > 0\}|$
• In addition:
  – $N_{1+}(\bullet\, w_{i+1}^{n-1}\, \bullet) := \sum_{w_n} N_{1+}(\bullet\, w_{i+1}^n)$
  – $N_{1+}(w_i^{n-1}\, \bullet) := |\{w_n : c(w_i^n) > 0\}|$
(a sketch of these counts follows below)
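A minimal sketch of the lowest-order continuation counts $N_{1+}(\bullet\, w_n)$, built here from bigrams; longer left contexts follow the same pattern:

```python
from collections import defaultdict

def continuation_counts(tokens):
    """N1+(. w): number of distinct words preceding w at least once."""
    left_contexts = defaultdict(set)
    for prev, word in zip(tokens, tokens[1:]):
        left_contexts[word].add(prev)
    return {word: len(ctxs) for word, ctxs in left_contexts.items()}

# "Francisco" follows almost only "San", so N1+(. Francisco) stays small
# even where c(Francisco) is large -- the distinction motivated above.
```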
Kneser-Ney smoothing equation (highest)
• Highest-order calculation:

$P_{\mathrm{KN}}(w_n \mid w_i^{n-1}) = \frac{\max\{c(w_i^n) - D, 0\}}{c(w_i^{n-1})} + \frac{D}{c(w_i^{n-1})}\, N_{1+}(w_i^{n-1}\, \bullet)\, P_{\mathrm{KN}}(w_n \mid w_{i+1}^{n-1})$

Here $c(w_i^n)$ is the count and $c(w_i^{n-1})$ the total count; the $\max\{\cdot, 0\}$ assures a positive value after subtracting the discount value $D$ (with $0 \le D \le 1$); $\frac{D}{c(w_i^{n-1})}\, N_{1+}(w_i^{n-1}\, \bullet)$ is the lower-order weight; and $P_{\mathrm{KN}}(w_n \mid w_{i+1}^{n-1})$ is the recursive lower-order probability.
Kneser-Ney smoothing equation
• Lower-order calculation:

$P_{\mathrm{KN}}(w_n \mid w_i^{n-1}) = \frac{\max\{N_{1+}(\bullet\, w_i^n) - D, 0\}}{N_{1+}(\bullet\, w_i^{n-1}\, \bullet)} + \frac{D}{N_{1+}(\bullet\, w_i^{n-1}\, \bullet)}\, N_{1+}(w_i^{n-1}\, \bullet)\, P_{\mathrm{KN}}(w_n \mid w_{i+1}^{n-1})$

The structure mirrors the highest-order case, with the continuation count $N_{1+}(\bullet\, w_i^n)$ in place of the absolute count and the total continuation count $N_{1+}(\bullet\, w_i^{n-1}\, \bullet)$ as denominator; the last two factors again form the lower-order weight and the recursive lower-order probability.
• Lowest-order calculation: $P_{\mathrm{KN}}(w_n) = \frac{N_{1+}(\bullet\, w_i^n)}{N_{1+}(\bullet\, w_i^{n-1}\, \bullet)}$ (the full recursion is sketched below)
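Putting the three cases together, a minimal sketch of the full recursion with a single discount `D`. The lookups `c`, `n1p_left`, `n1p_right`, and `n1p_both` (for $c(\cdot)$, $N_{1+}(\bullet\, \cdot)$, $N_{1+}(\cdot\, \bullet)$, and $N_{1+}(\bullet\, \cdot\, \bullet)$, with `n1p_both(())` standing for the total continuation count) are hypothetical and assumed precomputed and nonzero where used:

```python
def p_kn(word, history, c, n1p_left, n1p_right, n1p_both, D, highest=True):
    seq = tuple(history) + (word,)
    if not history:                      # lowest order: continuation unigram
        return n1p_left(seq) / n1p_both(())
    if highest:                          # highest order: absolute counts
        num, denom = max(c(seq) - D, 0), c(tuple(history))
    else:                                # lower orders: continuation counts
        num, denom = max(n1p_left(seq) - D, 0), n1p_both(tuple(history))
    gamma = (D / denom) * n1p_right(tuple(history))  # lower-order weight
    return num / denom + gamma * p_kn(word, history[1:], c, n1p_left,
                                      n1p_right, n1p_both, D, highest=False)
```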
Modified Kneser-Ney smoothing [CG98]
• Different discount values for different absolute counts
• Lower-order calculation:

$P_{\mathrm{KN}}(w_n \mid w_i^{n-1}) = \frac{\max\{N_{1+}(\bullet\, w_i^n) - D(c(w_i^n)), 0\}}{N_{1+}(\bullet\, w_i^{n-1}\, \bullet)} + \frac{D_1 N_1(w_i^{n-1}\, \bullet) + D_2 N_2(w_i^{n-1}\, \bullet) + D_{3+} N_{3+}(w_i^{n-1}\, \bullet)}{N_{1+}(\bullet\, w_i^{n-1}\, \bullet)}\, P_{\mathrm{KN}}(w_n \mid w_{i+1}^{n-1})$

• State of the art (for 15 years!)
(estimates for the discount values are sketched below)
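The slide leaves the discount values open; [CG98] estimates $D_1$, $D_2$, $D_{3+}$ from the counts of counts $n_k$ (the number of n-grams occurring exactly $k$ times). A minimal sketch of that estimate, assuming the $n_k$ are nonzero:

```python
def estimate_discounts(n1, n2, n3, n4):
    """Discounts D1, D2, D3+ from counts of counts, following [CG98]."""
    Y = n1 / (n1 + 2 * n2)
    return (1 - 2 * Y * n2 / n1,   # D1: for n-grams seen once
            2 - 3 * Y * n3 / n2,   # D2: for n-grams seen twice
            3 - 4 * Y * n4 / n3)   # D3+: for n-grams seen three or more times
```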
Smoothing of GLMs
• We can use all smoothing techniques on GLMs as well!
• Small modification:
  – E.g., for $P(\text{San} \mid \text{If} \ast \text{going} \ast)$, the lower-order sequence is:
     – Normally: $P(\text{San} \mid \ast \text{going} \ast)$
     – Instead, use: $P(\text{San} \mid \text{going} \ast)$ (a sketch follows below)
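A minimal sketch of this modification, with illustrative names: when shortening a GLM pattern, drop the leading word and then any wildcards left at the front, as in the slide's example:

```python
def glm_lower_order(pattern):
    """Shorten a GLM history pattern for the lower-order estimate."""
    shorter = list(pattern[1:])          # drop the leading word as usual
    while shorter and shorter[0] == "*":
        shorter.pop(0)                   # also drop wildcards now at the front
    return tuple(shorter)

# glm_lower_order(("If", "*", "going", "*"))  ->  ("going", "*")
```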
Progress
• Done so far:
  – Extract text from XML files
  – Building GLMs
  – Kneser-Ney and modified Kneser-Ney smoothing
  – Indexing with MySQL
• To do:
  – Finish evaluation program
  – Run evaluation
  – Analyze results
Summary
Data Sets:
• More Data
• Better Data

Language Models:
• n-grams
• Generalized Language Models

Smoothing:
• Katz
• Good-Turing
• Witten-Bell
• Kneser-Ney
• …
Thank you for your attention!
Questions?
Sources
• Images:
  – Wheelchair Joystick (Slide 4): http://i01.i.aliimg.com/img/pb/741/422/527/527422741_355.jpg
  – Smartphone Keyboard (Slide 4): https://activecaptain.com/articles/mobilePhones/iPhone/iPhone_Keyboard.jpg
• References:
  – [CG98]: Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, August 1998.
  – [JM80]: F. Jelinek and R. L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381–397, 1980.
  – [KN95]: Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), volume 1, pages 181–184. IEEE, 1995.