[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
[Pickhardt+ ACL2014] A Generalized Language Model as the
Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
2014/7/12 ACL Reading @ PFI
Nakatani Shuyo, Cybozu Labs Inc.
Kneser-Ney Smoothing [Kneser+ 1995]
• Discounting & Interpolation

$$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\left(c(w_{i-n+1}^{i}) - D,\ 0\right)}{c(w_{i-n+1}^{i-1})} + \frac{D}{c(w_{i-n+1}^{i-1})}\, N_{1+}(w_{i-n+1}^{i-1}\,\cdot) \cdot P(w_i \mid w_{i-n+2}^{i-1})$$

• where $w_m^n = w_m \cdots w_n$ and $N_{1+}(w_m^n\,\cdot) = \left|\{\, w_i \mid c(w_m^n w_i) > 0 \,\}\right|$ is the number of distinct words following $w_m^n$; $D$ is the discount subtracted from each observed count.
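A minimal Python sketch of this interpolated recursion, assuming n-gram counts are precomputed in a dict mapping each context tuple to a Counter of following words; the function and parameter names are mine, the discount `D` is fixed, and for brevity the back-off reuses raw counts at every order instead of the continuation counts that full Kneser-Ney uses for the lower-order distributions (see the later slides).

```python
from collections import Counter

def kneser_ney_prob(word, context, counts, D=0.75, vocab_size=10000):
    """Interpolated Kneser-Ney P(word | context) with a single fixed discount D.

    counts maps a context tuple to a Counter of the words that follow it.
    Simplification: the back-off distribution reuses raw counts at every order;
    real KN uses continuation counts for the lower-order distributions.
    """
    if len(context) == 0:                       # recursion floor: uniform distribution
        return 1.0 / vocab_size
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    if total == 0:                              # unseen context: back off completely
        return kneser_ney_prob(word, context[1:], counts, D, vocab_size)
    n1plus = len(followers)                     # N1+(context .): number of distinct continuations
    discounted = max(followers[word] - D, 0.0) / total
    backoff_weight = D * n1plus / total         # mass freed by discounting
    return discounted + backoff_weight * kneser_ney_prob(word, context[1:], counts, D, vocab_size)
```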
Modified KN-Smoothing [Chen+ 1999]
• Weighted discounting:

$$P(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\!\left(c(w_{i-n+1}^{i})\right)}{c(w_{i-n+1}^{i-1})} + \gamma(w_{i-n+1}^{i-1})\, P(w_i \mid w_{i-n+2}^{i-1})$$

• where $D(c) = 0$ if $c = 0$, $D_1$ if $c = 1$, $D_2$ if $c = 2$, and $D_{3+}$ if $c \ge 3$ (the $D_k$ are estimated by leave-one-out cross-validation), and

$$\gamma(w_{i-n+1}^{i-1}) = \frac{[\text{amount of discounting}]}{c(w_{i-n+1}^{i-1})}$$
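The only change versus plain KN is the count-dependent discount. A small sketch of that piece, using the commonly cited Chen & Goodman closed-form estimates (derived from leave-one-out) for D1, D2, D3+ from the count-of-counts n_k, and assuming n1..n4 are all nonzero:

```python
def estimate_discounts(count_of_counts):
    """Modified-KN discounts D1, D2, D3+ from count-of-counts n_k
    (n_k = number of n-grams seen exactly k times), per Chen & Goodman (1999)."""
    n1, n2, n3, n4 = (count_of_counts.get(k, 0) for k in (1, 2, 3, 4))
    Y = n1 / (n1 + 2.0 * n2)
    return {1: 1 - 2 * Y * n2 / n1,
            2: 2 - 3 * Y * n3 / n2,
            3: 3 - 4 * Y * n4 / n3}

def discount(c, D):
    """D(c): no discount for unseen n-grams, count-dependent discount otherwise."""
    if c == 0:
        return 0.0
    return D[min(c, 3)]
```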
[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• When each sentence has a fractional weight
  – Domain adaptation
  – EM algorithm on word alignment
• Proposes KN smoothing using expected fractional counts
I'm interested in it!
Model
• $\boldsymbol{u}$ denotes $w_{i-n+1}^{i-1}$, and $\boldsymbol{u}'$ denotes $w_{i-n+2}^{i-1}$
• A sequence $\boldsymbol{u}w$ occurs $k$ times, and the $i$-th occurrence carries a probability $p_i$ ($i = 1, \cdots, k$) as its weight;
• the count $c(\boldsymbol{u}w)$ is then distributed according to a Poisson binomial distribution:
• $p\left(c(\boldsymbol{u}w) = r\right) = s(k, r)$, where

$$s(k, r) = \begin{cases} s(k-1, r)\,(1 - p_k) + s(k-1, r-1)\,p_k & \text{if } 0 \le r \le k \\ 1 & \text{if } k = r = 0 \\ 0 & \text{otherwise} \end{cases}$$
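This recursion is a small dynamic program over the occurrence weights; a sketch (function and argument names are mine):

```python
def poisson_binomial_pmf(weights):
    """p(c(uw) = r) for r = 0..k, where weights = [p_1, ..., p_k] are the
    occurrence probabilities of the sequence uw (Poisson binomial distribution)."""
    s = [1.0]                                   # s(0, 0) = 1
    for p in weights:                           # fold in one occurrence at a time
        nxt = [0.0] * (len(s) + 1)
        for r, prob in enumerate(s):
            nxt[r]     += prob * (1 - p)        # this occurrence is not counted
            nxt[r + 1] += prob * p              # this occurrence is counted
        s = nxt
    return s                                    # s[r] = p(c(uw) = r)
```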
MLE on this model
• Expectations
  – $\mathbb{E}\left[c(\boldsymbol{u}w)\right] = \sum_r r \cdot p\left(c(\boldsymbol{u}w) = r\right)$
  – $\mathbb{E}\left[N_r(\boldsymbol{u}\,\cdot)\right] = \sum_w p\left(c(\boldsymbol{u}w) = r\right)$
  – $\mathbb{E}\left[N_{r+}(\boldsymbol{u}\,\cdot)\right] = \sum_w p\left(c(\boldsymbol{u}w) \ge r\right)$
• Maximize the (expected) likelihood
  – $\mathbb{E}[L] = \mathbb{E}\left[\sum_{\boldsymbol{u}w} c(\boldsymbol{u}w) \log p(w \mid \boldsymbol{u})\right] = \sum_{\boldsymbol{u}w} \mathbb{E}\left[c(\boldsymbol{u}w)\right] \log p(w \mid \boldsymbol{u})$
  – obtain $p_{\mathrm{MLE}}(w \mid \boldsymbol{u}) = \dfrac{\mathbb{E}\left[c(\boldsymbol{u}w)\right]}{\mathbb{E}\left[c(\boldsymbol{u}\,\cdot)\right]}$
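Given the per-n-gram PMFs from the previous slide, these expectations are plain sums. A hedged sketch, assuming `pmfs[(u, w)]` holds the list returned by `poisson_binomial_pmf` above (all names are mine):

```python
def expected_statistics(pmfs, u):
    """E[c(uw)], E[N_r(u .)] and E[N_{1+}(u .)] for one fixed history u,
    where pmfs[(u, w)][r] = p(c(uw) = r)."""
    exp_count = {}     # E[c(uw)] per word w
    exp_N = {}         # E[N_r(u .)] for r >= 1
    exp_N1plus = 0.0   # E[N_{1+}(u .)]
    for (hist, w), pmf in pmfs.items():
        if hist != u:
            continue
        exp_count[w] = sum(r * p for r, p in enumerate(pmf))
        exp_N1plus += 1.0 - pmf[0]                 # p(c(uw) >= 1)
        for r, p in enumerate(pmf):
            if r >= 1:
                exp_N[r] = exp_N.get(r, 0.0) + p   # contributes to E[N_r(u .)]
    return exp_count, exp_N, exp_N1plus
```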
Expected Kneser-Ney
• Smoothed count: $\tilde{c}(\boldsymbol{u}w) = \max\left(0,\ c(\boldsymbol{u}w) - D\right) + N_{1+}(\boldsymbol{u}\,\cdot)\, D\, p'(w \mid \boldsymbol{u}')$
• So, $\mathbb{E}\left[\tilde{c}(\boldsymbol{u}w)\right] = \mathbb{E}\left[c(\boldsymbol{u}w)\right] - p\left(c(\boldsymbol{u}w) > 0\right) D + \mathbb{E}\left[N_{1+}(\boldsymbol{u}\,\cdot)\right] D\, p'(w \mid \boldsymbol{u}')$
  – where $p'(w \mid \boldsymbol{u}') = \dfrac{\mathbb{E}\left[N_{1+}(\cdot\,\boldsymbol{u}'w)\right]}{\mathbb{E}\left[N_{1+}(\cdot\,\boldsymbol{u}'\,\cdot)\right]}$
• then $p(w \mid \boldsymbol{u}) = \dfrac{\mathbb{E}\left[\tilde{c}(\boldsymbol{u}w)\right]}{\mathbb{E}\left[\tilde{c}(\boldsymbol{u}\,\cdot)\right]}$
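Putting the pieces together for one history u: a sketch that takes the expected statistics as inputs and returns the expected-KN probability. All names are mine, the single discount D is assumed given, `p_lower` is assumed to hold the lower-order continuation distribution p'(w | u'), and the normalizer simply sums the smoothed expected counts over the words recorded for u.

```python
def expected_kn_prob(w, exp_count, exp_nonzero, exp_N1plus_u, p_lower, D):
    """Expected Kneser-Ney P(w | u) for one history u.

    exp_count[v]   = E[c(uv)]
    exp_nonzero[v] = p(c(uv) > 0)
    exp_N1plus_u   = E[N_{1+}(u .)]
    p_lower[v]     = p'(v | u'), the lower-order continuation distribution
    """
    def smoothed(v):
        # E[~c(uv)] = E[c(uv)] - p(c(uv) > 0) * D + E[N_{1+}(u .)] * D * p'(v | u')
        return exp_count[v] - exp_nonzero[v] * D + exp_N1plus_u * D * p_lower[v]
    total = sum(smoothed(v) for v in exp_count)   # E[~c(u .)]
    return smoothed(w) / total
```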
Language model adaptation
• Our corpus consists of
  – large general-domain data and
  – small domain-specific data
• Sentence $\boldsymbol{w}$'s weight:
  – $p(\boldsymbol{w}\ \text{is in-domain}) = \dfrac{1}{1 + \exp\left(-H(\boldsymbol{w})\right)}$
  – where $H(\boldsymbol{w}) = \dfrac{\log p_{\mathrm{in}}(\boldsymbol{w}) - \log p_{\mathrm{out}}(\boldsymbol{w})}{|\boldsymbol{w}|}$,
  – $p_{\mathrm{in}}$: language model of the in-domain data; $p_{\mathrm{out}}$: that of the general-domain data
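The weight is just a length-normalized log-likelihood ratio pushed through a sigmoid. A minimal sketch, assuming `logp_in` and `logp_out` are callables that score a tokenized sentence under the in-domain and general-domain language models (names are mine):

```python
import math

def in_domain_weight(sentence, logp_in, logp_out):
    """p(sentence is in-domain): sigmoid of the per-word log-likelihood ratio H(w)."""
    h = (logp_in(sentence) - logp_out(sentence)) / len(sentence)
    return 1.0 / (1.0 + math.exp(-h))
```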
• Figure 1: On the language model adaptation task, expected KN outperforms all other methods across all sizes of selected subsets. Integral KN is applied to unweighted instances, while fractional WB, fractional KN and expected KN are applied to weighted instances. (via [Zhang+ ACL2014])
• [Figure omitted: perplexity vs. size of the subset selected from the general-domain data; in-domain data: 54k training, 3k testing]
• Why isn't there Modified KN as a baseline?
[Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• Higher-order n-grams are very sparse
  – Especially noticeable on small data (e.g. domain-specific data!)
• Improves performance on small data by combining skipped n-grams with Modified KN smoothing
  – Perplexity is reduced by 25.7% for very small training data of only 736 KB of text
“Generalized Language Models”
• $\partial_3\, w_1 w_2 w_3 w_4 = w_1 w_2\,\_\,w_4$
  – "_" denotes a word placeholder (the skipped position)

$$P_{\mathrm{GLM}}(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\!\left(c(w_{i-n+1}^{i})\right)}{c(w_{i-n+1}^{i-1})} + \gamma_{\mathrm{high}}(w_{i-n+1}^{i-1})\, \frac{1}{n-1} \sum_{j=1}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1})$$

$$P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1}) = \frac{N_{1+}(\partial_j w_{i-n}^{i}) - D\!\left(c(\partial_j w_{i-n+1}^{i})\right)}{N_{1+}(\partial_j w_{i-n+1}^{i-1}\,*)} + \gamma_{\mathrm{mid}}(\partial_j w_{i-n+1}^{i-1})\, \frac{1}{n-2} \sum_{k=1,\, k \ne j}^{n-1} P_{\mathrm{GLM}}(w_i \mid \partial_j \partial_k w_{i-n+1}^{i-1})$$
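The ∂_j operator simply replaces the j-th word of an n-gram with a placeholder. A small sketch of the operator and the skipped-n-gram counts it induces (the `"_"` token follows the slide's notation; function names are mine):

```python
from collections import Counter

def skip(ngram, j):
    """∂_j: replace the j-th word (1-indexed) of an n-gram tuple with a placeholder."""
    return tuple("_" if i == j - 1 else w for i, w in enumerate(ngram))

def skipped_counts(ngrams, j):
    """Counts of skipped n-grams c(∂_j w_{i-n+1}^i) over an iterable of n-gram tuples."""
    return Counter(skip(g, j) for g in ngrams)

# example from the slide: ∂_3 (w1 w2 w3 w4) = w1 w2 _ w4
assert skip(("w1", "w2", "w3", "w4"), 3) == ("w1", "w2", "_", "w4")
```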
• [Figure omitted] "The bold arrows correspond to interpolation of models in traditional modified Kneser-Ney smoothing. The lighter arrows illustrate the additional interpolations introduced by our generalized language models." (via [Pickhardt+ ACL2014])
• [Results figure omitted] Shrunk training data sets for the English Wikipedia, i.e. small domain-specific data
Space Complexity
• model size = 9.5 GB, # of entries = 427M
• model size = 15 GB, # of entries = 742M
References
• [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• [Kneser+ 1995] Improved backing-off for m-gram language modeling
• [Chen+ 1999] An Empirical Study of Smoothing Techniques for Language Modeling