Non-parametric Bayesian Learning in Discrete Data
Yueshen Xu
xyshzjucs@zju.edu.cn / xuyueshen@163.com
Middleware, CCNT, ZJU
5/10/2016
Statistics & Computational Linguistics
Outline
• Bayes' Rule
• Parametric Bayesian Learning
  - Concept & Example
  - Discrete & Continuous Data
  - Text Clustering & Topic Modeling
  - Pros and Cons
  - Some Important Concepts
• Non-parametric Bayesian Learning
  - Dirichlet Process and Process Construction
  - Dirichlet Process Mixture
  - Hierarchical Dirichlet Process
  - Chinese Restaurant Process
  - Example: Hierarchical Topic Modeling
• Markov Chain Monte Carlo
• Reference
• Discussion
Bayes' Rule
• Posterior ∝ Prior × Likelihood
$$\underbrace{p(\mathrm{Hypothesis} \mid \mathrm{Data})}_{\text{Posterior}} = \frac{\overbrace{p(\mathrm{Data} \mid \mathrm{Hypothesis})}^{\text{Likelihood}} \; \overbrace{p(\mathrm{Hypothesis})}^{\text{Prior}}}{\underbrace{p(\mathrm{Data})}_{\text{Evidence}}}$$
Update beliefs in hypotheses in response to data
• Parametric or non-parametric
  - The structure of the hypothesis: constrained or unconstrained
  - Examples follow later
• Your confidence in the prior
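A minimal numeric sketch of the update above, assuming a hypothetical two-hypothesis coin example (the hypothesis names and all probabilities are illustrative assumptions, not taken from the slides). It shows that the evidence only acts as a normalizing constant.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule to a finite set of hypotheses.

    prior:      dict mapping hypothesis -> p(hypothesis)
    likelihood: dict mapping hypothesis -> p(data | hypothesis)
    """
    # Unnormalized posterior: likelihood times prior for each hypothesis.
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence p(data): the sum over all hypotheses, a constant in h.
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical example: is a coin fair, or biased towards heads?
prior = {"fair": 0.5, "biased": 0.5}
# Assumed likelihood of observing a single head under each hypothesis.
likelihood_heads = {"fair": 0.5, "biased": 0.8}

post = posterior(prior, likelihood_heads)
print(post)
```

Feeding `post` back in as the next prior gives the sequential form of the same update.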
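Parametric Bayesian Learning
$$p(\mathrm{Hypothesis} \mid \mathrm{Data}) \propto p(\mathrm{Data} \mid \mathrm{Hypothesis})\, p(\mathrm{Hypothesis})$$
• Parametric or non-parametric → refers to the hypothesis
• The evidence is a fact
  - It is a constant that carries no uncertainty, so dropping it is a commonly used trick
• Non-parametric != no parameters
Hyper-parameters
• Parameters of distributions
• Parameter vs. variable
$$\mathrm{Dir}(\theta \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \qquad \alpha_0 = \sum_{k=1}^{K} \alpha_k$$
In $\mathrm{Dir}(\theta \mid \boldsymbol{\alpha})$, θ is the variable and α is the hyper-parameter; in p(θ|X) ∝ p(X|θ)p(θ), θ is the parameter.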
Parametric Bayesian Learning
• Some examples
  - Clustering: K-Means/Medoid, NMF
  - Topic modeling: LSA, pLSA, LDA
  - Hierarchical concept building
Parametric Bayesian Learning
• Serious problems
  - How could we know
    - the number of clusters?
    - the number of topics?
    - the number of layers?
  - Heuristic pre-processing? Guessing and tuning
Parametric Bayesian Learning
• Some basics
  - Discrete data & continuous data
    - Discrete data: text → modeled as natural numbers
    - Continuous data: stock, trading, signal, quality, rating → modeled as real numbers
• Some important concepts (also used in the non-parametric case)
  - Discrete distribution: $X_i \mid \theta \sim \mathrm{Discrete}(\theta)$
$$p(X \mid \theta) = \prod_{i=1}^{n} \mathrm{Discrete}(X_i; \theta) = \prod_{j=1}^{m} \theta_j^{N_j}$$
  - Multinomial distribution: $N \mid n, \theta \sim \mathrm{Multi}(\theta, n)$
$$p(N \mid n, \theta) = \frac{n!}{\prod_{j=1}^{m} N_j!} \prod_{j=1}^{m} \theta_j^{N_j}$$
Computer scientists often mix them up.
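The two formulas differ only by the multinomial coefficient: the Discrete likelihood scores one particular ordered sequence, while the Multinomial scores the count vector and therefore sums over all orderings. A small sketch, with hypothetical function names and made-up values for θ and the observed sequence:

```python
from collections import Counter
from math import factorial, prod


def discrete_likelihood(xs, theta):
    """p(X | theta) for an ordered sequence xs of outcome indices."""
    return prod(theta[x] for x in xs)


def multinomial_pmf(counts, theta):
    """p(N | n, theta) for a count vector N with n = sum(N)."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # exact: each partial quotient is an integer
    return coefficient * prod(t ** c for t, c in zip(theta, counts))


theta = [0.5, 0.3, 0.2]   # assumed outcome probabilities
xs = [0, 2, 0, 1]         # one ordered sequence of n = 4 draws
tally = Counter(xs)
counts = [tally[j] for j in range(len(theta))]  # N_j: occurrences of outcome j

p_sequence = discrete_likelihood(xs, theta)
p_counts = multinomial_pmf(counts, theta)
print(counts, p_sequence, p_counts)
```

For inference about θ the coefficient is a constant, which is why the two are so easily confused.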
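Parametric Bayesian Learning
• Some important concepts (cont.)
  - Dirichlet distribution: $\theta \mid \boldsymbol{\alpha} \sim \mathrm{Dir}(\boldsymbol{\alpha})$
$$\mathrm{Dir}(\theta \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$$
  - Conjugate prior
    - If the posterior p(θ|X) is in the same family as the prior p(θ), the prior is called a conjugate prior for the likelihood p(X|θ)
    - Examples
      Binomial distribution ←→ Beta distribution
      Multinomial distribution ←→ Dirichlet distribution
$$p(\theta \mid \boldsymbol{N}, \boldsymbol{\alpha}) = \mathrm{Dir}(\theta \mid \boldsymbol{N} + \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + N_1) \cdots \Gamma(\alpha_K + N_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1 + N_k}$$
    where $N = \sum_{k=1}^{K} N_k$ and $p(\theta \mid \boldsymbol{N}, \boldsymbol{\alpha}) \propto p(\theta \mid \boldsymbol{\alpha})\, p(\boldsymbol{N} \mid \theta)$
Why should the prior and the posterior preferably be conjugate distributions? → …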
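One practical reason, sketched under assumptions: with a conjugate prior the Bayesian update reduces to adding the observed counts to the hyper-parameters, so no integral has to be evaluated. The function names and the numbers below are hypothetical; the posterior mean $(\alpha_k + N_k)/(\alpha_0 + N)$ is the standard Dirichlet mean.

```python
def dirichlet_posterior(alpha, counts):
    """Dirichlet prior + multinomial counts -> Dirichlet posterior parameters."""
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each component is alpha_k / alpha_0."""
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]


alpha = [1.0, 1.0, 1.0]   # assumed symmetric prior over K = 3 outcomes
counts = [2, 1, 1]        # assumed observed counts N_k

alpha_post = dirichlet_posterior(alpha, counts)
theta_mean = dirichlet_mean(alpha_post)
print(alpha_post, theta_mean)
```

Because the posterior is again a Dirichlet, it can serve directly as the prior for the next batch of counts.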
Parametric Bayesian Learning
• Some important concepts (cont.)
  - Probabilistic graphical model
    - Modeling a Bayesian network using plates and circles
  - Generative model & discriminative model: $p(\theta \mid X)$
    - Generative model: $p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$
      Naïve Bayes, GMM, pLSA, LDA, HMM, HDP, …: unsupervised learning
    - Discriminative model: models $p(\theta \mid X)$ directly
      LR, KNN, SVM, Boosting, Decision Tree: supervised learning
  - These models also have graphical model representations
Non-parametric Bayesian Learning
• When we talk about non-parametric, what do we usually talk about?
  - Discrete data: Dirichlet Distribution, Dirichlet Process, Chinese Restaurant Process, Polya Urn, Pitman-Yor Process, Hierarchical Dirichlet Process, Dirichlet Process Mixture, Dirichlet Process Multinomial Model, Clustering, …
  - Continuous data: Gaussian Distribution, Gaussian Process, Regression, Classification, Factorization, Gradient Descent, Covariance Matrix, … → Brownian Motion
• Infinite (∞)
Non-parametric Bayesian Learning
• Dirichlet Process [Yee Whye Teh et al.]
  - $G_0$: a probability measure/distribution (the base distribution); $\alpha_0$: a positive real number; $(A_1, A_2, \ldots, A_r)$: a partition of the space; $G$: a random probability distribution. If, for every finite partition,
$$(G(A_1), \ldots, G(A_r)) \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \ldots, \alpha_0 G_0(A_r))$$
    then $G \sim \mathrm{DP}(\alpha_0, G_0)$
  - $G_0$: which exact distribution is $G_0$? We don't know
  - $G$: which exact distribution is $G$? We don't know
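Non-parametric Bayesian Learning
• Where is the infinity? → Construction of the DP → We need to construct a DP, since it does not exist naturally
  - Stick-breaking, Polya urn scheme, Chinese restaurant process
• Stick-breaking construction
  - $(\beta_k)_{k=1}^{\infty}$, $(\phi_k)_{k=1}^{\infty}$: i.i.d. sequences
$$\beta_k \mid \alpha_0 \sim \mathrm{Beta}(1, \alpha_0), \qquad \phi_k \mid G_0 \sim G_0$$
$$\pi_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l), \qquad G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$$
  - $\sum_{k=1}^{\infty} \pi_k = 1$, so $(\pi_k)$ is a distribution over the positive integers; $\pi_k$ is the probability of $\phi_k$, and $\delta_{\phi_k}$ is a point mass at $\phi_k$
Why DP? → …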
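The stick-breaking construction can be sketched in a few lines. This is only an approximation under stated assumptions: the infinite sum is truncated at a hypothetical `num_atoms`, and the base distribution $G_0$ is taken to be a standard normal purely for illustration (the slides leave $G_0$ unspecified).

```python
import random


def stick_breaking(alpha0, base_sampler, num_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha0, G0).

    Returns the weights pi_k, the atoms phi_k, and the stick mass
    left over beyond the truncation level.
    """
    weights, atoms = [], []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(num_atoms):
        beta = rng.betavariate(1.0, alpha0)   # beta_k ~ Beta(1, alpha0)
        weights.append(beta * remaining)      # pi_k = beta_k * prod_{l<k} (1 - beta_l)
        atoms.append(base_sampler(rng))       # phi_k ~ G0
        remaining *= 1.0 - beta
    return weights, atoms, remaining


rng = random.Random(0)
weights, atoms, leftover = stick_breaking(
    alpha0=2.0,
    base_sampler=lambda r: r.gauss(0.0, 1.0),  # assumed G0 = N(0, 1)
    num_atoms=50,
    rng=rng,
)
print(sum(weights), leftover)
```

A smaller `alpha0` concentrates the mass on the first few atoms; a larger one spreads it over many.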
Non-parametric Bayesian Learning
• Chinese Restaurant Process
  - A restaurant has an infinite number of tables, and customers (words, generated from $\theta_i$, one-to-one) enter the restaurant sequentially. The $i$-th customer ($\theta_i$) sits at a table ($\phi_k$) according to the probability:
    [Figure: seating probabilities for the occupied tables and for a new table]
  - $\phi_k$: clustering ≈ 2/3 of unsupervised learning → clustering, topic modeling (two-layer clustering), hierarchical concept building, collaborative filtering, similarity computation, …
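A minimal simulation, assuming the standard CRP seating rule: the $i$-th customer joins occupied table $k$ with probability $n_k / (i - 1 + \alpha_0)$, where $n_k$ is the number of customers already seated there, and opens a new table with probability $\alpha_0 / (i - 1 + \alpha_0)$. The function name and parameter values are hypothetical.

```python
import random


def chinese_restaurant_process(num_customers, alpha0, rng):
    """Seat customers sequentially; return each customer's table and the table sizes."""
    assignments, table_sizes = [], []
    for _ in range(num_customers):
        # Occupied tables are weighted by their size, a new table by alpha0;
        # random.choices normalizes the weights internally.
        weights = table_sizes + [alpha0]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table
        else:
            table_sizes[k] += 1     # join an occupied table
        assignments.append(k)
    return assignments, table_sizes


seating, sizes = chinese_restaurant_process(100, alpha0=1.0, rng=random.Random(42))
print(len(sizes), sizes)
```

The number of occupied tables is not fixed in advance; it grows slowly with the number of customers, which is what makes the CRP useful as a clustering prior.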
Non-parametric Bayesian Learning
• Dirichlet Process Mixture (DPM)
  - You can draw the graphical model yourself → DP is not enough → We need similarity instead of cloning → Mixture models
  - Mixture models: an element is generated from a mixture/group of variables (usually latent variables): GMM, LDA, pLSA, …
  - DPM: $\theta_i \mid G \sim G$, $x_i \mid \theta_i \sim F(\theta_i)$. For text data, $F(\theta_i)$ is Discrete/Multinomial (intuitive but not helpful)
• Construction
$$\beta_k \mid \alpha_0 \sim \mathrm{Beta}(1, \alpha_0), \qquad \phi_k \mid G_0 \sim G_0$$
$$\pi_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l), \qquad G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$$
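A sketch of the DPM generative process for text, under stated assumptions: $G_0$ is a symmetric Dirichlet over a small hypothetical vocabulary (parameter `eta`), $F$ is Discrete, and the stick-breaking weights are instantiated lazily, only as far as each draw needs them. None of these choices come from the slides.

```python
import random


def sample_dpm(num_points, alpha0, vocab, eta, rng):
    """Draw x_1..x_n with theta_i ~ G and x_i ~ Discrete(theta_i)."""
    weights, atoms = [], []   # pi_k and phi_k instantiated so far
    remaining = 1.0           # stick mass not yet assigned to an atom
    data, labels = [], []
    for _ in range(num_points):
        u = rng.random()
        k, cumulative = 0, 0.0
        while True:
            if k == len(weights):
                # Break off a new piece of the stick and draw its atom from G0.
                beta = rng.betavariate(1.0, alpha0)
                weights.append(beta * remaining)
                remaining *= 1.0 - beta
                g = [rng.gammavariate(eta, 1.0) for _ in vocab]
                total = sum(g)
                atoms.append([v / total for v in g])  # phi_k ~ Dir(eta, ..., eta)
            cumulative += weights[k]
            # Numerical guard: stop at the newest atom once the stick is used up.
            exhausted = k == len(weights) - 1 and remaining < 1e-12
            if u < cumulative or exhausted:
                break
            k += 1
        labels.append(k)                                       # theta_i = phi_k
        data.append(rng.choices(vocab, weights=atoms[k])[0])   # x_i ~ F(theta_i)
    return data, labels, atoms


vocab = ["hero", "action", "usa", "bank", "stock"]   # hypothetical vocabulary
words, clusters, topics = sample_dpm(200, alpha0=1.0, vocab=vocab, eta=0.5,
                                     rng=random.Random(7))
print(len(topics), words[:10])
```

Points that share the same atom $\phi_k$ form one cluster, so the number of clusters is an outcome of the draw rather than an input.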
Non-parametric Bayesian Learning
• Dirichlet Process Mixture (DPM)
  - Finite case: Dirichlet Multinomial Mixture Model (DMMM)
  - What can DMMM do? Clustering, for example of sparse word vectors such as
    (0,0,0,Caption,0,0,0,0,0,0,USA,0,0,0,0,0,0,0,0,0,Action,0,0,0,0,0,0,0,Hero,0,0,0,0,0,0,…)
Non-parametric Bayesian Learning
• Hierarchical Dirichlet Process (HDP)
  - Construction
  - HDP: $\theta_{ji} \mid G_j \sim G_j$, $x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})$
  - Finite case ($F$: Mult): LDA → Hierarchical Dirichlet Multinomial Mixture Model
A very natural model for statisticians, but for us computer scientists… hehe…
Non-parametric Bayesian Learning
• Hierarchical Topic Modeling
  - What can we get from reviews, blogs, Q&A, Twitter, news, …? → Only topics? → Far from enough
  - What we really need is a hierarchy that illustrates what exactly the text tells people
    [Figure: example topic hierarchy]
Non-parametric Bayesian Learning
• Hierarchical Topic Modeling
  - Prior: Nested CRP/DP (nCRP) [Blei and Jordan, NIPS, 04]
    - nCRP: In a restaurant, at the 1st level, there is one table, which is linked to an infinite number of tables at the 2nd level. Each table at the 2nd level is in turn linked to an infinite number of tables at the 3rd level. This structure repeats at every level, like a Matryoshka doll
    - The CRP is the prior for choosing a table at each level, which forms a path
    - One document, one path
    [Figure: nCRP tree with the path of an example document (Doc 2)]
Non-parametric Bayesian Learning
• Hierarchical Topic Modeling
  - Generative process
    1. Let $c_1$ be the root restaurant (only one table)
    2. For each level $l \in \{2, \ldots, L\}$:
       Draw a table from restaurant $c_{l-1}$ using the CRP. Set $c_l$ to be the restaurant referred to by that table
    3. Draw an $L$-dimensional topic proportion vector $\theta \sim \mathrm{Dir}(\alpha)$
    4. For each word $w_n$:
       Draw $z \in \{1, \ldots, L\} \sim \mathrm{Mult}(\theta)$
       Draw $w_n$ from the topic associated with restaurant $c_z$
    [Figure: graphical model (plate diagram) with nodes α, $z_{m,n}$, $w_{m,n}$, $c_1$, $c_2$, …, $c_L$, γ, $\beta_k$ and plates N, M, T]
$L$ can be infinite, but it need not be
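The four steps can be sketched as follows, for a finite depth $L$. The class and function names, the vocabulary, and the symmetric Dirichlet prior `eta` on each restaurant's topic are assumptions made for illustration; `gamma` plays the role of the CRP concentration parameter.

```python
import random


def dirichlet(concentration, dim, rng):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    g = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(g)
    return [v / total for v in g]


class Restaurant:
    """A node of the nCRP tree; each occupied table points to a child restaurant."""

    def __init__(self, vocab_size, eta, rng):
        self.topic = dirichlet(eta, vocab_size, rng)  # word distribution of this node
        self.children = []  # child restaurants, one per occupied table
        self.counts = []    # number of documents that chose each table


def generate_document(root, depth, gamma, alpha, num_words, vocab, eta, rng):
    # Steps 1-2: start at the root and choose one table per level with the CRP.
    path = [root]
    for _ in range(depth - 1):
        node = path[-1]
        weights = node.counts + [gamma]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(node.children):  # a new table opens a new subtree
            node.children.append(Restaurant(len(vocab), eta, rng))
            node.counts.append(0)
        node.counts[k] += 1
        path.append(node.children[k])
    # Step 3: topic proportions over the L levels of the path.
    theta = dirichlet(alpha, depth, rng)
    # Step 4: pick a level for each word, then a word from that level's topic.
    words = []
    for _ in range(num_words):
        z = rng.choices(range(depth), weights=theta)[0]
        words.append(rng.choices(vocab, weights=path[z].topic)[0])
    return path, words


rng = random.Random(1)
vocab = ["model", "topic", "word", "tree", "path", "data"]  # hypothetical vocabulary
root = Restaurant(len(vocab), eta=0.5, rng=rng)
docs = [generate_document(root, depth=3, gamma=1.0, alpha=1.0, num_words=8,
                          vocab=vocab, eta=0.5, rng=rng) for _ in range(5)]
print([len(root.children), docs[0][1]])
```

Documents that share a prefix of their paths share the topics on that prefix, which is how the hierarchy emerges.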
Non-parametric Bayesian Learning
• What we can get
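Markov Chain Monte Carlo
• Markov Chain
  - Initial distribution: $\pi_0 = \{\pi_0(1), \pi_0(2), \ldots, \pi_0(|S|)\}$
  - Transition matrix, with entries $P_{ij} = p(X_{m+1} = j \mid X_m = i)$:
$$P = \begin{pmatrix} p(1 \mid 1) & p(2 \mid 1) & \cdots & p(|S| \mid 1) \\ p(1 \mid 2) & p(2 \mid 2) & \cdots & p(|S| \mid 2) \\ \vdots & \vdots & \ddots & \vdots \\ p(1 \mid |S|) & p(2 \mid |S|) & \cdots & p(|S| \mid |S|) \end{pmatrix}$$
  - $\pi_n = \pi_{n-1} P = \pi_{n-2} P^2 = \cdots = \pi_0 P^n$: Chapman-Kolmogorov equation
  - Convergence theorem: under the premise of connectivity (irreducibility) and aperiodicity of $P$,
$$\lim_{n \to \infty} P^n_{ij} = \pi(j), \qquad \pi(j) = \sum_{i=1}^{|S|} \pi(i) P_{ij}$$
$$\lim_{n \to \infty} P^n = \begin{pmatrix} \pi(1) & \cdots & \pi(|S|) \\ \vdots & \vdots & \vdots \\ \pi(1) & \cdots & \pi(|S|) \end{pmatrix}, \qquad \text{so} \quad \lim_{n \to \infty} \pi_0 P^n = \pi$$
  - $\pi = \{\pi(1), \pi(2), \ldots, \pi(j), \ldots, \pi(|S|)\}$: the stationary distribution
  - $X_0 \sim \pi_0(x) \to X_1 \sim \pi_1(x) \to \cdots \to X_n \sim \pi(x) \to X_{n+1} \sim \pi(x) \to X_{n+2} \sim \pi(x) \to \cdots$
    After convergence, every sample is drawn from the stationary distribution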
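The convergence to the stationary distribution can be checked numerically by iterating $\pi_n = \pi_{n-1} P$. The two-state transition matrix below is a made-up example, and the function names are hypothetical.

```python
def step(pi, P):
    """One step pi_n = pi_{n-1} P for a row vector pi and a row-stochastic P."""
    size = len(P)
    return [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]


def stationary(pi0, P, tol=1e-12, max_steps=10_000):
    """Iterate the chain's distribution until it stops changing."""
    pi = pi0
    for _ in range(max_steps):
        nxt = step(pi, P)
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


# Assumed two-state chain; row i holds p(j | i).
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi_a = stationary([1.0, 0.0], P)   # start in state 1 with certainty
pi_b = stationary([0.0, 1.0], P)   # start in state 2 with certainty
print(pi_a, pi_b)
```

Both starting points lead to the same limit, which for this matrix solves $\pi = \pi P$ with $\pi = (5/6, 1/6)$; the initial distribution is forgotten.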
Markov Chain Monte Carlo
• Gibbs Sampling
Step 1: Initialize $X_0 = x_0 = \{x_i : i = 1, 2, \ldots, n\}$
Step 2: for t = 0, 1, 2, …
1. $x_1^{(t+1)} \sim p(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_n^{(t)})$
2. $x_2^{(t+1)} \sim p(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_n^{(t)})$
3. …
4. $x_j^{(t+1)} \sim p(x_j \mid x_1^{(t+1)}, \ldots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \ldots, x_n^{(t)})$
5. …
6. $x_n^{(t+1)} \sim p(x_n \mid x_1^{(t+1)}, x_2^{(t+1)}, \ldots, x_{n-1}^{(t+1)})$
In short: $x_i \sim p(x \mid x_{-i})$
[Figure: points A(x1, x1), B(x1, x2), C(x2, x1), D]
• Metropolis-Hastings Sampling
Want to understand Gibbs sampling for HDP/DPM/nCRP? First understand Gibbs sampling for LDA and DMMM.
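Before moving on to those models, the coordinate-wise update can be tried on a toy target. The sketch below assumes a standard bivariate normal with correlation `rho`, chosen only because its full conditionals are known in closed form: $x_1 \mid x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$. It is an illustration, not the sampler for LDA or DMMM.

```python
import random


def gibbs_bivariate_normal(rho, num_samples, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    x1, x2 = 0.0, 0.0                        # Step 1: initialize
    conditional_sd = (1.0 - rho * rho) ** 0.5
    samples = []
    for t in range(burn_in + num_samples):   # Step 2: sweep over the coordinates
        x1 = rng.gauss(rho * x2, conditional_sd)   # x1 ~ p(x1 | x2)
        x2 = rng.gauss(rho * x1, conditional_sd)   # x2 ~ p(x2 | x1), using the new x1
        if t >= burn_in:                     # discard draws made before convergence
            samples.append((x1, x2))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, num_samples=20_000, burn_in=1_000,
                                 rng=random.Random(0))
mean_x1 = sum(a for a, _ in samples) / len(samples)
print(mean_x1)
```

Each update conditions on the most recent value of the other coordinate, exactly as in steps 1 to 6 above.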
Reference
• Yee Whye Teh. Dirichlet Processes: Tutorial and Practical Course, 2007
• Yee Whye Teh, Michael I. Jordan, et al. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 2006
• David Blei. Probabilistic Topic Models. Communications of the ACM, 2012
• David Blei, et al. Latent Dirichlet Allocation. JMLR, 2003
• David Blei, et al. The Nested Chinese Restaurant Process and Bayesian Inference of Topic Hierarchies. Journal of the ACM, 2010
• Gregor Heinrich. Parameter Estimation for Text Analysis, 2008
• T. S. Ferguson. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1973
• Martin J. Wainwright. Graphical Models, Exponential Families, and Variational Inference
• Rick Durrett. Probability: Theory and Examples, 2010
• Christopher Bishop. Pattern Recognition and Machine Learning, 2007
• Vasilis Vryniotis. DatumBox: The Dirichlet Process Mixture Model, 2014
• David P. Williams. Gaussian Processes, Duke University, 2006
Q&A

RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 

Non-parametric Bayesian Learning in Discrete Data

  • 1. Non-parametric Bayesian Learning in Discrete Data. Yueshen Xu (xyshzjucs@zju.edu.cn / xuyueshen@163.com), Middleware, CCNT, ZJU. Statistics & Computational Linguistics. 5/10/2016
  • 2. Outline: Bayes' Rule; Parametric Bayesian Learning; Concept & Example; Discrete & Continuous Data; Text Clustering & Topic Modeling; Pros and Cons; Some Important Concepts; Non-parametric Bayesian Learning; Dirichlet Process and Process Construction; Dirichlet Process Mixture; Hierarchical Dirichlet Process; Chinese Restaurant Process; Example: Hierarchical Topic Modeling; Markov Chain Monte Carlo; Reference; Discussion
  • 3. Bayes' Rule. Posterior ∝ Prior × Likelihood: $p(\mathrm{Hypothesis} \mid \mathrm{Data}) = \dfrac{p(\mathrm{Data} \mid \mathrm{Hypothesis})\, p(\mathrm{Hypothesis})}{p(\mathrm{Data})}$, where the left-hand side is the posterior, $p(\mathrm{Data} \mid \mathrm{Hypothesis})$ is the likelihood, $p(\mathrm{Hypothesis})$ is the prior, and $p(\mathrm{Data})$ is the evidence. We update beliefs in hypotheses in response to data. Parametric or non-parametric: is the structure of the hypothesis constrained or not? (Examples follow later.) Also: your confidence in the prior.
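  • 4. Parametric Bayesian Learning. $p(\mathrm{Hypothesis} \mid \mathrm{Data}) \propto p(\mathrm{Data} \mid \mathrm{Hypothesis})\, p(\mathrm{Hypothesis})$. "Parametric or non-parametric" is a property of the hypothesis. The evidence is the fact (the observed data): a constant with no uncertainty, so dropping it is a commonly used trick. Non-parametric != no parameters. Hyper-parameters: parameters of distributions; parameter vs. variable. Example: $\mathrm{Dir}(\theta \mid \boldsymbol{\alpha}) = \dfrac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$ with $\alpha_0 = \sum_{k=1}^{K} \alpha_k$ (slide labels on this formula: variable, hyper-parameter, parameter). $p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$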
  • 5. Parametric Bayesian Learning: some examples. Clustering: K-Means/Medoid, NMF. Topic modeling: LSA, pLSA, LDA. Hierarchical concept building.
  • 6. Parametric Bayesian Learning: serious problems. How could we know the number of clusters? The number of topics? The number of layers? Heuristic pre-processing? Guessing and tuning.
  • 7. Parametric Bayesian Learning: some basics. Discrete data vs. continuous data. Discrete data: text, modeled as natural numbers. Continuous data: stock, trading, signal, quality, rating, modeled as real numbers. Some important concepts (also used in the non-parametric case). Discrete distribution: $X_i \mid \theta \sim \mathrm{Discrete}(\theta)$, with $p(X \mid \theta) = \prod_{i=1}^{n} \mathrm{Discrete}(X_i; \theta) = \prod_{j=1}^{m} \theta_j^{N_j}$, where $N_j$ is the number of observations equal to $j$. Multinomial distribution: $N \mid n, \theta \sim \mathrm{Multi}(\theta, n)$, with $p(N \mid n, \theta) = \dfrac{n!}{\prod_{j=1}^{m} N_j!} \prod_{j=1}^{m} \theta_j^{N_j}$. Computer scientists often mix the two up.
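  • 8. Parametric Bayesian Learning: some important concepts (cont.). Dirichlet distribution: $\theta \mid \boldsymbol{\alpha} \sim \mathrm{Dir}(\boldsymbol{\alpha})$, with $\mathrm{Dir}(\theta \mid \boldsymbol{\alpha}) = \dfrac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$. Conjugate prior: if the posterior $p(\theta \mid X)$ is in the same family as the prior $p(\theta)$, the prior is called a conjugate prior of the likelihood $p(X \mid \theta)$. Examples: Binomial distribution ←→ Beta distribution; Multinomial distribution ←→ Dirichlet distribution. Combining the prior $p(\theta \mid \boldsymbol{\alpha})$ with the likelihood $p(\boldsymbol{N} \mid \theta)$ gives $p(\theta \mid \boldsymbol{N}, \boldsymbol{\alpha}) = \mathrm{Dir}(\theta \mid \boldsymbol{N} + \boldsymbol{\alpha}) = \dfrac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + N_1) \cdots \Gamma(\alpha_K + N_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1 + N_k}$, where $N = \sum_{k=1}^{K} N_k$. Why should the prior and the posterior preferably be conjugate distributions? …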
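The Dirichlet/multinomial conjugacy above amounts to adding the observed counts to the prior parameters. A minimal Python sketch, assuming a symmetric prior and made-up counts (the function names and the example numbers are illustrative, not from the slides):

```python
import random


def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dir(alpha) prior + counts N -> Dir(alpha + N)."""
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): alpha_k / alpha_0 with alpha_0 = sum_k alpha_k."""
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical example: K = 3 categories, uniform prior, 7 observations.
prior = [1.0, 1.0, 1.0]
counts = [5, 0, 2]
posterior = dirichlet_posterior(prior, counts)   # [6.0, 1.0, 3.0]
theta = sample_dirichlet(posterior, random.Random(0))
```

Because the posterior is again a Dirichlet, no integration is needed to update beliefs, which is one practical answer to the question the slide poses.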
  • 9. Parametric Bayesian Learning: some important concepts (cont.). Probabilistic graphical model: modeling a Bayesian network using plates and circles. Generative model & discriminative model: $p(\theta \mid X)$. Generative model: $p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$; Naïve Bayes, GMM, pLSA, LDA, HMM, HDP…: unsupervised learning. Discriminative model: $p(\theta \mid X)$; LR, KNN, SVM, Boosting, Decision Tree: supervised learning. These also have graphical model representations.
  • 10. Non-parametric Bayesian Learning. When we talk about non-parametric, what do we usually talk about? Discrete data: Dirichlet Distribution, Dirichlet Process, Chinese Restaurant Process, Polya Urn, Pitman-Yor Process, Hierarchical Dirichlet Process, Dirichlet Process Mixture, Dirichlet Process Multinomial Model, Clustering, … Continuous data: Gaussian Distribution, Gaussian Process, Regression, Classification, Factorization, Gradient Descent, Covariance Matrix, … Brownian Motion. The common keyword: infinite (∞).
  • 11. Non-parametric Bayesian Learning: Dirichlet Process [Yee Whye Teh, etc]. Let $G_0$ be a probability measure/distribution (the base distribution), $\alpha_0$ a positive real number, $(A_1, A_2, \dots, A_r)$ a finite partition of the space, and $G$ a random probability distribution. Then $G \sim \mathrm{DP}(\alpha_0, G_0)$ iff, for every such partition, $(G(A_1), \dots, G(A_r)) \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r))$. $G_0$: which exact distribution is $G_0$? We don't know. $G$: which exact distribution is $G$? We don't know.
  • 12. Non-parametric Bayesian Learning. Where does the infinity come in? Construction of a DP: we need to construct a DP, since it does not exist naturally. Constructions: stick-breaking, Polya urn scheme, Chinese restaurant process. Stick-breaking construction: $(\beta_k)_{k=1}^{\infty}$ and $(\phi_k)_{k=1}^{\infty}$ are i.i.d. sequences with $\beta_k \mid \alpha_0 \sim \mathrm{Beta}(1, \alpha_0)$ and $\phi_k \mid G_0 \sim G_0$; then $\pi_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l)$ and $G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$. Here $\sum_{k=1}^{\infty} \pi_k = 1$, so $\pi$ is a distribution over the positive integers, and $\delta_{\phi_k}$ is the point mass at $\phi_k$. Why DP? …
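The stick-breaking construction can be sketched in a few lines of Python. Since the sum is infinite, this sketch truncates it at a fixed number of sticks and reports the leftover mass; the truncation level, the choice of a standard normal as $G_0$, and all names are assumptions made for illustration only.

```python
import random


def stick_breaking(alpha0, base_sampler, truncation, rng):
    """Truncated stick-breaking draw of G = sum_k pi_k * delta_{phi_k}.

    Returns (weights, atoms, remaining), where `remaining` is the stick
    mass not yet assigned after `truncation` breaks.
    """
    weights, atoms = [], []
    remaining = 1.0                      # prod_{l<k} (1 - beta_l)
    for _ in range(truncation):
        beta_k = rng.betavariate(1.0, alpha0)   # beta_k ~ Beta(1, alpha0)
        weights.append(beta_k * remaining)      # pi_k = beta_k * prod_{l<k}(1 - beta_l)
        atoms.append(base_sampler(rng))         # phi_k ~ G0
        remaining *= 1.0 - beta_k
    return weights, atoms, remaining


# Hypothetical base distribution G0 = N(0, 1); alpha0 = 2.0 is arbitrary.
rng = random.Random(1)
weights, atoms, remaining = stick_breaking(
    alpha0=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0), truncation=50, rng=rng
)
```

A larger `alpha0` breaks off smaller pieces, so the mass is spread over more atoms; a smaller `alpha0` concentrates it on the first few.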
  • 13. Non-parametric Bayesian Learning: Chinese Restaurant Process. A restaurant has an infinite number of tables, and customers (words, generated from $\theta_i$, one-to-one) enter this restaurant sequentially. The $i$th customer ($\theta_i$) sits at a table ($\phi_k$), either an occupied one or a new table, according to a seating probability [formula on the slide, not preserved in this transcript]. $\phi_k$: clustering, which covers about 2/3 of unsupervised learning: clustering, topic modeling (two-layer clustering), hierarchical concept building, collaborative filtering, similarity computation…
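A minimal Python sketch of the seating process follows. The slide's own formula is not preserved in the transcript, so the rule used here is the standard CRP rule and should be read as an assumption: with $i$ customers already seated, the next one joins occupied table $k$ with probability $n_k / (i + \alpha_0)$ and opens a new table with probability $\alpha_0 / (i + \alpha_0)$, where $n_k$ is the number of customers at table $k$. Function and variable names are illustrative.

```python
import random


def crp_seating(n_customers, alpha0, rng):
    """Seat customers one by one; return (table index per customer, table sizes)."""
    counts = []        # counts[k] = n_k, customers currently at table k
    assignments = []
    for i in range(n_customers):        # i customers are already seated
        r = rng.random() * (i + alpha0)
        table = len(counts)             # default: open a new table
        acc = 0.0
        for k, n_k in enumerate(counts):
            acc += n_k
            if r < acc:                 # lands in table k's share of the mass
                table = k
                break
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts


assignments, counts = crp_seating(n_customers=100, alpha0=1.0, rng=random.Random(2))
```

Customers at the same table form one cluster, and the number of occupied tables is not fixed in advance but grows with the data.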
  • 14. Non-parametric Bayesian Learning: Dirichlet Process Mixture (DPM). You can draw the graphical model yourself. A DP alone is not enough: we need similarity instead of cloning, hence mixture models. Mixture models: an element is generated from a mixture/group of variables (usually latent variables), e.g., GMM, LDA, pLSA… DPM: $\theta_i \mid G \sim G$, $x_i \mid \theta_i \sim F(\theta_i)$. For text data, $F(\theta_i)$ is Discrete/Multinomial. This definition is intuitive but not helpful; the construction is: $\beta_k \mid \alpha_0 \sim \mathrm{Beta}(1, \alpha_0)$, $\phi_k \mid G_0 \sim G_0$, $\pi_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l)$, $G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$
  • 15. Non-parametric Bayesian Learning: Dirichlet Process Mixture (DPM). Finite case: Dirichlet Multinomial Mixture Model (DMMM). What can DMMM do? Clustering, e.g., of a sparse word vector such as (0,0,0,Caption,0,0,0,0,0,0,USA,0,0,0,0,0,0,0,0,0,Action,0,0,0,0,0,0,0,Hero,0,0,0,0,0,0,…)
  • 16. Non-parametric Bayesian Learning: Hierarchical Dirichlet Process (HDP). Construction. HDP: $\theta_{ji} \mid G \sim G$, $x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})$. A very natural model for the statistics guys, but for us computer guys… hehe… Finite case ($F$: Mult): LDA, i.e., a Hierarchical Dirichlet Multinomial Mixture Model.
  • 17. Non-parametric Bayesian Learning: Hierarchical Topic Modeling. What can we get from reviews, blogs, question answers, Twitter, news…? Only topics? That is far from enough. What we really need is a hierarchy that illustrates what exactly the text tells people [example hierarchy shown as a figure on the slide].
  • 18. Non-parametric Bayesian Learning: Hierarchical Topic Modeling. Prior: nested CRP/DP (nCRP) [Blei and Jordan, NIPS, 04]. nCRP: in a restaurant, at the 1st level, there is one table, which is linked with an infinite number of tables at the 2nd level. Each table at the 2nd level is also linked with an infinite number of tables at the 3rd level. This structure is repeated, like a Matryoshka doll. The CRP is the prior used to choose a table at each level to form a path: one document, one path.
  • 19. Non-parametric Bayesian Learning: Hierarchical Topic Modeling. Generative process: (1) Let $c_1$ be the root restaurant (only one table). (2) For each level $l \in \{2, \dots, L\}$: draw a table from restaurant $c_{l-1}$ using the CRP, and set $c_l$ to be the restaurant referred to by that table. (3) Draw an $L$-dimensional topic proportion vector $\theta \sim \mathrm{Dir}(\alpha)$. (4) For each word $w_n$: draw $z \in \{1, \dots, L\} \sim \mathrm{Mult}(\theta)$, then draw $w_n$ from the topic associated with restaurant $c_z$. $L$ can be infinite, but this is not necessary. [Plate diagram with labels $\alpha$, $z_{m,n}$, $w_{m,n}$, $c_1, c_2, \dots, c_L$, $\gamma$, $\beta_k$, $N$, $M$, $T$.]
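Steps 1 to 4 of the generative process can be sketched in Python as below. This is an illustration under several assumptions that are not in the slides: the tree is stored as a dictionary from a path prefix to the customer counts of its child tables, the CRP concentration is called `gamma`, levels are zero-based, and the final step of emitting a word from the chosen topic is left out because the topics themselves are not specified here.

```python
import random


def crp_choice(counts, gamma, rng):
    """Pick an existing child k with weight counts[k], or a new one with weight gamma."""
    r = rng.random() * (sum(counts) + gamma)
    acc = 0.0
    for k, n_k in enumerate(counts):
        acc += n_k
        if r < acc:
            return k
    return len(counts)


def draw_path(tree, depth, gamma, rng):
    """Steps 1-2: walk from the root, choosing one child per level with a CRP."""
    path = ()                               # the root restaurant c_1
    for _ in range(depth - 1):
        counts = tree.setdefault(path, [])  # child-table counts at this node
        k = crp_choice(counts, gamma, rng)
        if k == len(counts):
            counts.append(0)                # open a new table / subtree
        counts[k] += 1
        path = path + (k,)
    return path


def draw_levels(n_words, alpha, rng):
    """Steps 3-4: theta ~ Dir(alpha), then one level z_n ~ Mult(theta) per word."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    theta = [d / total for d in draws]
    return rng.choices(range(len(alpha)), weights=theta, k=n_words)


rng = random.Random(3)
tree = {}
paths = [draw_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(20)]  # one path per document
levels = draw_levels(n_words=10, alpha=[1.0, 1.0, 1.0], rng=rng)
```

Each word would then be drawn from the topic attached to the node at its level along the document's path; documents that share a path prefix share the topics on that prefix.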
  • 20. Non-parametric Bayesian Learning: what we can get
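  • 21. Markov Chain Monte Carlo: Markov chain. Initial distribution: $\pi_0 = \{\pi_0(1), \pi_0(2), \dots, \pi_0(|S|)\}$. Transition matrix: $P = (P_{ij})$, an $|S| \times |S|$ matrix whose entry $P_{ij}$ is the probability of moving from state $i$ to state $j$ in one step $X_m \to X_{m+1}$. $\pi_n = \pi_{n-1} P = \pi_{n-2} P^2 = \dots = \pi_0 P^n$ (Chapman-Kolmogorov equation). Limit theorem: under the premise of connectivity of $P$, $\lim_{n \to \infty} P^n_{ij} = \pi(j)$, with $\pi(j) = \sum_{i=1}^{|S|} \pi(i) P_{ij}$. Equivalently, $\lim_{n \to \infty} P^n = \begin{pmatrix} \pi(1) & \cdots & \pi(|S|) \\ \vdots & \ddots & \vdots \\ \pi(1) & \cdots & \pi(|S|) \end{pmatrix}$, so $\lim_{n \to \infty} \pi_0 P^n = \pi$ for any $\pi_0$, where $\pi = \{\pi(1), \pi(2), \dots, \pi(j), \dots, \pi(|S|)\}$ is the stationary distribution. Sampling: $X_0 \sim \pi_0(x) \to X_1 \sim \pi_1(x) \to \dots \to X_n \sim \pi(x) \to X_{n+1} \sim \pi(x) \to X_{n+2} \sim \pi(x) \to \dots$; after convergence, every sample comes from the stationary distribution.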
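The convergence $\pi_0 P^n \to \pi$ can be checked numerically. A minimal Python sketch, assuming a made-up two-state transition matrix (rows are the current state, columns the next state); for this particular matrix, solving $\pi = \pi P$ by hand gives $\pi = (5/6, 1/6)$.

```python
def step(pi, P):
    """One step of the chain: pi_{n}(j) = sum_i pi_{n-1}(i) * P[i][j]."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def iterate(pi0, P, n_steps):
    """Compute pi_0 P^n by repeated multiplication."""
    pi = list(pi0)
    for _ in range(n_steps):
        pi = step(pi, P)
    return pi


# Hypothetical 2-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

from_state_0 = iterate([1.0, 0.0], P, 200)
from_state_1 = iterate([0.0, 1.0], P, 200)
# Both starting points end up at (approximately) the same stationary distribution.
```

The starting distribution is forgotten, which is exactly what MCMC relies on: run the chain long enough, then treat later states as samples from $\pi$.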
  • 22. Markov Chain Monte Carlo: Gibbs sampling. Step 1: initialize $X_0 = x^{(0)} = \{x_i : i = 1, 2, \dots, n\}$. Step 2: for $t = 0, 1, 2, \dots$: (1) $x_1^{(t+1)} \sim p(x_1 \mid x_2^{(t)}, x_3^{(t)}, \dots, x_n^{(t)})$; (2) $x_2^{(t+1)} \sim p(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \dots, x_n^{(t)})$; (3) …; (4) $x_j^{(t+1)} \sim p(x_j \mid x_1^{(t+1)}, \dots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \dots, x_n^{(t)})$; (5) …; (6) $x_n^{(t+1)} \sim p(x_n \mid x_1^{(t+1)}, x_2^{(t+1)}, \dots, x_{n-1}^{(t+1)})$. In short: $x_i \sim p(x \mid x_{-i})$. [Figure: points A(x1,x1), B(x1,x2), C(x2,x1), D.] Related: Metropolis-Hastings sampling. Want to know Gibbs sampling for HDP/DPM/nCRP? You had better first understand Gibbs sampling for LDA and DMMM.
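The coordinate-wise update $x_i \sim p(x \mid x_{-i})$ is easiest to see on a toy target. The Python sketch below is not a sampler for LDA, DMMM, or HDP; it assumes a two-dimensional standard normal with correlation $\rho$, for which the full conditionals are known in closed form: $x_1 \mid x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$. The target, the burn-in length, and all names are illustrative choices.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampling for a zero-mean, unit-variance bivariate normal with correlation rho."""
    x1, x2 = 0.0, 0.0                        # Step 1: initialise
    cond_sd = (1.0 - rho * rho) ** 0.5       # std. dev. of each full conditional
    samples = []
    for t in range(burn_in + n_samples):     # Step 2: sweep over the coordinates
        x1 = rng.gauss(rho * x2, cond_sd)    # x1 ~ p(x1 | x2), uses the current x2
        x2 = rng.gauss(rho * x1, cond_sd)    # x2 ~ p(x2 | x1), uses the new x1
        if t >= burn_in:                     # discard draws made before convergence
            samples.append((x1, x2))
    return samples


def correlation(samples):
    """Empirical Pearson correlation of the two coordinates."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    cov = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
    v1 = sum((s[0] - m1) ** 2 for s in samples) / n
    v2 = sum((s[1] - m2) ** 2 for s in samples) / n
    return cov / (v1 * v2) ** 0.5


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000, burn_in=1000, rng=random.Random(4))
estimated_rho = correlation(samples)
```

Samplers for LDA or DMMM follow the same pattern, with each word's or document's latent assignment playing the role of one coordinate.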
  • 23. Reference
      • Yee Whye Teh. Dirichlet Processes: Tutorial and Practical Course, 2007
      • Yee Whye Teh, Michael I. Jordan, et al. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 2006
      • David Blei. Probabilistic Topic Models. Communications of the ACM, 2012
      • David Blei, et al. Latent Dirichlet Allocation. JMLR, 2003
      • David Blei, et al. The Nested Chinese Restaurant Process and Bayesian Inference of Topic Hierarchies. Journal of the ACM, 2010
      • Gregor Heinrich. Parameter Estimation for Text Analysis, 2008
      • T. S. Ferguson. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1973
      • Martin J. Wainwright. Graphical Models, Exponential Families, and Variational Inference
      • Rick Durrett. Probability: Theory and Examples, 2010
      • Christopher Bishop. Pattern Recognition and Machine Learning, 2007
      • Vasilis Vryniotis. DatumBox: The Dirichlet Process Mixture Model, 2014
      • David P. Williams. Gaussian Processes, Duke University, 2006