Dirichlet Processes and Applications
Saurav Jha, Machine Learning Engineer
Copyright © 2018 FactSet Research Systems Inc. All rights reserved. Confidential: Do not forward.
Table of Contents
1. Probability 101: Mass & Density Functions
2. Probability 102: Simplex and its geometrical meaning
3. Dirichlet Distribution
4. Dirichlet Process
5. A demo
6. An application
Probability 101: Probability Mass Function (PMF) vs Probability Density Function (PDF)
• PDF: describes a continuous random variable; integrating it over a range gives the probability that the variable falls in that range.
• PMF: gives the probability that a discrete random variable is exactly equal to some value.
• In the continuous setting:
  $\int_a^b f(x)\,dx$ = probability that the outcome lies between a and b,
  i.e., f(x) has units of probability per unit length (dx): it measures how dense the probability is near x.
• In the discrete setting:
  $f(x) = \Pr(X = x)$,
  i.e., f(x) is a plain probability: the mass that X places on the point x.
Probability Simplex
• The set of all PMFs on a sample space with n outcomes:
  $S = \{ x \in \mathbb{R}^n : x_i \ge 0,\ \sum_{i=1}^{n} x_i = 1 \}$
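The difference between a density and a mass can be checked numerically. The sketch below is only an illustration: the uniform density on [0, 0.5], the fair die, and the helper names `integrate` and `uniform_pdf` are assumptions, not part of the slides. It shows that a density value may exceed 1, while the integral of the density and the values of a PMF are true probabilities.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def uniform_pdf(x, low=0.0, high=0.5):
    """Density of a uniform variable on [low, high]; equals 2 on [0, 0.5]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


# PMF of a fair six-sided die: each value is already a probability.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

density_at_point = uniform_pdf(0.25)               # a density, not a probability
prob_interval = integrate(uniform_pdf, 0.1, 0.3)   # probability of 0.1 <= X <= 0.3
prob_face = die_pmf[3]                             # probability that the die shows 3
total_mass = sum(die_pmf.values())                 # a PMF sums to 1

print(density_at_point, round(prob_interval, 6), prob_face, round(total_mass, 6))
```

The density value 2 is not a probability; only after multiplying by a length (here 0.2) does it become one.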
Probability 102: Geometrical meaning of the k-simplex
• A k-simplex is a k-dimensional polytope (a geometric object with flat sides) formed as the convex hull of its k+1 vertices.
• Let $u_0, u_1, \dots, u_k \in \mathbb{R}^k$ be k+1 affinely independent points; the simplex they determine is the set of points:
  $C = \{ \theta_0 u_0 + \dots + \theta_k u_k \mid \sum_{i=0}^{k} \theta_i = 1 \text{ and } \theta_i \ge 0\ \forall i \}$
◦ Treat u0, u1, u2 as a set of mutually exclusive events whose probabilities sum to 1, i.e. $p_0 + p_1 + p_2 = 1$, where $0 \le p_i \le 1$.
◦ Plot the three probabilities as a point (p0, p1, p2) in Euclidean space.
◦ The resulting shape is a triangle, including its interior.
◦ The points live in 3-dimensional space, but the object they form is 2-dimensional; in general, n probabilities form an (n-1)-dimensional simplex.
◦ Each point p in the simplex is a pmf in its own right (each component of p lies in [0,1] and all its components sum to 1).
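The convex-hull definition can be sketched in a few lines. This is a minimal illustration, assuming the three unit vectors of R^3 as vertices; the helper names `convex_combination` and `on_simplex` are hypothetical. It builds a point from weights θ and checks that the result is a valid pmf.

```python
def convex_combination(weights, vertices):
    """Return sum_i weights[i] * vertices[i], computed coordinate by coordinate."""
    dim = len(vertices[0])
    return [sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)]


def on_simplex(p, tol=1e-9):
    """True if every component is non-negative and the components sum to 1."""
    return all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol


# Assumed vertices: the unit vectors, one per mutually exclusive event.
vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
weights = [0.2, 0.5, 0.3]            # theta_i >= 0 and they sum to 1

point = convex_combination(weights, vertices)
print(point, on_simplex(point))       # the point is itself a pmf
print(on_simplex([0.7, 0.7, -0.4]))   # sums to 1 but has a negative component
```

With unit-vector vertices the weights and the resulting point coincide, which is why each point of the probability simplex can be read directly as a pmf.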
Dirichlet distribution
"A distribution of distributions"
• Let $Q = [Q_1, Q_2, \dots, Q_k]$ be a random pmf, i.e. $Q_i \ge 0$ for i = 1, 2, …, k and $\sum_{i=1}^{k} Q_i = 1$.
• Let $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_k]$, with $\alpha_i > 0$ for each i, and let $\alpha_0 = \sum_{i=1}^{k} \alpha_i$.
• Then Q has a Dirichlet distribution with parameter α, denoted Q ∼ Dir(α), if its density on the simplex is:
  $f(q_1, \dots, q_k; \alpha) = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} q_i^{\alpha_i - 1}$
• It is a probability distribution whose samples lie in the (k-1)-dimensional probability simplex $\Delta_k$, i.e., a distribution over pmfs of length k.
• It ranges over the possible parameter vectors of a multinomial distribution and is the conjugate prior of the multinomial distribution.
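A short sketch of drawing from a Dirichlet and evaluating its log-density, using only the Python standard library. The sampling route (normalising independent Gamma(α_i, 1) draws) is a standard construction rather than something stated on the slide, and the function names and the example α = [2, 3, 5] are assumptions made for illustration.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one pmf from Dir(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def dirichlet_log_density(q, alpha):
    """Log of Gamma(alpha_0) / prod Gamma(alpha_i) * prod q_i^(alpha_i - 1)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(x) for a, x in zip(alpha, q))


rng = random.Random(0)
alpha = [2.0, 3.0, 5.0]                       # assumed example parameters
draws = [sample_dirichlet(alpha, rng) for _ in range(20_000)]

# Every draw is a pmf; the average draw approaches alpha_i / alpha_0.
empirical_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
print([round(m, 3) for m in empirical_mean])

# With alpha = [1, 1, 1] the density is flat on the simplex, with value Gamma(3) = 2.
flat_log_density = dirichlet_log_density([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
print(round(math.exp(flat_log_density), 6))
```

Working with the log-density avoids overflow in the Gamma functions when the α_i are large.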
Dirichlet distribution: an example use-case
• X = vector of outcome counts from n draws of a random variable with 3 possible outcomes, e.g. X = [4, 4, 2] (n = 10).
• PMF of X = multinomial distribution:
  $P(X = (n_1, n_2, n_3)) = \frac{n!}{n_1!\, n_2!\, n_3!}\, p_1^{n_1} p_2^{n_2} p_3^{n_3}$
Q) What if p1, p2, p3 are unknown, i.e., we are uncertain about the distribution of the categorical variable?
◦ Solution: use a Dirichlet distribution with parameters α1, α2, α3 to first draw P ∼ Dir(α), and then draw X ∼ Multi(P).
• This introduces one level of indirection in the model for X: instead of stating which P generated X, the parameters α1, α2, α3 determine which probability distributions are likely, and samples X are then drawn according to the random P.
• Because sampling is directly from the probability simplex, every draw from a k-dimensional Dirichlet is itself a valid pmf; on average the draws equal the mean of the Dirichlet, $E[P_i] = \alpha_i / \alpha_0$.
• Adding the Dirichlet distribution introduces prior beliefs about which P, and therefore which X, is likely to occur, i.e., the random pmf has a Dirichlet distribution with parameter α. [1]
• Analogy 1: if a random pmf is a bag full of dice (each die being one pmf), then a sample from the Dirichlet is one specific die.
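The two-stage model (first P ∼ Dir(α), then X ∼ Multi(P)) can be sketched as follows. The uniform prior α = [1, 1, 1], the helper names, and the use of the conjugate update α + counts are assumptions added for illustration; only the observed counts [4, 4, 2] come from the slide.

```python
import math
import random


def multinomial_pmf(counts, probs):
    """n! / (n_1! ... n_k!) * prod p_i^n_i for the given outcome counts."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)
    return coefficient * math.prod(p ** c for p, c in zip(probs, counts))


def sample_dirichlet(alpha, rng):
    """Draw one pmf from Dir(alpha) by normalising Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_counts(n, probs, rng):
    """Draw n categorical outcomes with probabilities probs and count each outcome."""
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n)
    return [outcomes.count(i) for i in range(len(probs))]


rng = random.Random(1)
alpha = [1.0, 1.0, 1.0]                  # assumed prior: all pmfs equally likely

p = sample_dirichlet(alpha, rng)         # stage 1: draw a random pmf P
x = sample_counts(10, p, rng)            # stage 2: draw counts X given P

observed = [4, 4, 2]                     # counts from the slide
likelihood = multinomial_pmf(observed, [1 / 3, 1 / 3, 1 / 3])

# Conjugacy: a Dir(alpha) prior plus multinomial counts gives a Dir(alpha + counts) posterior.
posterior_alpha = [a + c for a, c in zip(alpha, observed)]
posterior_mean = [a / sum(posterior_alpha) for a in posterior_alpha]
print(x, round(likelihood, 5), posterior_alpha, [round(m, 3) for m in posterior_mean])
```

The posterior mean moves from the prior mean (1/3 each) toward the observed frequencies, which is the practical payoff of the Dirichlet being the conjugate prior of the multinomial.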
Dirichlet Process
◦ Dirichlet Processes to the Rescue!
• In the dice analogy, each die must have a finite number of faces.
• Limitation of the Dirichlet distribution: it assumes a finite set of events.
• A Dirichlet process enables working with an infinite set of events, and hence modelling probability distributions over infinite sample spaces.
Analogy 2:
• Ask pedestrians on the street to choose their favorite color out of {V, I, B, G, Y, O, R}.
• Based on the answers, model each person as a pmf over the 7 colors.
• Each person's pmf is a realization of a draw from a Dirichlet distribution over 7 colors.
◦ What if the choices are no longer restricted to 7 colors?
• Modelling an individual's pmf (now infinite-dimensional) requires a distribution over distributions over an infinite sample space.
• One solution: a Dirichlet process.
Dirichlet Process: definition
◦ Input: H (a probability distribution, a.k.a. the base distribution) and α (a positive real number, a.k.a. the concentration parameter).
• Assign elements A, B, C, … to an unknown number of categories with the following algorithm:
◦ Draw the first element A from H; it starts the first category.
◦ For the n-th element, n > 1:
   ◦ Assign it to a new category (a fresh draw from H) with probability α / (α + n - 1).
   ◦ Assign it to a pre-existing category x with probability n_x / (α + n - 1), where n_x = number of elements already assigned to x.
• Used when modelling data that tends to repeat previous values in a "rich get richer" fashion.
• The same scheme can also be described as a Chinese Restaurant Process.
• Applications: morphological segmentation in NLP; modelling mutation rates of genes in evolutionary biology.
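The assignment rule above can be simulated directly. The following is a minimal sketch, assuming categories are tracked by integer labels rather than by values drawn from H; the function name and the two example values of α are hypothetical. Each new element joins category x with weight n_x or opens a new category with weight α, so the weights sum to α + n - 1.

```python
import random


def chinese_restaurant_process(n, alpha, rng):
    """Sequentially assign n elements to categories; return (assignments, category sizes)."""
    assignments = []   # category label of each element, in arrival order
    sizes = []         # sizes[x] = n_x, number of elements already in category x
    for _ in range(n):
        # Existing category x has weight n_x; a new category has weight alpha.
        weights = sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(sizes):
            sizes.append(1)        # open a new category
        else:
            sizes[choice] += 1     # join an existing category ("rich get richer")
        assignments.append(choice)
    return assignments, sizes


rng = random.Random(42)
assignments_small, sizes_small = chinese_restaurant_process(1000, 0.5, rng)
assignments_large, sizes_large = chinese_restaurant_process(1000, 50.0, rng)

# A small alpha yields few, large categories; a large alpha yields many small ones.
print(len(sizes_small), max(sizes_small))
print(len(sizes_large), max(sizes_large))
```

The first element always opens category 0, because the only available weight at that point is α.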
A demo [2]
An application: Learning of hierarchical morphology paradigms [3]
• A paradigm = a pair (StemList, SuffixList) in which each Stem+Suffix string is a valid word.
• Paradigms can be modelled as a hierarchical structure.
• Morphologically similar words lie close to each other in the structure.
• Similarity metric = number of common morphemes.
• Notation: w = word, s = stem, m = suffix.
• Assumption: stems and suffixes are generated independently of each other.
• Probability of a word: $p(w = s + m) = p(s)\, p(m)$
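The independence assumption can be illustrated with a toy paradigm. The stems, suffixes, and probabilities below are invented for the example and do not come from [3]; the sketch only shows that p(w) is the product of a stem probability and a suffix probability.

```python
# Hypothetical toy distributions over stems and suffixes (not taken from the paper).
stem_probs = {"walk": 0.5, "talk": 0.3, "jump": 0.2}
suffix_probs = {"": 0.4, "s": 0.2, "ed": 0.2, "ing": 0.2}


def word_probability(stem, suffix):
    """p(w = s + m) = p(s) * p(m) under the independence assumption."""
    return stem_probs.get(stem, 0.0) * suffix_probs.get(suffix, 0.0)


p_walking = word_probability("walk", "ing")

# Summing over every (stem, suffix) pair of the paradigm recovers total probability 1.
total = sum(word_probability(s, m) for s in stem_probs for m in suffix_probs)
print(round(p_walking, 6), round(total, 6))
```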
1. Two Dirichlet processes generate stems and suffixes independently:
• βs = concentration parameter of the stem process; it determines the number of stem types generated by the DP.
• If β is small, new stem/suffix types are less likely to be generated.
• If β is large, new stem/suffix types are more likely to be generated, yielding a more uniform distribution over types.
• The authors choose β < 1, which yields a more skewed distribution with sparse stems and suffixes.
• P = base distribution, used as a prior probability distribution over morpheme lengths.
• The joint probability of the stems can then be calculated from β and P.
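One standard form for such a joint under a Dirichlet process prior is the Chinese Restaurant Process product below. It is stated here as an assumption, not as the exact expression used in [3]. The symbols L (number of stem tokens), K (number of distinct stem types), n_{s_i} (number of tokens of type s_i) and P_s (the stem base distribution) are introduced for this sketch, and each stem type is assumed to form a single category.

```latex
p(s_1, \dots, s_L)
  = \frac{\Gamma(\beta_s)}{\Gamma(L + \beta_s)}\;
    \beta_s^{K}\;
    \prod_{i=1}^{K} P_s(s_i)\,\bigl(n_{s_i} - 1\bigr)!
```

Each new type contributes a factor β_s P_s(s_i), each repeat of an existing type contributes its current count, and the Gamma ratio collects the denominators β_s, β_s + 1, …, β_s + L - 1 of the sequential assignment probabilities.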
References
1. Frigyik, Bela A., et al. "Introduction to the Dirichlet Distribution and Related Processes." (2010).
2. http://phyletica.org/dirichlet-process/
3. Can, Burcu, and Suresh Manandhar. "Probabilistic Hierarchical Clustering of Morphological Paradigms." EACL (2012).
THANK YOU !