Variational Inference
Natan Katz
Raanana AI
06/01/2020
What is Bayesian Inference?
Sampling
When NICE met VI
Rigorous Foundation
Two words on Numeric
WHAT IS BAYESIAN INFERENCE?
Bayesian Inference - Notation
The inputs:
Evidence – the sample of length n (numbers, categories, vectors, images)
Hypothesis – an assumption about the probabilistic structure that generates the sample
Objective:
We wish to learn the conditional distribution of the Hypothesis given the Evidence.
This distribution is called the posterior, written P(H|E).
Let’s Formulate
Z – R.V. that represents the hypothesis
X – R.V. that represents the evidence
Bayes formula:
$P(Z \mid X) = \dfrac{P(Z, X)}{P(X)}$
Bayesian inference is therefore about working with the terms on the RHS.
In some cases the denominator P(X) is intractable or extremely difficult to calculate.
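Example - GMM
We have K Gaussians
Draw $\mu_k \sim N(0, \tau)$ for k = 1…K (τ is positive)
For each sample j = 1…n:
$z_j \sim \mathrm{Cat}(1/K, \dots, 1/K)$
$x_j \sim N(\mu_{z_j}, \sigma)$
$p(x_{1:n}) = \int_{\mu_{1:K}} \prod_{l=1}^{K} P(\mu_l) \prod_{j=1}^{n} \sum_{z_j} p(z_j)\, P(x_j \mid \mu_{z_j})\, d\mu_{1:K}$
=> Pretty nasty: there is no closed form, and expanding the product of sums gives K^n terms.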
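The following is a minimal sketch (not from the original slides) of the generative process above, using only the Python standard library. It assumes τ and σ are standard deviations, and the function names and parameter values are illustrative. It also shows why the evidence is hard: for fixed means the sum over assignments factorises, but a brute-force expansion has K^n terms, and integrating over the means couples all the data points.

```python
import itertools
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def sample_gmm(n, k, tau, sigma, rng):
    """Draw means, assignments and data from the generative process."""
    mus = [rng.gauss(0.0, tau) for _ in range(k)]   # mu_k ~ N(0, tau)
    zs = [rng.randrange(k) for _ in range(n)]       # z_j ~ Cat(1/K, ..., 1/K)
    xs = [rng.gauss(mus[z], sigma) for z in zs]     # x_j ~ N(mu_{z_j}, sigma)
    return mus, zs, xs


def lik_brute_force(xs, mus, sigma):
    """p(x | mu) by summing over all K^n assignments (exponential cost)."""
    k = len(mus)
    total = 0.0
    for zs in itertools.product(range(k), repeat=len(xs)):
        term = 1.0
        for x, z in zip(xs, zs):
            term *= (1.0 / k) * normal_pdf(x, mus[z], sigma)
        total += term
    return total


def lik_factorised(xs, mus, sigma):
    """Same quantity, using the fact that the sum factorises for fixed mu."""
    k = len(mus)
    prod = 1.0
    for x in xs:
        prod *= sum(normal_pdf(x, m, sigma) for m in mus) / k
    return prod


rng = random.Random(0)
mus, zs, xs = sample_gmm(n=6, k=3, tau=5.0, sigma=1.0, rng=rng)
brute = lik_brute_force(xs, mus, sigma=1.0)   # 3**6 = 729 terms
fast = lik_factorised(xs, mus, sigma=1.0)     # 6 * 3 terms
```

The two functions agree for fixed means; the trouble starts once the means must be integrated out, because the shortcut in `lik_factorised` no longer applies.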
SAMPLING
Sampling
Traditionally, the posterior is learned using Markov Chain Monte Carlo (MCMC) methods:
• Metropolis-Hastings
• Gibbs
• Hybrid Monte Carlo
Today we will talk about none of these methods!
“AN INTRODUCTION TO VARIATIONAL METHODS FOR GRAPHICAL MODELS”
Analytics vs. Sampling
Sampling: exact, slow, small data, variance
Analytics: fast, big data, biased
WHEN NICE MET VI
Infomedia - Global Events
• 2017 – Innovation authority project on content traffic in networks
• Their objective was to identify global events by observing tweets and classifying them according to computed topics.
Event Extraction – Solution Overview
Separate stream of tweets into topics → Build trend lines for each topic → Identify events
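Latent Dirichlet Allocation - LDA (Blei 2003)
Corpus D; every document has length N
$N \sim \mathrm{Poisson}(\xi)$
$\theta \sim \mathrm{Dir}(\alpha)$
β – topics (array of words)
For each of the N words $w_n$:
a topic $z_n \sim \mathrm{Cat}(\theta)$
$w_n \sim p(w_n \mid z_n, \beta)$
$\beta_{ij} = P(w_i \mid z_j)$
$p(w \mid \alpha, \beta) = \int P(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta$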
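The generative story above can be sketched in a few lines of standard-library Python. This is an illustration only: the vocabulary, the two topics, the helper names and the values of α and ξ are all made up, and `beta` is stored as `beta[topic][word]`, i.e. transposed relative to the β_ij notation on the slide.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """theta ~ Dir(alpha), via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(xi, rng):
    """N ~ Poisson(xi), Knuth's multiplication method (fine for small xi)."""
    limit, count, prod = math.exp(-xi), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_categorical(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(alpha, beta, xi, rng):
    """Return (theta, topics, words) for one document."""
    n_words = sample_poisson(xi, rng)                             # N ~ Poisson(xi)
    theta = sample_dirichlet(alpha, rng)                          # theta ~ Dir(alpha)
    topics = [sample_categorical(theta, rng) for _ in range(n_words)]  # z_n ~ Cat(theta)
    words = [sample_categorical(beta[z], rng) for z in topics]    # w_n ~ p(w | z_n, beta)
    return theta, topics, words


vocab = ["goal", "match", "vote", "election"]
beta = [[0.5, 0.5, 0.0, 0.0],   # hypothetical topic 0: sports
        [0.0, 0.0, 0.5, 0.5]]   # hypothetical topic 1: politics
rng = random.Random(1)
theta, topics, words = generate_document([0.5, 0.5], beta, xi=8, rng=rng)
doc = [vocab[w] for w in words]
```

Inference runs this story backwards: given only `doc`, recover `theta`, `topics` and `beta`, which is where the intractable integral above comes in.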
Creating Topics
• At the beginning they used Gibbs sampling from an LDA library. It took nearly a day.
• Then they tried the VI implementation in gensim (its gensim.models.LdaMulticore engine). The results were preserved, but were obtained in 2 hours.
• “Variational inference is that thing you implement while waiting for your Gibbs sampler to converge.” (Blei)
VARIATIONAL-INFERENCE
NERDS’ TIME
Constructing an Analytical Solution
• Recall: our objective is to find the following distribution:
$P(Z \mid X) = \dfrac{P(Z, X)}{P(X)}$
We are searching for an analytical solution.
Constructing an Analytical Solution (cont.)
What is needed in order to construct such a solution?
1. Being familiar with the framework (the space we work in)
2. Having a metric function over this space
3. Having an optimization methodology
Item 1 is obvious: we are interested in the space of distribution functions.
Calculus of Variations
Euler-Lagrange
• A branch of mathematics that is the analogue of calculus for functionals and function spaces
Euler-Lagrange eq.
F, y are functions (with all the “extras”) and J is a functional:
$J(y) = \int F(y, y', t)\, dt$ (y is differentiable)
If y is an extremum of J, it satisfies the Euler-Lagrange eq.:
$\dfrac{\partial F}{\partial y} - \dfrac{d}{dt}\left(\dfrac{\partial F}{\partial y'}\right) = 0$
• So we have an optimization…
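As a worked illustration (a standard textbook case, not taken from the original slides), the shortest curve between two points shows how the Euler-Lagrange equation turns a search over functions into a differential equation. The choice of F below is an assumption made for the example:

```latex
\begin{align*}
J(y) &= \int_{t_0}^{t_1} \sqrt{1 + y'(t)^2}\, dt,
  \qquad F(y, y', t) = \sqrt{1 + y'^2} \\
\frac{\partial F}{\partial y} &= 0,
  \qquad \frac{\partial F}{\partial y'} = \frac{y'}{\sqrt{1 + y'^2}} \\
\frac{\partial F}{\partial y} - \frac{d}{dt}\left(\frac{\partial F}{\partial y'}\right) = 0
  &\;\Longrightarrow\; \frac{d}{dt}\left(\frac{y'}{\sqrt{1 + y'^2}}\right) = 0 \\
  &\;\Longrightarrow\; y' = \text{const}
  \;\Longrightarrow\; y(t) = c_1 t + c_2
\end{align*}
```

The extremal is a straight line, as expected. In VI the same machinery is applied with a distribution Q in place of y.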
KL Divergence
• A “metric” on distributions: Kullback & Leibler, “On Information and Sufficiency”, 1951 (Ann. Math. Statist.)
Let P, Q be distributions:
$KL(P \| Q) = E_P\left[\log \dfrac{P}{Q}\right]$
Major properties:
1. Non-symmetric, so not a metric in the strict sense (it measures a subjective distance, as seen from P)
2. Non-negative, where 0 is obtained only for KL(P||P)
(proof by concavity of log, or by Lagrange multipliers)
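Both properties are easy to check numerically for discrete distributions. The sketch below is illustrative only; the two distributions are arbitrary and `kl_divergence` is a hypothetical helper name.

```python
import math


def kl_divergence(p, q):
    """KL(P||Q) = sum_x P(x) log(P(x)/Q(x)) for discrete distributions.

    Terms with P(x) == 0 contribute 0; Q(x) must be > 0 wherever P(x) > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

kl_pq = kl_divergence(p, q)   # distance "as seen from P"
kl_qp = kl_divergence(q, p)   # distance "as seen from Q": a different number
kl_pp = kl_divergence(p, p)   # zero, since the two arguments coincide
```

Because the expectation is taken under the first argument, swapping the arguments changes which regions of the space are weighted most heavily.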
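KL - Applications
Cross entropy:
$H(P, Q) = H(P) + KL(P \| Q)$
PMI – pointwise mutual information
• Let X, Y be random variables
• $PMI(X = a, Y = b) = \log \dfrac{P(X = a, Y = b)}{P(X = a)\, P(Y = b)}$
• $KL(P(X \mid Y = a) \| Q(X)) = \sum_x P(X = x \mid Y = a)\, PMI(X = x, Y = a)$, where Q is the marginal distribution of X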
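Both identities can be verified on a small joint table. This is a sketch with made-up numbers; it assumes Q is the marginal of X, which is what makes the PMI identity hold.

```python
import math

# Joint distribution P(X=x, Y=y) on a 2x2 grid (rows: x, columns: y); illustrative values.
joint = [[0.30, 0.10],
         [0.20, 0.40]]

p_x = [sum(row) for row in joint]
p_y = [sum(joint[x][y] for x in range(2)) for y in range(2)]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pmi(x, y):
    """Pointwise mutual information of the events X=x and Y=y."""
    return math.log(joint[x][y] / (p_x[x] * p_y[y]))


a = 0
p_x_given_a = [joint[x][a] / p_y[a] for x in range(2)]

kl_lhs = kl(p_x_given_a, p_x)                                   # KL(P(X|Y=a) || P(X))
kl_rhs = sum(p_x_given_a[x] * pmi(x, a) for x in range(2))      # expected PMI
ce_lhs = cross_entropy(p_x, p_y)                                # H(P, Q)
ce_rhs = entropy(p_x) + kl(p_x, p_y)                            # H(P) + KL(P||Q)
```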
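VI - Let’s Develop
Can we approximate P(Z|X)?
$\min_Q KL(Q(Z) \| P(Z \mid X))$
We have:
$\log P(X) = E_Q[\log P(X, Z)] - E_Q[\log Q(Z)] + KL(Q(Z) \| P(Z \mid X))$
ELBO (Evidence Lower Bound): the first two terms on the RHS, $E_Q[\log P(X, Z)] - E_Q[\log Q(Z)]$
Remarks:
1. The LHS does not depend on Z (or on the choice of Q)
2. $\log P(X) \ge ELBO$ (concavity of log; equivalently, KL ≥ 0)
Hence: maximizing the ELBO => minimizing the KL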
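The decomposition can be checked numerically on a toy model with a three-state latent variable. This is a sketch under assumed numbers: `joint_xz` holds P(X = x_obs, Z = z) for one fixed observation, and `q` is an arbitrary candidate distribution.

```python
import math

# P(X = x_obs, Z = z) for a fixed observation and 3 latent states; illustrative values.
joint_xz = [0.10, 0.25, 0.05]

evidence = sum(joint_xz)                        # P(x_obs)
posterior = [p / evidence for p in joint_xz]    # P(Z | x_obs)
log_evidence = math.log(evidence)


def elbo(q):
    """E_Q[log P(x, Z)] - E_Q[log Q(Z)]."""
    return sum(qi * (math.log(pi) - math.log(qi))
               for qi, pi in zip(q, joint_xz) if qi > 0)


def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


q = [0.5, 0.3, 0.2]
gap = log_evidence - elbo(q)        # equals KL(Q || posterior)
best = elbo(posterior)              # the bound is tight when Q is the posterior
```

Since `log_evidence` is fixed, any change to `q` that raises the ELBO lowers the KL by the same amount.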
VI Development
$ELBO = E_Q[\log P(X, Z)] - E_Q[\log Q(Z)] = \int Q(Z) \log \dfrac{P(X, Z)}{Q(Z)}\, dZ = J(Q)$
Q may have an enormous number of variables. Can we do more?
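Ising Model - MFT
$H(\sigma) = -h \sum_x \sigma_x - j \sum_{\langle x, y \rangle} \sigma_x \sigma_y$, with $\sigma_y \in \{-1, 1\}$
Using the non-correlation assumption, the equation becomes
$H(\sigma) = -h \sum_x \sigma_x - j \sum_x \sigma_x \sum_y \sigma_y$
Then, for each term, we can replace the sum by the mean of its neighbors:
$H(\sigma) = E_0 - \mu \sum_x \sigma_x$
The solution is the single-spin Boltzmann distribution:
$P(s_i) = \dfrac{e^{a s_i}}{e^{a s_i} + e^{-a s_i}}$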
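A minimal sketch of the mean-field idea: each spin sees an effective field a built from the external field and the mean of its neighbours, and the mean spin must be consistent with the single-spin distribution above. The self-consistency form m = tanh(a), the inverse temperature `beta`, the neighbour count and all numeric values are assumptions added for illustration; they are not given on the slide.

```python
import math


def mean_field_magnetisation(beta, h, coupling, n_neighbours, iters=200):
    """Iterate m <- tanh(beta * (h + coupling * n_neighbours * m)) to a fixed point."""
    m = 0.5  # arbitrary non-zero starting guess
    for _ in range(iters):
        m = math.tanh(beta * (h + coupling * n_neighbours * m))
    return m


def spin_probability(s, a):
    """Single-spin Boltzmann distribution P(s) = e^{a s} / (e^{a} + e^{-a}), s in {-1, +1}."""
    return math.exp(a * s) / (math.exp(a) + math.exp(-a))


beta, h, coupling, n_neighbours = 0.5, 0.1, 1.0, 4
m = mean_field_magnetisation(beta, h, coupling, n_neighbours)
a = beta * (h + coupling * n_neighbours * m)      # effective field on one spin
p_up, p_down = spin_probability(+1, a), spin_probability(-1, a)
mean_spin = p_up - p_down                         # should reproduce m
```

The many-body problem has been replaced by one spin plus a consistency condition, which is the same trade that mean-field VI makes.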
Back to VI
• If Ising & Lenz can do it, why can’t we?
• We assume independence rather than non-correlation
$ELBO = E_Q[\log P(X, Z)] - E_Q[\log Q(Z)]$
Q becomes $Q(z) = \prod_{i=1}^{n} q_i(z_i)$ (obviously not true)
• We can now use Euler-Lagrange with the constraint $\int q_i(z)\, dz = 1$
$\log q_i(z_i) = \mathrm{const} + E_{-i}[\log p(x, z)]$: a Boltzmann distribution! (As said, we are as good as Ising.)
Hidden topics extraction in Twitter
NUMERIC
Coordinate Ascent Variational Inference (CAVI)
• Blei 2018, “Variational Inference: A Review for Statisticians”
The basic step is to sequentially set each $\log q_i$ to $E_{-i}[\log p(x, z)]$ + constant
There is no i-th coordinate on the RHS (independence)
Simply update each q until the ELBO converges
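A minimal CAVI sketch for a toy model with two discrete latent variables and a mean-field family Q(z1, z2) = q1(z1) q2(z2). The joint table is made up for illustration and stands for P(x_obs, z1, z2) at one fixed observation; the helper names are hypothetical.

```python
import math

# P(x_obs, z1, z2) on a 2x3 grid of latent states; illustrative values.
joint = [[0.20, 0.05, 0.05],
         [0.02, 0.10, 0.18]]
log_joint = [[math.log(v) for v in row] for row in joint]
log_evidence = math.log(sum(sum(row) for row in joint))


def normalise_log(log_weights):
    """Turn unnormalised log-probabilities into a probability vector."""
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def elbo(q1, q2):
    """E_Q[log P(x, Z)] - E_Q[log Q(Z)] for the factorised Q."""
    expected_log_joint = sum(q1[i] * q2[j] * log_joint[i][j]
                             for i in range(2) for j in range(3))
    entropy = -sum(q * math.log(q) for q in q1 + q2 if q > 0)
    return expected_log_joint + entropy


q1, q2 = [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]
history = [elbo(q1, q2)]
for _ in range(50):
    # log q1(z1) = const + E_{q2}[log P(x, z1, z2)]
    q1 = normalise_log([sum(q2[j] * log_joint[i][j] for j in range(3))
                        for i in range(2)])
    # log q2(z2) = const + E_{q1}[log P(x, z1, z2)]
    q2 = normalise_log([sum(q1[i] * log_joint[i][j] for i in range(2))
                        for j in range(3)])
    history.append(elbo(q1, q2))
    if history[-1] - history[-2] < 1e-12:   # ELBO has stopped improving
        break
```

Each coordinate update can only raise the ELBO, so `history` is non-decreasing and stays below `log_evidence`. The remaining gap is the price of the independence assumption.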
Stochastic VI
• CAVI does not work well for big data (it requires an update for every item)
• Stochastic VI: rather than updating the q’s, we calculate the gradient of the ELBO and optimize its parameters (similar to EM)
• Used in LDA applications (David Blei et al.)
• http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf
• https://www.cs.princeton.edu/courses/archive/fall11/cos597C/reading/Blei2011.pdf
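A rough sketch of the stochastic idea on the simplest possible model: a Gaussian mean with known noise, where the exact posterior is available for comparison. It assumes the natural-gradient form of the update described in the Hoffman et al. paper linked above (a convex combination of the current parameters and a mini-batch estimate, with a decaying step size). The model, step-size schedule and all names are illustrative choices, not the authors' code.

```python
import random

rng = random.Random(0)
sigma, tau = 1.0, 10.0                       # likelihood and prior std (assumed known)
data = [rng.gauss(2.0, sigma) for _ in range(1000)]
n, batch_size = len(data), 10

# Exact posterior of mu, in (precision, precision * mean) form.
exact_precision = 1 / tau**2 + n / sigma**2
exact_mean = (sum(data) / sigma**2) / exact_precision

# q(mu) = N(mean, 1 / precision), stored in the same natural-parameter form.
precision, precision_mean = 1.0, 0.0
for t in range(1, 2001):
    batch = rng.sample(data, batch_size)
    # Parameters we would get if the mini-batch were replicated n / batch_size times.
    hat_precision = 1 / tau**2 + n / sigma**2
    hat_precision_mean = (n / batch_size) * sum(batch) / sigma**2
    rho = (t + 1) ** -0.7                    # decaying Robbins-Monro step size
    precision = (1 - rho) * precision + rho * hat_precision
    precision_mean = (1 - rho) * precision_mean + rho * hat_precision_mean

svi_mean = precision_mean / precision
```

Each step touches only 10 of the 1000 points, yet `svi_mean` ends close to `exact_mean`; this is the property that makes the approach attractive for large corpora.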
