Variational Inference
Note: Much (meaning almost all) of this has been liberated from John Winn and Matthew Beal’s theses, and David MacKay’s book.
Overview
• Probabilistic models & Bayesian inference
• Variational Inference
• Univariate Gaussian Example
• GMM Example
• Variational Message Passing
Bayesian networks
• Directed graph
• Nodes represent variables
• Links show dependencies
• Conditional distribution at each node
• Defines a joint distribution:
P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
[Figure: network with nodes L (lighting color), C (object class), S (surface color) and I (image color), annotated with P(L), P(C), P(S|C) and P(I|L,S).]
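The factorisation above can be sketched in a few lines. This is a minimal illustration only: it assumes every variable is binary, and all table values are made up rather than taken from the slides. It builds the joint from the four factors, checks that it sums to one, and computes a posterior over the object class by brute-force summation.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables (values 0/1).
# All numbers are illustrative assumptions, not taken from the slides.
P_L = {0: 0.7, 1: 0.3}                        # P(L)
P_C = {0: 0.6, 1: 0.4}                        # P(C)
P_S_given_C = {0: {0: 0.9, 1: 0.1},           # P(S | C): outer key C, inner key S
               1: {0: 0.2, 1: 0.8}}
P_I_given_LS = {(0, 0): {0: 0.95, 1: 0.05},   # P(I | L, S): key (L, S), inner key I
                (0, 1): {0: 0.40, 1: 0.60},
                (1, 0): {0: 0.50, 1: 0.50},
                (1, 1): {0: 0.10, 1: 0.90}}

def joint(c, l, s, i):
    """P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)."""
    return P_L[l] * P_C[c] * P_S_given_C[c][s] * P_I_given_LS[(l, s)][i]

# The joint must sum to one over all 16 configurations.
total = sum(joint(c, l, s, i) for c, l, s, i in product((0, 1), repeat=4))

# Posterior over the object class C given an observed image color I = 1,
# obtained by summing out the other variables L and S.
evidence = sum(joint(c, l, s, 1) for c, l, s in product((0, 1), repeat=3))
posterior_C = {
    c: sum(joint(c, l, s, 1) for l, s in product((0, 1), repeat=2)) / evidence
    for c in (0, 1)
}
print(total, posterior_C)
```

Brute-force summation like this is only feasible for tiny networks, which is the motivation for approximate inference in the rest of the deck.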
Bayesian inference
• Observed variables D and hidden variables H.
• Hidden variables include parameters and latent variables.
• Learning/inference involves finding:
• P(H1, H2, … | D), or
• P(H, Θ | D, M), explicitly for a generative model.
[Figure: the same network (lighting color L, object class C, surface color S, image color I), with nodes labelled Hidden or Observed.]
Bayesian inference vs. ML/MAP
• Consider learning one parameter θ
$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \propto P(D \mid \theta)\, P(\theta)$$
• How should we represent this posterior distribution?
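For a single parameter the posterior can be tabulated directly on a grid. The sketch below assumes a toy model that is not in the slides: θ is a coin's heads probability, D is 7 heads in 10 tosses, and the prior is uniform. It normalises P(D|θ) P(θ) and reads off the MAP value and the posterior mean.

```python
# Grid approximation of P(theta | D) for one parameter theta.
# Assumed toy model (not from the slides): theta is a coin's heads probability,
# D is 7 heads in 10 tosses, and the prior is uniform over the grid.
heads, tosses = 7, 10
grid = [(k + 0.5) / 200 for k in range(200)]              # theta values in (0, 1)
prior = [1.0 / len(grid)] * len(grid)                     # P(theta)
# P(D | theta) up to the binomial coefficient, which cancels on normalisation.
likelihood = [t ** heads * (1 - t) ** (tosses - heads) for t in grid]
unnorm = [lk * pr for lk, pr in zip(likelihood, prior)]   # P(D | theta) P(theta)
evidence = sum(unnorm)                                    # P(D), up to the same constant
posterior = [u / evidence for u in unnorm]                # P(theta | D)

theta_map = grid[max(range(len(grid)), key=lambda k: unnorm[k])]
post_mean = sum(t * p for t, p in zip(grid, posterior))
print(theta_map, post_mean)
```

The MAP value is a single point of the curve, while the posterior mean uses all of its mass; the two differ here because the posterior is skewed.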
• Point estimate: θMAP is the maximum of P(D|θ) P(θ).
[Figure: P(D|θ) P(θ) plotted against θ with θMAP marked at the maximum. Labels distinguish the region of high probability mass from the point of high probability density.]
• The same plot is then shown with θML marked and the posterior represented by samples.
• Finally the same plot is shown with a variational approximation Q(θ).
Variational Inference (in three easy steps…)
1. Choose a family of variational distributions Q(H).
2. Use the Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|D) and Q(H).
3. Find the Q which minimizes the divergence.
Choose Variational Distribution
• P(H|D) ≈ Q(H).
• If P is so complex, how do we choose Q?
• Any Q is better than an ML or MAP point estimate.
• Choose Q so that it can get close to P and is tractable: factorized, conjugate.
Kullback-Leibler Divergence
• Derived from Variational Free Energy by Feynman and Bogoliubov
• Relative entropy between two probability distributions
• KL(Q||P) ≥ 0 for any Q (Jensen’s inequality)
• KL(Q||P) = 0 iff P = Q.
• Not a true distance measure: not symmetric
$$KL(Q\|P) = \sum_x Q(x) \ln \frac{Q(x)}{P(x)}$$
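The properties listed above are easy to check numerically. The following sketch uses two made-up discrete distributions (the values of `Q` and `P` are assumptions for illustration) and shows that the divergence is positive, zero for identical arguments, and different when the arguments are swapped.

```python
import math

def kl(q, p):
    """KL(q || p) = sum_x q(x) ln(q(x) / p(x)) for discrete distributions.

    Terms with q(x) = 0 contribute zero; p(x) must be positive wherever q(x) > 0.
    """
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)

# Two illustrative distributions over three states (assumed values).
Q = [0.5, 0.4, 0.1]
P = [0.2, 0.3, 0.5]

kl_qp = kl(Q, P)   # KL(Q || P)
kl_pq = kl(P, Q)   # KL(P || Q), generally a different number
kl_pp = kl(P, P)   # zero, since the two arguments are identical
print(kl_qp, kl_pq, kl_pp)
```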
Kullback-Leibler Divergence
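• Minimising KL(Q||P) (exclusive):
$$KL(Q\|P) = \sum_H Q(H) \ln \frac{Q(H)}{P(H \mid D)}$$
• Minimising KL(P||Q) (inclusive):
$$KL(P\|Q) = \sum_H P(H \mid D) \ln \frac{P(H \mid D)}{Q(H)}$$
[Figure: two panels showing P together with the Q obtained under each objective.]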
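The difference between the two objectives can be seen by fitting a single Gaussian to a bimodal target. The sketch below is an illustration under stated assumptions: the target (two modes at -3 and +3 with weights 0.4 and 0.6), the grid, and the brute-force search over candidate means and standard deviations are all choices made here, not taken from the slides.

```python
import math

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def kl(a, b):
    """KL(a || b) for discrete distributions on a shared grid."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

xs = [-10 + 0.1 * k for k in range(201)]
# Assumed bimodal target standing in for P(H|D): modes at -3 and +3.
P = normalise([0.4 * normal_pdf(x, -3, 0.7) + 0.6 * normal_pdf(x, 3, 0.7) for x in xs])

# Candidate family: single discretised Gaussians indexed by (mean, std).
candidates = [(-5 + 0.25 * i, 0.5 + 0.25 * j) for i in range(41) for j in range(19)]

def q_of(params):
    mean, std = params
    return normalise([normal_pdf(x, mean, std) for x in xs])

# Exhaustive search over the candidate grid for each objective.
exclusive = min(candidates, key=lambda c: kl(q_of(c), P))   # minimises KL(Q || P)
inclusive = min(candidates, key=lambda c: kl(P, q_of(c)))   # minimises KL(P || Q)
print("exclusive (mean, std):", exclusive)
print("inclusive (mean, std):", inclusive)
```

In this setup the exclusive fit locks onto one mode with a narrow Q, while the inclusive fit spreads a broad Q across both modes.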
Kullback-Leibler Divergence
$$\begin{aligned}
KL(Q\|P) &= \sum_H Q(H) \ln \frac{Q(H)}{P(H \mid D)} \\
&= \sum_H Q(H) \ln \frac{Q(H)\, P(D)}{P(H,D)} && \text{(Bayes' rule)} \\
&= \sum_H Q(H) \ln \frac{Q(H)}{P(H,D)} + \sum_H Q(H) \ln P(D) && \text{(log property)} \\
&= \sum_H Q(H) \ln \frac{Q(H)}{P(H,D)} + \ln P(D) && \text{(sum over } H\text{)}
\end{aligned}$$
For comparison:
$$KL(P\|Q) = \sum_H P(H \mid D) \ln \frac{P(H \mid D)}{Q(H)}$$
Kullback-Leibler Divergence
DEFINE
$$L(Q) \equiv \sum_H Q(H) \ln P(H,D) - \sum_H Q(H) \ln Q(H)$$
• L is the sum of two terms: the expectation of the log joint probability ln P(H,D) with respect to Q, and the entropy of Q.
• Maximizing L(Q) is equivalent to minimizing the KL divergence, because ln P(D) does not depend on Q:
$$KL(Q\|P) = \sum_H Q(H) \ln \frac{Q(H)}{P(H,D)} + \ln P(D) = \ln P(D) - L(Q)$$
• We could not do the same trick for KL(P||Q), so we approximate the posterior with a function that has its mass where the posterior is most probable (exclusive).
Summarize
• For arbitrary Q(H):
$$\ln P(D) = L(Q) + KL(Q\|P)$$
where
$$L(Q) = \sum_H Q(H) \ln \frac{P(H,D)}{Q(H)}$$
• ln P(D) is fixed, so to maximise L(Q) is to minimise KL(Q||P).
• We choose a family of Q distributions where L(Q) is tractable to compute.
• Still difficult in general to calculate.
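The decomposition of ln P(D) can be verified on a toy problem. This sketch assumes a hidden variable with three states and made-up joint probabilities P(H, D) for the observed D; the distribution `Q` is arbitrary.

```python
import math

# Assumed toy joint P(H = h, D = d_obs) over three hidden states (illustrative numbers).
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                        # P(D)
posterior = [p / evidence for p in joint]    # P(H | D)

Q = [0.2, 0.5, 0.3]                          # an arbitrary variational distribution

# L(Q) = sum_H Q(H) ln( P(H, D) / Q(H) )
lower_bound = sum(q * math.log(p / q) for q, p in zip(Q, joint))
# KL(Q || P) = sum_H Q(H) ln( Q(H) / P(H | D) )
kl_q_p = sum(q * math.log(q / p) for q, p in zip(Q, posterior))

log_evidence = math.log(evidence)
print(log_evidence, lower_bound + kl_q_p)

# When Q equals the true posterior, the bound is tight.
tight_bound = sum(q * math.log(p / q) for q, p in zip(posterior, joint))
print(tight_bound)
```

Because the KL term is never negative, L(Q) is a lower bound on ln P(D) for every choice of Q.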
Minimising the KL divergence
[Figure: ln P(D) is fixed and is split into L(Q) and KL(Q || P); as L(Q) is maximised, KL(Q || P) shrinks.]
Factorised Approximation
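• Assume Q factorises:
$$Q(H) = \prod_i Q_i(H_i)$$
• Optimal solution for one factor given by:
$$\ln Q_i^*(H_i) = \langle \ln P(H,D) \rangle_{j \neq i} + \text{const.}$$
where the expectation is taken under all factors Q_j with j ≠ i, or equivalently
$$Q_i^*(H_i) = \frac{1}{Z} \exp\Big( \sum_{H_j,\, j \neq i} \; \prod_{j \neq i} Q_j(H_j) \ln P(H,D) \Big)$$
• Given the form of Q, find the best Q in the KL sense
• Choose conjugate priors P(H) to give the form of Q
• Do it iteratively for each Qi(Hi)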
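The iterative scheme can be sketched for two discrete hidden variables. This is a minimal illustration, assuming a made-up 2 x 3 table for P(H1, H2, D) at the observed D; the helper names are hypothetical. Each pass applies the optimal-factor formula to Q1 and then to Q2 and records the lower bound L(Q).

```python
import math

# Assumed toy joint P(H1, H2, D = d_obs) as a 2 x 3 table (illustrative numbers).
joint = [[0.10, 0.02, 0.04],
         [0.01, 0.20, 0.03]]
log_joint = [[math.log(p) for p in row] for row in joint]
n1, n2 = 2, 3

def normalise_exp(log_weights):
    """Return exp(log_weights) normalised to sum to one (the 1/Z factor)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    z = sum(w)
    return [wi / z for wi in w]

def lower_bound(q1, q2):
    """L(Q) = sum_H Q(H) ln P(H, D) - sum_H Q(H) ln Q(H), with Q = Q1 Q2."""
    expected = sum(q1[a] * q2[b] * log_joint[a][b]
                   for a in range(n1) for b in range(n2))
    # The entropy of a product distribution is the sum of the factor entropies.
    entropy = -sum(q * math.log(q) for q in q1 + q2 if q > 0)
    return expected + entropy

q1 = [0.5, 0.5]
q2 = [1 / 3, 1 / 3, 1 / 3]
bounds = [lower_bound(q1, q2)]
for _ in range(20):
    # ln Q1*(H1) = <ln P(H, D)> under Q2, plus a constant
    q1 = normalise_exp([sum(q2[b] * log_joint[a][b] for b in range(n2))
                        for a in range(n1)])
    # ln Q2*(H2) = <ln P(H, D)> under Q1, plus a constant
    q2 = normalise_exp([sum(q1[a] * log_joint[a][b] for a in range(n1))
                        for b in range(n2)])
    bounds.append(lower_bound(q1, q2))

log_evidence = math.log(sum(sum(row) for row in joint))
print(bounds[-1], log_evidence)
```

Each update can only raise L(Q), so the recorded bounds are non-decreasing and stay below ln P(D). The gap that remains is the KL divergence between the factorised Q and the true posterior.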
Derivation
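Idea: use the factoring of Q to isolate Qj and maximize L with respect to Qj.
$$\begin{aligned}
L(Q) &\equiv \sum_H Q(H) \ln P(H,D) - \sum_H Q(H) \ln Q(H) \\
&= \sum_H \prod_i Q_i(H_i) \ln P(H,D) - \sum_H \prod_i Q_i(H_i) \ln \prod_j Q_j(H_j) && \text{(substitution)} \\
&= \sum_H \prod_i Q_i(H_i) \ln P(H,D) - \sum_H \prod_i Q_i(H_i) \sum_j \ln Q_j(H_j) && \text{(log property)} \\
&= \sum_H \prod_i Q_i(H_i) \ln P(H,D) - \sum_i \sum_{H_i} Q_i(H_i) \ln Q_i(H_i) \\
&= \sum_{H_j} Q_j(H_j) \sum_{H_i,\, i \neq j} \prod_{i \neq j} Q_i(H_i) \ln P(H,D) - \sum_{H_j} Q_j(H_j) \ln Q_j(H_j) - \sum_{i \neq j} \sum_{H_i} Q_i(H_i) \ln Q_i(H_i) && \text{(factor one term } Q_j\text{)}
\end{aligned}$$
The last term is not a function of Qj. Defining
$$Q_j^*(H_j) = \frac{1}{Z} \exp\Big( \sum_{H_i,\, i \neq j} \; \prod_{i \neq j} Q_i(H_i) \ln P(H,D) \Big)$$
the first two terms become
$$-KL(Q_j \| Q_j^*) + \log Z$$
so L is maximized with respect to Qj by setting Qj = Qj*.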
Example: Univariate Gaussian
• Normal distribution
• Find P(µ,γ | x)
• Conjugate prior
• Factorized variational distribution
• Q distribution has the same form as the prior distributions
• Inference involves updating these hidden parameters
Example: Univariate Gaussian
• Use Q* to derive:
• Where ⟨·⟩ denotes the expectation under the Q distribution
• Iteratively solve
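The update equations themselves did not survive on this slide, so the sketch below uses the standard mean-field updates for this model, which follow from the Q* formula. It rests on several assumptions: the data values are hypothetical, γ is the precision of the Gaussian, the priors are µ ~ N(m0, 1/β0) and γ ~ Gamma(a0, b0) with b0 a rate, and N(0,1000) in the later example is read as a variance of 1000. The factors are Q(µ) = N(m, 1/β) and Q(γ) = Gamma(a, b).

```python
# Assumed data: four hypothetical observations (the slides do not list theirs).
x = [4.2, 5.1, 5.8, 4.9]
n = len(x)
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)

# Priors (assumed reading of the slides): mu ~ N(0, variance 1000),
# gamma ~ Gamma(shape 0.001, rate 0.001).
m0, beta0 = 0.0, 1.0 / 1000.0   # prior mean and prior precision of mu
a0, b0 = 0.001, 0.001           # prior shape and rate of gamma

# Initial variational parameters: Q(mu) = N(m, 1/beta), Q(gamma) = Gamma(a, b).
m, beta = 0.0, 1.0
a, b = a0, b0

for _ in range(50):
    # Update Q(mu) using <gamma> = a / b.
    e_gamma = a / b
    beta = beta0 + n * e_gamma
    m = (beta0 * m0 + e_gamma * sum_x) / beta

    # Update Q(gamma) using <mu> = m and <mu^2> = m^2 + 1/beta.
    e_mu = m
    e_mu2 = m * m + 1.0 / beta
    a = a0 + n / 2.0
    b = b0 + 0.5 * (sum_x2 - 2.0 * e_mu * sum_x + n * e_mu2)

e_gamma = a / b
print("Q(mu): mean", m, "precision", beta)
print("Q(gamma): shape", a, "rate", b, "mean", e_gamma)
```

Each factor's update depends only on expectations under the other factor, which is why the two updates are simply alternated until the parameters stop changing.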
Example: Univariate Gaussian
• Estimate of log evidence can be found by calculating L(Q):
• Where ⟨·⟩ are expectations with respect to Q(·)
Example
Take four data samples from a Gaussian (thick line) to find the posterior. Dashed lines: distributions sampled from the variational posterior.
Variational and true posterior for a Gaussian given four samples. P(µ) = N(0,1000). P(γ) = Gamma(.001,.001).
VB with Image Segmentation
[Figure: RGB histograms of two pixel locations.]
“VB at the pixel level will give better results.”
Feature vector (x, y, Vx, Vy, r, g, b): will have issues with data association.
VB with GMM will be complex; doing this in real time will be execrable.
Lower Bound for GMM-Ugly
Variational Equations for GMM-Ugly
Brings Up VMP – Efficient Computation
[Figure: the Bayesian network with nodes L (lighting color), C (object class), S (surface color) and I (image color), annotated with P(L), P(C), P(S|C) and P(I|L,S).]

More Related Content

What's hot

PRML第6章「カーネル法」
PRML第6章「カーネル法」PRML第6章「カーネル法」
PRML第6章「カーネル法」Keisuke Sugawara
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム貴之 八木
 
[Ridge-i 論文よみかい] Wasserstein auto encoder
[Ridge-i 論文よみかい] Wasserstein auto encoder[Ridge-i 論文よみかい] Wasserstein auto encoder
[Ridge-i 論文よみかい] Wasserstein auto encoderMasanari Kimura
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionFlavio Morelli
 
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...Deep Learning JP
 
[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習Deep Learning JP
 
[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative ModelsDeep Learning JP
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介Naoki Hayashi
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 tmtm otm
 
Random Matrix Theory and Machine Learning - Part 2
Random Matrix Theory and Machine Learning - Part 2Random Matrix Theory and Machine Learning - Part 2
Random Matrix Theory and Machine Learning - Part 2Fabian Pedregosa
 
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and EditingDeep Learning JP
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion ModelsSangwoo Mo
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoderSho Tatsuno
 
Stochastic Gradient MCMC
Stochastic Gradient MCMCStochastic Gradient MCMC
Stochastic Gradient MCMCKenta Oono
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
 

What's hot (20)

VQ-VAE
VQ-VAEVQ-VAE
VQ-VAE
 
PRML第6章「カーネル法」
PRML第6章「カーネル法」PRML第6章「カーネル法」
PRML第6章「カーネル法」
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム
 
[Ridge-i 論文よみかい] Wasserstein auto encoder
[Ridge-i 論文よみかい] Wasserstein auto encoder[Ridge-i 論文よみかい] Wasserstein auto encoder
[Ridge-i 論文よみかい] Wasserstein auto encoder
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
 
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
【DL輪読会】SUMO: Unbiased Estimation of Log Marginal Probability for Latent Varia...
 
[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習[DL輪読会]相互情報量最大化による表現学習
[DL輪読会]相互情報量最大化による表現学習
 
[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
PRML 第4章
PRML 第4章PRML 第4章
PRML 第4章
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2
 
Random Matrix Theory and Machine Learning - Part 2
Random Matrix Theory and Machine Learning - Part 2Random Matrix Theory and Machine Learning - Part 2
Random Matrix Theory and Machine Learning - Part 2
 
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
 
Jokyokai
JokyokaiJokyokai
Jokyokai
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder猫でも分かるVariational AutoEncoder
猫でも分かるVariational AutoEncoder
 
Stochastic Gradient MCMC
Stochastic Gradient MCMCStochastic Gradient MCMC
Stochastic Gradient MCMC
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 

Similar to Variational Inference

Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on SamplingDon Sheehy
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi DivergenceMasahiro Suzuki
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningSungbin Lim
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clusteringFrank Nielsen
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningFrank Nielsen
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applicationsFrank Nielsen
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-DivergencesFrank Nielsen
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographywtyru1989
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Eliezer Silva
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
Variational inference
Variational inference  Variational inference
Variational inference Natan Katz
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsAlexander Litvinenko
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math ColloquiumChristian Robert
 

Similar to Variational Inference (20)

QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence(DL hacks輪読) Variational Inference with Rényi Divergence
(DL hacks輪読) Variational Inference with Rényi Divergence
 
Harmonic Analysis and Deep Learning
Harmonic Analysis and Deep LearningHarmonic Analysis and Deep Learning
Harmonic Analysis and Deep Learning
 
Divergence clustering
Divergence clusteringDivergence clustering
Divergence clustering
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learning
 
Divergence center-based clustering and their applications
Divergence center-based clustering and their applicationsDivergence center-based clustering and their applications
Divergence center-based clustering and their applications
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptography
 
Athens workshop on MCMC
Athens workshop on MCMCAthens workshop on MCMC
Athens workshop on MCMC
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space Locality-sensitive hashing for search in metric space
Locality-sensitive hashing for search in metric space
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Variational inference
Variational inference  Variational inference
Variational inference
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Linear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficientsLinear Bayesian update surrogate for updating PCE coefficients
Linear Bayesian update surrogate for updating PCE coefficients
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
 
LieGroup
LieGroupLieGroup
LieGroup
 
HPWFcorePRES--FUR2016
HPWFcorePRES--FUR2016HPWFcorePRES--FUR2016
HPWFcorePRES--FUR2016
 

More from Tushar Tank

Image Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingImage Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingTushar Tank
 
Intuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsIntuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsTushar Tank
 
Bayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesBayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesTushar Tank
 
Review of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisReview of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisTushar Tank
 
Tech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTushar Tank
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionTushar Tank
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical ClusteringTushar Tank
 
Time Frequency Analysis for Poets
Time Frequency Analysis for PoetsTime Frequency Analysis for Poets
Time Frequency Analysis for PoetsTushar Tank
 
Kalman filter upload
Kalman filter uploadKalman filter upload
Kalman filter uploadTushar Tank
 

More from Tushar Tank (10)

Image Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video EdittingImage Processing Background Elimination in Video Editting
Image Processing Background Elimination in Video Editting
 
Intuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov ChainsIntuition behind Monte Carlo Markov Chains
Intuition behind Monte Carlo Markov Chains
 
Bayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with ExamplesBayesian Analysis Fundamentals with Examples
Bayesian Analysis Fundamentals with Examples
 
Review of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series AnalysisReview of CausalImpact / Bayesian Structural Time-Series Analysis
Review of CausalImpact / Bayesian Structural Time-Series Analysis
 
Tech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paperTech Talk overview of xgboost and review of paper
Tech Talk overview of xgboost and review of paper
 
Shapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley DiscussionShapley Tech Talk - SHAP and Shapley Discussion
Shapley Tech Talk - SHAP and Shapley Discussion
 
Hindu ABC Book
Hindu ABC BookHindu ABC Book
Hindu ABC Book
 
Statistical Clustering
Statistical ClusteringStatistical Clustering
Statistical Clustering
 
Time Frequency Analysis for Poets
Time Frequency Analysis for PoetsTime Frequency Analysis for Poets
Time Frequency Analysis for Poets
 
Kalman filter upload
Kalman filter uploadKalman filter upload
Kalman filter upload
 

Recently uploaded

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Recently uploaded (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Variational Inference

  • 1. Variational Inference Note: Much (meaning almost all) of this has been liberated from John Winn and Matthew Beal’s theses, and David McKay’s book.
  • 2. Overview • Probabilistic models & Bayesian inference • Variational Inference • Univariate Gaussian Example • GMM Example • Variational Message Passing
  • 3. Bayesian networks • Directed graph • Nodes represent variables • Links show dependencies • Conditional distribution at each node • Defines a joint distribution: . P(C,L,S,I)=P(L) P(C) P(S|C) P(I|L,S) Lighting color Surface color Image color Object class C SL I P(L) P(C) P(S|C) P(I|L,S)
  • 4. Lighting color Hidden Bayesian inference Observed • Observed variables D and hidden variables H. • Hidden variables include parameters and latent variables. • Learning/inference involves finding: • P(H1, H2…| D), or • P(H,Θ|D,M) - explicitly for generative model. Surface color Image color C SL I Object class
  • 5. Bayesian inference vs. ML/MAP • Consider learning one parameter θ )( )()|( )|( DP PDP DP θθ θ = • How should we represent this posterior distribution? )()|( θθ PDP∝
  • 6. Bayesian inference vs. ML/MAP θMAP θ Maximum of P(V| θ) P(θ) • Consider learning one parameter θ P(D| θ) P(θ)
  • 7. Bayesian inference vs. ML/MAP P(D| θ) P(θ) θMAP θ High probability mass High probability density • Consider learning one parameter θ
  • 8. Bayesian inference vs. ML/MAP θML θ Samples • Consider learning one parameter θ P(D| θ) P(θ)
  • 9. Bayesian inference vs. ML/MAP θML θ Variational approximation )(θQ • Consider learning one parameter θ P(D| θ) P(θ)
  • 10. Variational Inference (in three easy steps…) 1. Choose a family of variational distributions Q(H). 2. Use the Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|D) and Q(H). 3. Find the Q which minimizes the divergence.
  • 11. Choose Variational Distribution • P(H|D) ~ Q(H). • If P is so complex how do we choose Q? • Any Q is better than an ML or MAP point estimate. • Choose Q so it can “get” close to P and is tractable – factorize, conjugate.
  • 12. Kullback-Leibler Divergence • Derived from variational free energy by Feynman and Bogoliubov • Relative entropy between two probability distributions: $KL(Q\|P) = \sum_x Q(x) \ln \frac{Q(x)}{P(x)}$ • KL(Q||P) ≥ 0 for any Q (Jensen’s inequality) • KL(Q||P) = 0 iff P = Q • Not a true distance measure: not symmetric
  • 13. Kullback-Leibler Divergence • Minimising $KL(Q\|P) = \sum_H Q(H) \ln \frac{Q(H)}{P(H|D)}$: exclusive • Minimising $KL(P\|Q) = \sum_H P(H|D) \ln \frac{P(H|D)}{Q(H)}$: inclusive [Figure: plots of P and the fitted Q for each case]
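The asymmetry between the two directions can be checked numerically. The sketch below uses a made-up bimodal P over five states and two hypothetical candidates for Q, one concentrated on a single mode and one spread evenly; the distributions are assumptions for illustration only.

```python
import math

def kl(q, p):
    """KL(q || p) = sum_x q(x) ln(q(x) / p(x)) for discrete distributions."""
    return sum(qx * math.log(qx / px) for qx, px in zip(q, p) if qx > 0)

# Hypothetical bimodal target and two candidate approximations.
P = [0.45, 0.04, 0.02, 0.04, 0.45]
Q_MODE = [0.90, 0.07, 0.01, 0.01, 0.01]    # sits on one mode of P
Q_BROAD = [0.20, 0.20, 0.20, 0.20, 0.20]   # covers both modes

exclusive = {"mode": kl(Q_MODE, P), "broad": kl(Q_BROAD, P)}   # KL(Q || P)
inclusive = {"mode": kl(P, Q_MODE), "broad": kl(P, Q_BROAD)}   # KL(P || Q)
```

For these particular numbers KL(Q||P) is smaller for the single-mode candidate, while KL(P||Q) is smaller for the broad one, matching the exclusive and inclusive labels on the slide.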
  • 14. Kullback-Leibler Divergence • $KL(Q\|P) = \sum_H Q(H) \ln \frac{Q(H)}{P(H|D)}$ • Bayes' rule: $= \sum_H Q(H) \ln \frac{Q(H)\,P(D)}{P(H,D)}$ • Log property: $= \sum_H Q(H) \ln \frac{Q(H)}{P(H,D)} + \sum_H Q(H) \ln P(D)$ • Sum over H: $= \sum_H Q(H) \ln \frac{Q(H)}{P(H,D)} + \ln P(D)$ • For comparison: $KL(P\|Q) = \sum_H P(H|D) \ln \frac{P(H|D)}{Q(H)}$
  • 15. Kullback-Leibler Divergence • Define $L(Q) \equiv \sum_H Q(H) \ln P(H,D) - \sum_H Q(H) \ln Q(H)$ • L is the sum of two terms: the expectation of the log joint $\ln P(H,D)$ with respect to Q, and the entropy of Q • Since $KL(Q\|P) = \sum_H Q(H) \ln \frac{Q(H)}{P(H,D)} + \ln P(D)$, we have $KL(Q\|P) = \ln P(D) - L(Q)$ • Maximizing L(Q) is equivalent to minimizing the KL divergence • The same trick does not work for KL(P||Q), which needs expectations under the posterior P(H|D) itself, so we approximate the posterior with a Q that has its mass where the posterior is most probable (exclusive).
  • 16. Summarize • For arbitrary Q(H): $\ln P(D) = L(Q) + KL(Q\|P)$, where $L(Q) = \sum_H Q(H) \ln \frac{P(H,D)}{Q(H)}$ • ln P(D) is fixed, so maximising L(Q) minimises KL(Q||P) • We choose a family of Q distributions where L(Q) is tractable to compute. • Still difficult in general to calculate
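The decomposition of ln P(D) into L(Q) and KL(Q||P) holds for any Q, which is easy to confirm on a toy problem. The sketch below assumes a single hidden variable with three states and made-up joint probabilities P(H, D=d); the numbers are illustrative only.

```python
import math

# Hypothetical joint P(H=h, D=d) for the observed d and h = 0, 1, 2.
JOINT = [0.10, 0.25, 0.05]
EVIDENCE = sum(JOINT)                         # P(D=d)
POSTERIOR = [p / EVIDENCE for p in JOINT]     # P(H | D=d)

def lower_bound(q):
    """L(Q) = sum_H Q(H) ln(P(H,D) / Q(H))."""
    return sum(qh * math.log(ph / qh) for qh, ph in zip(q, JOINT) if qh > 0)

def kl(q, p):
    """KL(q || p) = sum_H q(H) ln(q(H) / p(H))."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0)

q = [0.2, 0.5, 0.3]                           # an arbitrary variational distribution
gap = math.log(EVIDENCE) - (lower_bound(q) + kl(q, POSTERIOR))
```

Because the KL term is never negative, `lower_bound(q)` can never exceed `math.log(EVIDENCE)`, and the two are equal when `q` is the exact posterior.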
  • 17-21. Minimising the KL divergence [Figure, built up over five slides: ln P(D) is fixed and splits into L(Q) and KL(Q || P); maximising L(Q) shrinks KL(Q || P)]
  • 22. Factorised Approximation • Assume Q factorises: $Q(H) = \prod_i Q_i(H_i)$ • Optimal solution for one factor given by $\ln Q_j^*(H_j) = \langle \ln P(H,D) \rangle_{i \neq j} + \text{const.}$, that is, $Q_j^*(H_j) = \frac{1}{Z} \exp\Big( \sum_{H_{i \neq j}} \prod_{i \neq j} Q_i(H_i) \ln P(H,D) \Big)$ • Given the form of Q, find the best Q in the KL sense • Choose conjugate priors P(H) to give the form of Q • Do it iteratively for each $Q_i(H_i)$
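  • 23. Derivation • Idea: use the factoring of Q to isolate $Q_j$ and maximize L with respect to $Q_j$ • $L(Q) \equiv \sum_H Q(H) \ln P(H,D) - \sum_H Q(H) \ln Q(H)$ • Substitution: $= \sum_H \prod_i Q_i(H_i) \ln P(H,D) - \sum_H \prod_i Q_i(H_i) \ln \prod_k Q_k(H_k)$ • Log property: $= \sum_H \prod_i Q_i(H_i) \ln P(H,D) - \sum_H \prod_i Q_i(H_i) \sum_k \ln Q_k(H_k)$ $= \sum_H \prod_i Q_i(H_i) \ln P(H,D) - \sum_i \sum_{H_i} Q_i(H_i) \ln Q_i(H_i)$ • Factor one term $Q_j$: $= \sum_{H_j} Q_j(H_j) \sum_{H_{i \neq j}} \prod_{i \neq j} Q_i(H_i) \ln P(H,D) - \sum_{H_j} Q_j(H_j) \ln Q_j(H_j) - \sum_{i \neq j} \sum_{H_i} Q_i(H_i) \ln Q_i(H_i)$, where the last term is not a function of $Q_j$ • With $Q_j^*(H_j) = \frac{1}{Z} \exp\Big( \sum_{H_{i \neq j}} \prod_{i \neq j} Q_i(H_i) \ln P(H,D) \Big)$: $L(Q) = -KL(Q_j \| Q_j^*) + \ln Z + \text{const.}$, which is maximised by setting $Q_j = Q_j^*$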
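The factor update derived above can be tried on a small discrete problem. The following sketch assumes two hidden variables, H1 with two states and H2 with three, and a made-up table of joint probabilities P(H1, H2, D=d); it alternates the two updates and records L(Q) after each one.

```python
import math

# Hypothetical joint P(H1=a, H2=b, D=d); rows index H1, columns index H2.
JOINT = [[0.20, 0.02, 0.01],
         [0.01, 0.03, 0.13]]
LOG_EVIDENCE = math.log(sum(sum(row) for row in JOINT))

def normalise(log_weights):
    """Exponentiate and divide by Z, subtracting the max first for stability."""
    peak = max(log_weights)
    weights = [math.exp(v - peak) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

def lower_bound(q1, q2):
    """L(Q) for the factorised Q(H1, H2) = Q1(H1) Q2(H2)."""
    return sum(q1[a] * q2[b] * (math.log(JOINT[a][b]) - math.log(q1[a] * q2[b]))
               for a in range(2) for b in range(3))

def update_q1(q2):
    """Q1*(a) is proportional to exp(sum_b Q2(b) ln P(a, b, D))."""
    return normalise([sum(q2[b] * math.log(JOINT[a][b]) for b in range(3))
                      for a in range(2)])

def update_q2(q1):
    """Q2*(b) is proportional to exp(sum_a Q1(a) ln P(a, b, D))."""
    return normalise([sum(q1[a] * math.log(JOINT[a][b]) for a in range(2))
                      for b in range(3)])

q1, q2 = [0.6, 0.4], [0.4, 0.3, 0.3]          # arbitrary starting factors
bounds = [lower_bound(q1, q2)]
for _ in range(25):
    q1 = update_q1(q2)
    bounds.append(lower_bound(q1, q2))
    q2 = update_q2(q1)
    bounds.append(lower_bound(q1, q2))
```

Each update maximises L(Q) over one factor with the other held fixed, so the recorded bounds never decrease and stay below ln P(D). The fixed point reached can depend on the starting factors.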
  • 24. Example: Univariate Gaussian • Normal distribution • Find P(µ,γ | x) • Conjugate prior • Factorized variational distribution • Q distribution same form as prior distributions • Inference involves updating these hidden parameters
  • 25. Example: Univariate Gaussian • Use Q* to derive: • where ⟨·⟩ denotes the expectation under the Q function • Iteratively solve
  • 26. Example: Univariate Gaussian • Estimate of log evidence can be found by calculating L(Q): • where ⟨·⟩ are expectations with respect to Q(·)
  • 27. Example • Take four data samples from a Gaussian (thick line) and find the posterior. • Dashed lines: distributions sampled from the variational posterior. • Priors: P(µ) = N(0, 1000), P(γ) = Gamma(0.001, 0.001). [Figure: variational and true posterior for a Gaussian given four samples]
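The update equations for this example did not survive in the transcript, so the sketch below uses the standard mean-field updates for this model as a stand-in. It assumes x ~ N(µ, 1/γ), a factorised Q(µ) Q(γ) with Q(µ) = N(m, v) and Q(γ) = Gamma(a, b), that 1000 is the prior variance of µ, and that Gamma is parameterised by shape and rate. The four data values are made up.

```python
def vb_gaussian(xs, m0=0.0, v0=1000.0, a0=0.001, b0=0.001, iterations=50):
    """Mean-field VB for the mean mu and precision gamma of a Gaussian.

    Returns (m, v, a, b) with Q(mu) = N(m, v) and Q(gamma) = Gamma(a, b).
    """
    n = len(xs)
    sum_x = sum(xs)
    expected_gamma = 1.0            # initial guess for <gamma>
    a = a0 + 0.5 * n                # shape of Q(gamma); fixed across iterations
    for _ in range(iterations):
        # Update Q(mu) using the current <gamma>.
        precision = 1.0 / v0 + n * expected_gamma
        v = 1.0 / precision
        m = (m0 / v0 + expected_gamma * sum_x) / precision
        # Update Q(gamma) using <(x - mu)^2> = (x - m)^2 + v under Q(mu).
        b = b0 + 0.5 * (sum((x - m) ** 2 for x in xs) + n * v)
        expected_gamma = a / b
    return m, v, a, b

xs = [4.2, 5.1, 4.8, 5.9]           # four hypothetical samples
m, v, a, b = vb_gaussian(xs)
```

Because the priors are very broad, the mean of Q(µ) ends up close to the sample mean, and the two factors are coupled only through the expectations ⟨γ⟩ and ⟨(x − µ)²⟩, which is why the updates have to be iterated.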
  • 28. VB with Image Segmentation • “VB at the pixel level will give better results.” • Feature vector (x, y, Vx, Vy, r, g, b) will have issues with data association. • VB with GMM will be complex; doing this in real time will be execrable. [Figure: image and RGB histograms of two pixel locations]
  • 29. Lower Bound for GMM-Ugly
  • 31. Brings Up VMP – Efficient Computation [Figure: the network of slide 3 over object class C, lighting color L, surface color S and image color I, with conditionals P(C), P(L), P(S|C), P(I|L,S)]

Editor's Notes

  1. Illustration ML vs. Bayesian – for Bayesian methods, mention sampling. WRITE CONCLUSION SLIDE!! • Maximum likelihood/MAP: finds point estimates of hidden variables; vulnerable to over-fitting. • Variational inference: finds posterior distributions over hidden variables; allows direct model comparison.
  6. Each update is guaranteed to increase the lower bound, unless it is already at a maximum.