Bayesian Posterior Inference
in the
Big Data Arena
Max Welling
Anoop Korattikara
Outline
• Introduction
• Stochastic Variational Inference
– Variational Inference 101
– Stochastic Variational Inference
– Deep Generative Models with SVB
• MCMC with mini-batches
– MCMC 101
– MCMC using noisy gradients
– MCMC using noisy Metropolis-Hastings
– Theoretical results
• Conclusion
Big Data (mine is bigger than yours)
Square Kilometer Array (SKA) produces 1 Exabyte per day by 2024…
(if you are interested in doing approximate inference on this data, talk to me)
Introduction
Why do we need posterior inference if the datasets are BIG?
p >> N
Big data may mean large p, small N
Gene expression data
fMRI data
Planning
Planning against uncertainty needs probabilities
Little data inside Big data
Not every data-case carries information about every model component
New user with no ratings
(cold start problem)
1943: First NN
(+/- N=10)
1988: NetTalk
(+/- N=20K)
2009: Hinton’s
Deep Belief Net
(+/- N=10M)
2013: Google/Y!
(N=+/- 10B)
Big Models!
Models grow faster than useful information in data
Two Ingredients for Big Data Bayes
Any big data posterior inference algorithm should:
1. easily run on a distributed architecture.
2. only use a small mini-batch of the data at every iteration.
Bayesian Posterior Inference
Two routes:
Variational Inference (search within a variational family Q):
• Deterministic
• Biased
• Local minima
• Easy to assess convergence
Sampling (in principle, all probability distributions):
• Stochastic (sample error)
• Unbiased
• Hard to mix between modes
• Hard to assess convergence
Variational Bayes
Hinton & van Camp (1993)
Neal & Hinton (1999)
Saul & Jordan (1996)
Saul, Jaakkola & Jordan (1996)
Attias (1999,2000)
Wiegerinck (2000)
Ghahramani & Beal (2000,2001)
Coordinate descent on Q
P
Q
(Bishop, Pattern Recognition
and Machine Learning)
Stochastic VB Hoffman, Blei & Bach, 2010
Stochastic natural gradient descent on Q
• P and Q in exponential family.
• Q factorized:
• At every iteration: subsample n<<N data-cases:
• solve analytically.
• update parameter using stochastic natural gradient descent.
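The steps above can be sketched for a toy conjugate model. The code below is my own illustration, not the authors' implementation: SVI for the mean of a Gaussian (prior N(0,1), unit-variance likelihood), where the mini-batch sufficient statistics are rescaled by N/n and the natural-gradient step reduces to a convex combination of natural parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 10_000, 100                       # full data size, mini-batch size
x = rng.normal(2.0, 1.0, size=N)         # toy data: x_i ~ N(theta_true, 1)

# Model: theta ~ N(0,1), x_i | theta ~ N(theta, 1).  q(theta) is Gaussian,
# tracked via its natural parameters lam = (lam1, lam2).
prior_nat = np.array([0.0, -0.5])        # natural params of N(0, 1)
lam = prior_nat.copy()                   # initialize q at the prior

for t in range(1, 2001):
    batch = rng.choice(N, size=n, replace=False)
    # "Intermediate" estimate: pretend the mini-batch is the whole dataset
    # by rescaling its sufficient statistics with N/n.
    lam_hat = prior_nat + (N / n) * np.array([x[batch].sum(), -0.5 * n])
    rho = (t + 10) ** -0.7               # Robbins-Monro step size
    # For conjugate-exponential models the stochastic natural-gradient step
    # is a convex combination of the old and intermediate natural parameters.
    lam = (1 - rho) * lam + rho * lam_hat

q_var = -1.0 / (2.0 * lam[1])            # back to mean/variance form
q_mean = lam[0] * q_var
post_var = 1.0 / (N + 1)                 # exact conjugate posterior
post_mean = x.sum() / (N + 1)
print(q_mean, post_mean)                 # should be close
```

Despite never touching more than 100 of the 10,000 data points per iteration, the iterates converge to (a noisy neighborhood of) the exact posterior.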
General SVB
Subsample X (ignoring latent variables Z) and estimate the gradient by sampling: very high variance.
Reparameterization Trick
Kingma 2013, Bengio 2013, Kingma & W. 2014

Other solutions to the same "large variance problem":
-Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012]
-Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression [T. Salimans and A. Knowles, 2013]
-Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013]
-Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013]
-Estimating or Propagating Gradients Through Stochastic Neurons [Y. Bengio, 2013]
-Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]
Talk Monday June 23, 15:20
In Track F (Deep Learning II)
“Efficient Gradient Based Inference through Transformations between
Bayes Nets and Neural Nets”
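To make the trick concrete, here is a small numerical comparison (my own illustration, not from the slides): both the score-function estimator and the reparameterized estimator are unbiased for the gradient of E_{z~N(mu, sigma^2)}[f(z)] with respect to mu, but the reparameterized one has much lower variance. Here f(z) = z^2 is just a stand-in for the variational objective.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, S = 1.5, 1.0, 10_000          # variational params, number of samples
# Objective: grad wrt mu of E_{z ~ N(mu, sigma^2)}[f(z)], with f(z) = z^2.
# Closed form: E[z^2] = mu^2 + sigma^2, so the true gradient is 2*mu = 3.0.

eps = rng.standard_normal(S)
z = mu + sigma * eps                     # reparameterization: z = mu + sigma*eps

# Score-function ("REINFORCE") estimator: f(z) * d/dmu log q(z|mu)
score_grads = z**2 * (z - mu) / sigma**2
# Reparameterized estimator: differentiate f(mu + sigma*eps) directly wrt mu
reparam_grads = 2 * z

print(score_grads.mean(), score_grads.std())
print(reparam_grads.mean(), reparam_grads.std())
# Both means are near 2*mu, but the reparameterized estimator's per-sample
# spread is several times smaller, which is the point of the trick.
```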
Auto Encoding Variational Bayes
Both P(X|Z) and Q(Z|X) are general models
(e.g. deep neural net)
Kingma & W., 2013, Rezende et al 2014
The Helmholtz machine
Wake/Sleep algorithm
Dayan, Hinton, Neal, Zemel, 1995
Z
X
Q(Z|X)
P(X|Z)P(Z)
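A minimal numerical sketch of the auto-encoding idea (my own toy, with linear maps standing in for the deep nets and a 1-D latent): draw z from Q(Z|X) via the reparameterization, score the reconstruction under P(X|Z), and subtract the analytic KL to the prior P(Z) = N(0,1).

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_sample(x, W_enc, b_enc, W_dec, b_dec, rng):
    # Encoder Q(z|x) = N(mu(x), exp(log_var)); 1-D latent for brevity
    mu = W_enc @ x + b_enc[0]
    log_var = b_enc[1]                   # kept input-independent for brevity
    # Reparameterized sample z ~ Q(z|x)
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal()
    # Decoder P(x|z) = N(W_dec * z + b_dec, I)
    x_hat = W_dec * z + b_dec
    log_px_z = -0.5 * np.sum((x - x_hat) ** 2 + np.log(2 * np.pi))
    # Analytic KL(Q(z|x) || P(z)) against the standard-normal prior
    kl = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)
    return log_px_z - kl                 # single-sample ELBO estimate

x = np.array([0.5, -1.0])
W_enc, b_enc = np.array([0.3, -0.2]), np.array([0.0, 0.0])
W_dec, b_dec = np.array([0.4, -0.8]), np.array([0.1, -0.3])
print(elbo_sample(x, W_enc, b_enc, W_dec, b_dec, rng))
```

In the actual AEVB setup both maps are neural networks and this one-sample estimate is what gets backpropagated through; averaged over samples, it lower-bounds log P(x).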
The VB Landscape
SVB – Stochastic Variational Bayes
SSVB – Structured Stochastic Variational Bayes
AEVB – Auto-Encoding Variational Bayes
FSSVB – Fully Structured Stochastic Variational Bayes (ICML 2015)
Variational Auto-Encoder
(with 2 latent variables)
Face Model
Semi-supervised Model
Z
X
Y
Q(Y,Z|X) = Q(Z|Y,X)Q(Y|X)
Analogies: Fix Z, vary Y, sample X|Z,Y
P(X,Z,Y) = P(X|Z,Y)P(Y)P(Z)
Kingma, Rezende, Mohamed, Wierstra, W., 2014
REFERENCES SVB:
-Practical Variational Inference for Neural Networks [Alex Graves, 2011]
-Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012]
-Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Bayesian Analysis [T. Salimans and A. Knowles, 2013].
-Black Box Variational Inference. [R. Ranganath, S. Gerrish and D.M. Blei. 2013]
-Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013]
-Stochastic Structured Mean Field Variational Inference [Matthew Hoffman, 2013]
-Doubly Stochastic Variational Bayes for non-Conjugate Inference [M. K. Titsias and M. Lázaro-Gredilla, 2014]
REFERENCES STOCHASTIC BACKPROP AND DEEP GENERATIVE MODELS
-Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form. [D.P. Kingma, 2013].
-Estimating or propagating gradients through stochastic neurons. [Y. Bengio, 2013].
-Auto-Encoding Variational Bayes [D.P. Kingma and M. W., 2013].
-Semi-supervised Learning with Deep Generative Models [D.P. Kingma, D.J. Rezende, S. Mohamed, M. W., 2014]
-Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets [D.P. Kingma and M. W., 2014]
-Deep Generative Stochastic Networks Trainable by Backprop [Y. Bengio, E. Laufer, G. Alain, J. Yosinski, 2014]
-Stochastic Back-propagation and Approximate Inference in Deep Generative Models [D.J. Rezende, S. Mohamed and D. Wierstra, 2014]
-Deep AutoRegressive Networks [K. Gregor, A. Mnih and D. Wierstra, 2014].
-Neural Variational Inference and Learning in Belief Networks. [A. Mnih and K. Gregor, 2014].
References: Lots of action at ICML 2014!
Sampling 101 – Why MCMC?
Generating Independent Samples
Sample from g and suppress samples with low p(θ|X)
e.g. a) Rejection Sampling b) Importance Sampling
- Does not scale to high dimensions
Markov Chain Monte Carlo
• Make steps by perturbing previous sample
• Probability of visiting a state is equal to P(θ|X)
Sampling 101 – What is MCMC?
Burn-in (throw away), then samples from S0.
Autocorrelation time τ.

[Figure: trace plots of the last position coordinate over 1000 iterations for Random-walk Metropolis (high τ) and Hamiltonian Monte Carlo (low τ).]
Sampling 101 – Metropolis-Hastings
Transition Kernel T(θt+1|θt)
Propose → Accept/Reject Test
Is the new state
more probable?
Is it easy to come back
to the current state?
For Bayesian posterior inference:
1) Burn-in is unnecessarily slow.
2) The O(N) cost of each accept/reject test is too high.
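Concretely, here is a random-walk Metropolis-Hastings sampler for a toy Gaussian-mean posterior (my own sketch, not the tutorial's code). Note that every accept/reject test touches all N data points; that full-data sum is the O(N) bottleneck the following slides attack.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=500)       # data: x_i ~ N(theta, 1)

def log_post(theta):
    # log p(theta|X) up to a constant: N(0,1) prior, N(theta,1) likelihood.
    # The full-data sum makes every MH test cost O(N).
    return -0.5 * theta**2 - 0.5 * np.sum((x - theta) ** 2)

theta, samples = 0.0, []
for t in range(5000):
    prop = theta + 0.1 * rng.standard_normal()   # random-walk proposal
    # Accept/reject test: accept if log u < log p(prop|X) - log p(theta|X)
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

burned = np.array(samples[1000:])        # discard burn-in
post_mean = x.sum() / (len(x) + 1)       # exact conjugate answer
print(burned.mean(), post_mean)
```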
Approximate MCMC
[Figure: samples scattered on a bias-variance plane. A large step size ϵ gives low variance (fast) but high bias; decreasing ϵ lowers the bias at the cost of higher variance (slow).]
Minimizing Risk
[Figure: bias², variance, and risk plotted against ϵ, for increasing computational time.]

Risk = Bias² + Variance

Given finite sampling time, ϵ = 0 is not the optimal setting.
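The claim can be checked with a toy risk model. The functional forms below are my own assumptions (bias proportional to ϵ; variance proportional to 1/(Tϵ) under a fixed compute budget T), chosen only to show the qualitative effect: the risk-minimizing ϵ is strictly positive, and it shrinks toward zero only as the compute budget grows.

```python
import numpy as np

# Toy risk model (illustrative assumption, not the slides' exact math):
# bias grows linearly with step size eps, while for a FIXED compute budget T
# the effective number of samples grows with eps, so variance ~ 1/(T*eps).
a, b = 1.0, 1.0                          # assumed bias/variance constants

def risk(eps, T):
    return (a * eps) ** 2 + b / (T * eps)   # risk = bias^2 + variance

eps_grid = np.linspace(1e-3, 1.0, 10_000)
for T in (10, 100, 1000):
    best = eps_grid[np.argmin(risk(eps_grid, T))]
    print(T, best)
# The minimizer (analytically (b / (2 a^2 T))^(1/3)) is strictly positive
# for any finite T, matching the claim on the slide.
```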
Designing fast MCMC samplers
Propose → Accept/Reject: the accept/reject test costs O(N).
Method 1: develop an approximate accept/reject test that uses only a fraction of the data.
Method 2: develop a proposal with acceptance probability ≈ 1 and avoid the expensive accept/reject test.
Stochastic Gradient Langevin Dynamics
Langevin Dynamics: θt+1 is then accepted/rejected using a Metropolis-Hastings test.
Stochastic Gradient Langevin Dynamics (SGLD): avoid the expensive Metropolis-Hastings test by keeping ε small.
W. & Teh, 2011
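A minimal SGLD run on a toy Gaussian-mean posterior, my own sketch of the W. & Teh (2011) update: the full-data gradient is replaced by an N/n-rescaled mini-batch gradient, Gaussian noise with variance ε is injected each step, and no Metropolis-Hastings test is performed because ε is kept small.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 100                         # full data size, mini-batch size
x = rng.normal(1.0, 1.0, size=N)         # x_i ~ N(theta, 1), prior theta ~ N(0, 1)

theta, eps = 0.0, 1e-5                   # small fixed step size: skip the MH test
samples = []
for t in range(20_000):
    batch = rng.choice(N, size=n, replace=False)
    # Noisy gradient of the log posterior: prior term + N/n-rescaled batch term
    grad = -theta + (N / n) * np.sum(x[batch] - theta)
    # Langevin update: half step along the noisy gradient + injected N(0, eps) noise
    theta = theta + 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal()
    samples.append(theta)

burned = np.array(samples[5000:])        # discard burn-in
post_mean, post_var = x.sum() / (N + 1), 1.0 / (N + 1)   # exact conjugate posterior
print(burned.mean(), post_mean)
print(burned.var(), post_var)
```

With ε this small, the O(ε) discretization bias is negligible and the chain's mean and spread track the exact posterior, while each iteration only ever sees n of the N data points.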
SGLD & Optimization
The SGLD Knob
Decrease ϵ over time: burn-in using SGA → biased sampling → exact sampling.
[Figure: the same bias-variance diagram as before; turning the ϵ knob trades bias for variance.]
Demo: SGLD
Speaker notes

  1. Properties – Variational Inference is inherently biased, MCMC is unbiased given infinite sampling time, etc. Main LaTeX equations: $q^* = \min_{q \in Q} \text{KL}[\, q(\theta) \,\|\, p(\theta|X) \,]$ and $\mathbb{E}_{p(\theta|X)}[f(\theta)] \approx \frac{1}{T} \sum_{t=1}^{T} f(\theta_t), \text{ where } \theta_t \sim p(\theta|X)$
  2. Is there too much information on this slide? LaTeX: Given target distribution $S_0$, design transitions s.t. $p_t(\theta_t) \to S_0$ as $t \to \infty$
  3. $S_0(\theta) \propto p(\theta) \prod_{i=1}^{N} p(x_i|\theta)$
  4. Use samples from $\mathcal{S}_\epsilon$ (instead of $\mathcal{S}_0$) to compute $\langle f \rangle_{\mathcal{S}_0}$
  5. $\theta' \leftarrow \theta_t + \frac{\epsilon}{2} \nabla_\theta \log \mathcal{S}_0(\theta_t) + \eta, \quad \eta \sim \mathcal{N}(0, \epsilon)$; likewise $\theta_{t+1} \leftarrow \theta_t + \frac{\epsilon}{2} \nabla_\theta \log \mathcal{S}_0(\theta_t) + \eta_t, \quad \eta_t \sim \mathcal{N}(0, \epsilon)$. $\text{Bias} = \langle f \rangle_{\mathcal{S}_0} - \langle f \rangle_{\mathcal{S}_\epsilon} = O(\epsilon)$. $\text{Bayesian posterior: } \nabla_\theta \log \mathcal{S}_0(\theta_t) = \frac{N}{n} \sum_{i=1}^{n} \nabla \log l(x_i|\theta_t) + \nabla \log \rho(\theta_t) \qquad O(n)$