Bayesian inversion of deterministic
dynamic causal models
Kay H. Brodersen1,2 & Ajita Gupta2
1   Department of Economics, University of Zurich, Switzerland
2   Department of Computer Science, ETH Zurich, Switzerland




                                                                 Mike West
Overview

1.    Problem setting
      Model; likelihood; prior; posterior.

2.    Variational Laplace
      Factorization of posterior; why the free-energy; energies; gradient ascent;
      adaptive step size; example.

3.    Sampling
      Transformation method; rejection method; Gibbs sampling; MCMC; Metropolis-
      Hastings; example.

4.    Model comparison
      Model evidence; Bayes factors; free-energy; prior arithmetic mean; posterior
      harmonic mean; Savage-Dickey; example.

With material from Will Penny, Klaas Enno Stephan, Chris Bishop, and Justin Chumbley.

                                                                                        2
1 Problem setting
Model = likelihood + prior




                             4
Bayesian inference is conceptually straightforward

Question 1: what do the data tell us about the model parameters?
→ compute the posterior

$$p(\theta \mid y, m) = \frac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$$


Question 2: which model is best?
→ compute the model evidence

$$p(m \mid y) \propto p(y \mid m)\, p(m), \qquad p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$$


                                                         5
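
As a minimal numerical illustration of both quantities (not part of the original slides), the sketch below evaluates the posterior and the model evidence on a parameter grid for a one-dimensional toy model; the likelihood, prior, and data are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical one-parameter model: y_i ~ N(theta, 1), prior theta ~ N(0, 2^2).
y = np.array([1.2, 0.8, 1.5])               # made-up observations
theta = np.linspace(-5, 5, 2001)            # parameter grid
dtheta = theta[1] - theta[0]

lik = np.prod(norm.pdf(y[:, None], loc=theta[None, :], scale=1.0), axis=0)  # p(y | theta, m)
prior = norm.pdf(theta, loc=0.0, scale=2.0)                                 # p(theta | m)

evidence = np.sum(lik * prior) * dtheta      # p(y | m) = integral of p(y | theta, m) p(theta | m)
posterior = lik * prior / evidence           # p(theta | y, m)

print("model evidence:", evidence)
print("posterior mean:", np.sum(theta * posterior) * dtheta)
```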
2 Variational Laplace
Variational Laplace in a nutshell

• Negative free-energy approximation to the model evidence:

  $$\ln p(y \mid m) = F + KL\big[q(\theta, \lambda),\, p(\theta, \lambda \mid y)\big]$$

  $$F = \big\langle \ln p(y, \theta, \lambda) \big\rangle_q - KL\big[q(\theta, \lambda),\, p(\theta, \lambda \mid m)\big]$$

• Mean-field approximation:

  $$p(\theta, \lambda \mid y) \approx q(\theta, \lambda) = q(\theta)\, q(\lambda)$$

• Maximise the negative free energy with respect to q (i.e. minimise the divergence) by maximising the variational energies:

  $$q(\theta) \propto \exp\big(I(\theta)\big) = \exp\Big(\big\langle \ln p(y, \theta, \lambda) \big\rangle_{q(\lambda)}\Big)$$

  $$q(\lambda) \propto \exp\big(I(\lambda)\big) = \exp\Big(\big\langle \ln p(y, \theta, \lambda) \big\rangle_{q(\theta)}\Big)$$

• Iterative updating of the sufficient statistics of the approximate posteriors by gradient ascent.
K.E. Stephan

                                                                                               7
Variational Laplace

Assumptions


• mean-field approximation

• Laplace approximation




W. Penny

                                                 8
Variational Laplace

Inversion strategy

Recall the relationship between the log model evidence and the negative free-energy 𝐹:

$$\ln p(y \mid m) = \underbrace{E_q\big[\ln p(y \mid \theta)\big] - KL\big[q(\theta)\,\|\,p(\theta \mid m)\big]}_{=:\,F} + \underbrace{KL\big[q(\theta)\,\|\,p(\theta \mid y, m)\big]}_{\ge\, 0}$$


Maximizing $F$ implies two things:
(i) we obtain a good approximation to $\ln p(y \mid m)$;
(ii) the KL divergence between $q(\theta)$ and $p(\theta \mid y, m)$ becomes minimal.

Practically, we can maximize $F$ by iteratively maximizing the variational energies, as in EM:




                                                                                      W. Penny

                                                                                                 9
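
To make the decomposition above concrete, here is a rough sketch (not from the slides) that estimates $F = E_q[\ln p(y \mid \theta)] - KL[q(\theta)\,\|\,p(\theta \mid m)]$ for a Gaussian approximate posterior and a Gaussian prior; the toy likelihood and all numerical values are assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed toy setup: y_i ~ N(theta, 1), prior theta ~ N(0, 4), approximate posterior q = N(mu, s2).
y = np.array([1.2, 0.8, 1.5])
prior_mu, prior_s2 = 0.0, 4.0
mu, s2 = 1.0, 0.3                                  # sufficient statistics of q(theta)

# E_q[ln p(y | theta)] by Monte Carlo over samples from q
theta = rng.normal(mu, np.sqrt(s2), size=5000)
e_loglik = np.mean([norm.logpdf(y, loc=t, scale=1.0).sum() for t in theta])

# KL[q || prior] between two univariate Gaussians (closed form)
kl = 0.5 * (np.log(prior_s2 / s2) + (s2 + (mu - prior_mu) ** 2) / prior_s2 - 1.0)

F = e_loglik - kl                                  # lower bound on ln p(y | m)
print("negative free energy F:", F)
```

Increasing F over the sufficient statistics (mu, s2) simultaneously tightens the bound on the log evidence and shrinks the KL divergence to the true posterior.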
Variational Laplace

Implementation: gradient-ascent scheme (Newton’s method)

Newton's method for finding a root (1D):

$$x_{\text{new}} = x_{\text{old}} - \frac{f(x_{\text{old}})}{f'(x_{\text{old}})}$$

Compute the gradient vector:

$$j_\theta(i) = \frac{\partial I(\theta)}{\partial \theta(i)}$$

Compute the curvature matrix:

$$H_\theta(i, j) = \frac{\partial^2 I(\theta)}{\partial \theta(i)\, \partial \theta(j)}$$

                                                           10
Variational Laplace

Implementation: gradient-ascent scheme (Newton’s method)

Compute the Newton update (change):

$$\Delta\theta = -H_\theta^{-1}\, j_\theta$$

New estimate:

$$\theta_{\text{new}} = \theta_{\text{old}} + \Delta\theta$$

This mirrors Newton's method for finding a root (1D):

$$x_{\text{new}} = x_{\text{old}} - \frac{f(x_{\text{old}})}{f'(x_{\text{old}})}$$

Big curvature → small step
Small curvature → big step
W. Penny

                                                                             11
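
A minimal sketch of the Newton scheme described above (my example, not the SPM implementation): it ascends a toy concave objective $I(\theta)$ using finite-difference gradients and curvatures; the objective and the starting point are assumptions.

```python
import numpy as np

def I(theta):
    """Toy variational energy to maximize (assumed, concave with maximum at (3.4, 2.1))."""
    return -0.5 * (theta[0] - 3.4) ** 2 - 2.0 * (theta[1] - 2.1) ** 2

def grad_and_curv(f, theta, eps=1e-4):
    """Finite-difference gradient vector j and curvature matrix H of f at theta."""
    n = len(theta)
    j, H = np.zeros(n), np.zeros((n, n))
    for i in range(n):
        e_i = np.eye(n)[i] * eps
        j[i] = (f(theta + e_i) - f(theta - e_i)) / (2 * eps)
        for k in range(n):
            e_k = np.eye(n)[k] * eps
            H[i, k] = (f(theta + e_i + e_k) - f(theta + e_i - e_k)
                       - f(theta - e_i + e_k) + f(theta - e_i - e_k)) / (4 * eps ** 2)
    return j, H

theta = np.array([2.9, 1.65])                  # starting point
for _ in range(10):
    j, H = grad_and_curv(I, theta)
    theta = theta - np.linalg.solve(H, j)      # Newton update: delta = -H^{-1} j
print("estimate:", theta)                      # converges to roughly (3.4, 2.1)
```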
Newton’s method – demonstration




Newton's method is very efficient. However, its solution is sensitive to the starting point, as shown above.




http://demonstrations.wolfram.com/LearningNewtonsMethod/

                                                                                              12
Variational Laplace

 Nonlinear regression (example)

Model (likelihood): (equation shown on slide)

Data: $y(t)$ plotted against $t$ (figure shown on slide)

Ground truth: the known parameter values that were used to generate the data on the left.

W. Penny
                                                                   13
Variational Laplace

Nonlinear regression (example)

We begin by defining our prior.

[Figure: prior density, with the ground truth marked.]




W. Penny

                                                                         14
Variational Laplace

Nonlinear regression (example)

[Figure: posterior density, and VL optimization over 4 iterations.]

Starting point: (2.9, 1.65)
True value: (3.4012, 2.0794)
VL estimate: (3.4, 2.1)

W. Penny

                                                                        15
3 Sampling
Sampling

Deterministic approximations
• computationally efficient
• efficient representation
• learning rules may give additional insight
• application initially involves hard work
• systematic error

Stochastic approximations
• asymptotically exact
• easily applicable general-purpose algorithms
• computationally expensive
• storage intensive




                                                                                   17
Strategy 1 – Transformation method

We can obtain samples from some distribution $p(z)$ by first drawing samples $u$ from the uniform distribution and then transforming them:

$$z^{(\tau)} = F^{-1}\big(u^{(\tau)}\big)$$

[Figure: $p(u)$ and $p(z)$; the sample $u = 0.9$ is mapped to $z = 1.15$.]

• yields high-quality samples
• easy to implement
• computationally efficient
• obtaining the inverse cdf can be difficult




                                                                                                         18
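
A short sketch of the transformation (inverse-CDF) method, assuming an exponential target $p(z) = e^{-z}$ whose inverse CDF is available in closed form (my example, not the distribution shown in the figure).

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: exponential distribution p(z) = exp(-z), z >= 0.
# CDF: F(z) = 1 - exp(-z), hence F^{-1}(u) = -ln(1 - u).
u = rng.uniform(size=100_000)            # samples from U(0, 1)
z = -np.log(1.0 - u)                     # transformed samples, z ~ Exp(1)

print("sample mean (should be close to 1):", z.mean())
```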
Strategy 2 – Rejection method

When the transformation method cannot be applied, we can resort to a more
general method called rejection sampling. Here, we draw random numbers from a
simpler proposal distribution 𝑞(𝑧) and keep only some of these samples.


The proposal distribution $q(z)$, scaled by a factor $k$, represents an envelope of $p(z)$.

• yields high-quality samples
• easy to implement
• can be computationally efficient
• computationally inefficient if the proposal is a poor approximation

Bishop (2007) PRML
                                                                                       19
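
A sketch of rejection sampling under assumed densities: the target is a Beta(2, 5) and the proposal is uniform on [0, 1], scaled by a constant $k$ chosen so that $k\,q(z)$ envelops $p(z)$.

```python
import numpy as np
from scipy.stats import beta, uniform

rng = np.random.default_rng(2)

p = beta(2, 5)       # target density p(z), assumed for illustration
q = uniform(0, 1)    # proposal density q(z)
k = 2.5              # k * q(z) >= p(z) everywhere; the maximum of the Beta(2,5) pdf is ~2.46

samples = []
while len(samples) < 10_000:
    z = q.rvs(random_state=rng)            # draw a candidate from the proposal
    u = rng.uniform(0, k * q.pdf(z))       # uniform height under the envelope
    if u <= p.pdf(z):                      # keep the candidate only if it falls under p(z)
        samples.append(z)

print("sample mean (true mean of Beta(2,5) is 2/7, i.e. about 0.286):", np.mean(samples))
```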
Strategy 3 – Gibbs sampling

Often the joint distribution of several random variables is unavailable, whereas the
full-conditional distributions are available. In this case, we can cycle over full-
conditionals to obtain samples from the joint distribution.




• easy to implement
• samples are serially correlated
• the full-conditionals may not be available
Bishop (2007) PRML

                                                                                       20
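
A sketch of Gibbs sampling for an assumed bivariate standard Gaussian with correlation $\rho$, where both full-conditionals are Gaussian and available in closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                      # correlation of the assumed target N(0, [[1, rho], [rho, 1]])

z1, z2 = 0.0, 0.0
samples = np.empty((20_000, 2))
for t in range(len(samples)):
    # Full-conditionals of the bivariate standard Gaussian:
    # z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2 | z1.
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho ** 2))
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho ** 2))
    samples[t] = (z1, z2)

print("empirical correlation after burn-in:", np.corrcoef(samples[5_000:].T)[0, 1])
```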
Strategy 4 – Markov Chain Monte Carlo (MCMC)

Idea: we can sample from a large class of distributions and overcome the problems
that previous methods face in high dimensions using a framework called Markov
Chain Monte Carlo.

Increase in density, $p(z^\ast) \ge p(z^{(\tau)})$: accept the proposal with probability 1.

Decrease in density: accept the proposal with probability $\dfrac{\tilde{p}(z^\ast)}{\tilde{p}(z^{(\tau)})}$.

M. Dümcke
                                                                                    21
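
The acceptance rule above, as a sketch of the Metropolis algorithm for a one-dimensional target known only up to a normalizing constant; the unnormalized density and the proposal width are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def p_tilde(z):
    """Unnormalized target density (assumed): two Gaussian bumps."""
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

z, chain = 0.0, []
for _ in range(50_000):
    z_star = z + rng.normal(0, 1.0)              # symmetric random-walk proposal
    a = p_tilde(z_star) / p_tilde(z)             # accept with probability min(1, p~(z*) / p~(z))
    if rng.uniform() < a:                        # increases are always accepted, decreases sometimes
        z = z_star
    chain.append(z)

print("fraction of samples near the right-hand mode:", np.mean(np.array(chain) > 0))
```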
MCMC demonstration: finding a good proposal density
When the proposal distribution is too narrow, we might miss a mode. When it is too wide, we obtain long constant stretches without an acceptance.




http://demonstrations.wolfram.com/MarkovChainMonteCarloSimulationUsingTheMetropolisAlgorithm/

                                                                                                  22
MCMC for DCM

MH creates a series of random points $\theta^{(1)}, \theta^{(2)}, \ldots$ whose distribution converges to the target distribution of interest. For us, this is the posterior density $p(\theta \mid y)$.

We could use the following proposal distribution:

$$q\big(\theta^{(\tau)} \mid \theta^{(\tau-1)}\big) = \mathcal{N}\big(0,\, \sigma C_\theta\big)$$

where $C_\theta$ is the prior covariance and $\sigma$ is a scaling factor, adapted such that the acceptance rate stays between 20% and 40%.

W. Penny

                                                                                23
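
A sketch of the scheme described above for a toy two-parameter posterior: the proposal perturbs the current sample with noise shaped by the prior covariance $C_\theta$, and $\sigma$ is adapted to keep the acceptance rate between roughly 20% and 40%. The log-joint, the prior covariance, and the adaptation rule are assumptions, not the SPM/DCM implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

C_theta = np.array([[1.0, 0.3], [0.3, 0.5]])     # assumed prior covariance
L = np.linalg.cholesky(C_theta)

def log_joint(theta):
    """Assumed toy log p(y, theta), known up to a constant."""
    mu = np.array([3.4, 2.1])
    return -0.5 * np.sum((theta - mu) ** 2 / np.array([0.2, 0.1]))

theta = np.array([2.9, 1.65])                     # starting point
sigma, accepts, chain = 0.5, 0, []
for t in range(1, 20_001):
    step = sigma * (L @ rng.normal(size=2))       # proposal noise with covariance sigma^2 * C_theta
    theta_star = theta + step
    if np.log(rng.uniform()) < log_joint(theta_star) - log_joint(theta):
        theta, accepts = theta_star, accepts + 1
    chain.append(theta)
    if t % 500 == 0:                              # crude adaptation of the scaling factor
        rate = accepts / t
        sigma *= 1.1 if rate > 0.4 else (0.9 if rate < 0.2 else 1.0)

print("posterior mean estimate:", np.mean(chain[5_000:], axis=0))
```

In practice the adaptation of $\sigma$ would be confined to a burn-in phase, so that the chain that is actually kept uses a fixed proposal.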
MCMC for DCM




W. Penny

               24
MCMC – example




W. Penny

                 25
MCMC – example




W. Penny

                 26
MCMC – example


Metropolis-Hastings   Variational-Laplace




  W. Penny

                                            27
4 Model comparison
Model evidence




W. Penny

                 29
Prior arithmetic mean




W. Penny

                        30
Posterior harmonic mean




W. Penny

                          31
Savage-Dickey ratio

In many situations we wish to compare models that are nested. For example:


• $m_F$: full model with parameters $\theta = (\theta_1, \theta_2)$
• $m_R$: reduced model with $\theta = (\theta_1, 0)$


In this case, we can use the Savage-Dickey ratio to obtain a Bayes factor without having to compute the two model evidences:

$$B_{RF} = \frac{p(\theta_2 = 0 \mid y, m_F)}{p(\theta_2 = 0 \mid m_F)}$$

[Figure: prior and posterior density of $\theta_2$ under $m_F$, yielding $B_{RF} = 0.9$.]

W. Penny

                                                                             32
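
A sketch of the Savage-Dickey ratio for an assumed conjugate-Gaussian example, where the prior and the posterior of $\theta_2$ under the full model are available in closed form and can simply be evaluated at $\theta_2 = 0$ (the model and data are invented for illustration).

```python
import numpy as np
from scipy.stats import norm

# Assumed full model m_F: y_i = theta_2 + noise, noise ~ N(0, 1), prior theta_2 ~ N(0, 4).
# The reduced model m_R fixes theta_2 = 0.
y = np.array([0.3, -0.1, 0.4, 0.2])
prior_mu, prior_s2, noise_s2 = 0.0, 4.0, 1.0

# Conjugate Gaussian posterior over theta_2 under the full model
post_s2 = 1.0 / (1.0 / prior_s2 + len(y) / noise_s2)
post_mu = post_s2 * (prior_mu / prior_s2 + y.sum() / noise_s2)

# Savage-Dickey: B_RF = p(theta_2 = 0 | y, m_F) / p(theta_2 = 0 | m_F)
B_RF = norm.pdf(0.0, post_mu, np.sqrt(post_s2)) / norm.pdf(0.0, prior_mu, np.sqrt(prior_s2))
print("Bayes factor in favour of the reduced model:", B_RF)
```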
Comparison of methods


                        MCMC
                                    VL




Chumbley et al. (2007) NeuroImage

                                         33
Comparison of methods

        estimated minus true log BF




Chumbley et al. (2007) NeuroImage

                                      34
