From MCMC to ABC Methods




                       From MCMC to ABC Methods

                                    Christian P. Robert

                             Université Paris-Dauphine, IuF, & CREST
                            http://www.ceremade.dauphine.fr/~xian


                           O’Bayes 11, Shanghai, June 10, 2011
From MCMC to ABC Methods




Outline
Computational issues in Bayesian statistics

The Metropolis-Hastings Algorithm

The Gibbs Sampler

Approximate Bayesian computation

ABC for model choice
From MCMC to ABC Methods
  Computational issues in Bayesian statistics




A typology of Bayes computational problems

       (i). use of a complex parameter space, as for instance in
            constrained parameter sets like those resulting from imposing
            stationarity constraints in dynamic models;
      (ii). use of a complex sampling model with an intractable
            likelihood, as for instance in some latent variable or graphical
            models or in inverse problems;
     (iii). use of a huge dataset;
     (iv). use of a complex prior distribution (which may be the
            posterior distribution associated with an earlier sample);
       (v). use of a particular inferential procedure, as for instance Bayes
            factors

                 B01^π(x) = [P(θ ∈ Θ0 | x)/P(θ ∈ Θ1 | x)] / [π(θ ∈ Θ0)/π(θ ∈ Θ1)] .
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm




The Metropolis-Hastings Algorithm

      Computational issues in Bayesian statistics

      The Metropolis-Hastings Algorithm
         Monte Carlo basics
         Monte Carlo Methods based on Markov Chains
         The Metropolis–Hastings algorithm
         Random-walk Metropolis-Hastings algorithms

      The Gibbs Sampler

      Approximate Bayesian computation

      ABC for model choice
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo basics


General purpose



      Given a density π known up to a normalizing constant, π ∝ π̃, and an
      integrable function h, compute

                 Π(h) = ∫ h(x)π(x)µ(dx) = ∫ h(x)π̃(x)µ(dx) / ∫ π̃(x)µ(dx)

      when ∫ h(x)π̃(x)µ(dx) is intractable.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo basics


Monte Carlo 101

      Generate an iid sample x1 , . . . , xN from π and estimate Π(h) by

                 Π̂_N^MC(h) = N⁻¹ Σ_{i=1}^N h(xi).

      LLN: Π̂_N^MC(h) → Π(h) a.s.

      If Π(h²) = ∫ h²(x)π(x)µ(dx) < ∞,

      CLT: √N (Π̂_N^MC(h) − Π(h)) ⇝ N(0, Π{[h − Π(h)]²}).


      Caveat announcing MCMC

      Often impossible or inefficient to simulate directly from Π
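
      A minimal R sketch of these two results, under toy assumptions (target
      π = N(0, 1) and h(x) = x², so that Π(h) = 1):

      # Monte Carlo 101: estimate Pi(h) with a CLT-based confidence interval
      N <- 1e4
      x <- rnorm(N)             # iid sample from pi
      h <- x^2
      est <- mean(h)            # Monte Carlo estimate of Pi(h)
      se <- sd(h)/sqrt(N)       # CLT standard error
      c(est - 1.96*se, est, est + 1.96*se)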
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo basics


Importance Sampling


      For Q proposal distribution such that Q(dx) = q(x)µ(dx),
      alternative representation

                 Π(h) = ∫ h(x){π/q}(x) q(x)µ(dx).


      Principle of importance
      Generate an iid sample x1 , . . . , xN ∼ Q and estimate Π(h) by

                 Π̂_{Q,N}^IS(h) = N⁻¹ Σ_{i=1}^N h(xi){π/q}(xi).
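
      A matching R sketch, same toy assumptions as above (π = N(0, 1),
      h(x) = x²), with a Cauchy proposal Q:

      # importance sampling: weights pi/q evaluated on draws from Q
      N <- 1e4
      x <- rcauchy(N)                       # iid sample from Q
      mean(x^2 * dnorm(x)/dcauchy(x))       # IS estimate of Pi(h) (true value 1)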
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo basics


Properties of importance


      Then

      LLN: Π̂_{Q,N}^IS(h) → Π(h) a.s., and if Q((hπ/q)²) < ∞,

      CLT: √N (Π̂_{Q,N}^IS(h) − Π(h)) ⇝ N(0, Q{(hπ/q − Π(h))²}).


      Caveat
      If the normalizing constant of π is unknown, Π̂_{Q,N}^IS cannot be used

      Generic problem in Bayesian Statistics: π(θ|x) ∝ f (x|θ)π(θ).
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo basics


Self-Normalised Importance Sampling

      Self normalized version

           Π̂_{Q,N}^SNIS(h) = [Σ_{i=1}^N {π/q}(xi)]⁻¹ Σ_{i=1}^N h(xi){π/q}(xi).

      LLN: Π̂_{Q,N}^SNIS(h) → Π(h) a.s.
      and if Π((1 + h²)(π/q)) < ∞,

      CLT: √N (Π̂_{Q,N}^SNIS(h) − Π(h)) ⇝ N(0, Π{(π/q)[h − Π(h)]²}).

       c The quality of the SNIS approximation depends on the
      choice of Q
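
      A matching R sketch where only the unnormalised target
      π̃(x) = exp(−x²/2) is used (same toy assumptions as above):

      # self-normalised importance sampling: weights known up to a constant
      N <- 1e4
      x <- rcauchy(N)                       # iid sample from Q
      w <- exp(-x^2/2)/dcauchy(x)           # unnormalised weights pi~/q
      sum(w * x^2)/sum(w)                   # SNIS estimate of Pi(h) (true value 1)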
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (MCMC)



      It is not necessary to use a sample from the distribution f to
      approximate the integral

                     I = ∫ h(x)f (x)dx ,


      We can obtain X1 , . . . , Xn ∼ f (approx) without directly
      simulating from f , using an ergodic Markov chain with
      stationary distribution f
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (2)


      Idea
      For an arbitrary starting value x(0) , an ergodic chain (X (t) ) is
      generated using a transition kernel with stationary distribution f


             Ensures the convergence in distribution of (X (t) ) to a random
             variable from f .
             For a “large enough” T0 , X (T0 ) can be considered as
             distributed from f
             Produces a dependent sample X (T0 ) , X (T0 +1) , . . ., which is
             generated from f , sufficient for most approximation purposes.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    The Metropolis–Hastings algorithm


The Metropolis–Hastings algorithm



      Basics
      The algorithm uses the target density

                                          f

      and a conditional density
                                        q(y|x)
      called the instrumental (or proposal) distribution
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    The Metropolis–Hastings algorithm


The MH algorithm

      Algorithm (Metropolis–Hastings)
      Given x(t) ,
        1. Generate Yt ∼ q(y|x(t) ).
        2. Take

                    X (t+1) = Yt      with prob. ρ(x(t) , Yt ),
                              x(t)    with prob. 1 − ρ(x(t) , Yt ),

             where
                    ρ(x, y) = min{ [f (y)/f (x)] [q(x|y)/q(y|x)] , 1 }.
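
      A hedged R sketch of this algorithm for an (assumed) bimodal toy
      target f, with an independent Gaussian proposal q(y|x) = N(0, 3²):

      # one-dimensional Metropolis-Hastings with an independent proposal
      f <- function(x) 0.3*dnorm(x, -2) + 0.7*dnorm(x, 2)   # toy target
      T <- 1e4
      x <- numeric(T)                                       # start at x = 0
      for (t in 2:T) {
        y <- rnorm(1, 0, 3)                                 # Yt ~ q(y|x(t))
        rho <- min(1, f(y)*dnorm(x[t-1], 0, 3)/
                      (f(x[t-1])*dnorm(y, 0, 3)))           # acceptance probability
        x[t] <- if (runif(1) < rho) y else x[t-1]
      }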
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    The Metropolis–Hastings algorithm


Features



             Independent of normalizing constants for both f and q(·|x)
             (i.e., those constants independent of x)
             Never move to values with f (y) = 0
             The chain (x(t) )t may take the same value several times in a
             row, even though f is a density wrt Lebesgue measure
             The sequence (yt )t is usually not a Markov chain
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    The Metropolis–Hastings algorithm


Convergence properties



      Under irreducibility,
        1. The M-H Markov chain is reversible, with
           invariant/stationary density f since it satisfies the detailed
           balance condition
                          f (y) K(y, x) = f (x) K(x, y)                  (1)
        2. As f is a probability measure, the chain is positive recurrent
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    The Metropolis–Hastings algorithm


Convergence properties (2)
        4. If
                                         q(y|x) > 0 for every (x, y),                        (2)
           the chain is irreducible
        5. For M-H, f -irreducibility implies Harris recurrence
        6. Thus, for M-H satisfying (1) and (2)
                (i) For h, with Ef |h(X)| < ∞,

                      lim_{T→∞} (1/T) Σ_{t=1}^T h(X (t) ) = ∫ h(x)f (x)dx     a.e. f.

               (ii) and

                      lim_{n→∞} ‖ ∫ K n (x, ·)µ(dx) − f ‖TV = 0

                      for every initial distribution µ, where K n (x, ·) denotes the
                      kernel for n transitions.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Random walk Metropolis–Hastings



      Use of a local perturbation as proposal

                                                 Yt = X (t) + εt ,

       where εt ∼ g, independent of X (t) .
      The instrumental density is of the form g(y − x) and the Markov
      chain is a random walk if we take g to be symmetric, g(x) = g(−x)
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms




      Algorithm (Random walk Metropolis)
      Given x(t)
        1. Generate Yt ∼ g(y − x(t) )
        2. Take

                    X (t+1) = Yt      with prob. min{1, f (Yt )/f (x(t) )},
                              x(t)    otherwise.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Optimizing the Acceptance Rate



      Problem of choice of the transition kernel from a practical point of
      view
      Most common alternatives:
        1. an instrumental density g which approximates f , such that
           f /g is bounded for uniform ergodicity to apply;
        2. a random walk
      In both cases, the choice of g is critical.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Case of the random walk


      Different approach to acceptance rates
      A high acceptance rate does not indicate that the algorithm is
      moving correctly since it indicates that the random walk is moving
      too slowly on the surface of f .
      If x(t) and yt are close, i.e. f (x(t) ) ≈ f (yt ), yt is accepted with
      probability
                    min{ f (yt )/f (x(t) ) , 1 } ≈ 1.
      For multimodal densities with well separated modes, the negative
      effect of limited moves on the surface of f clearly shows.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Case of the random walk (2)



      If the average acceptance rate is low, the successive values of f (yt )
      tend to be small compared with f (x(t) ), which means that the
      random walk moves quickly on the surface of f since it often
      reaches the “borders” of the support of f
      In small dimensions, aim at an average acceptance rate of 50%. In
      large dimensions, at an average acceptance rate of 25%.
                                       [Gelman, Gilks and Roberts, 1995]
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms




      Example (Noisy AR(1))
      Hidden Markov chain from a regular AR(1) model,

                     xt+1 = ϕ xt + εt+1 ,     εt ∼ N (0, τ ²)

      and observables
                     yt |xt ∼ N (xt ², σ ²)

      The distribution of xt given xt−1 , xt+1 and yt is

           exp{ −[(xt − ϕxt−1 )² + (xt+1 − ϕxt )² + (τ ²/σ ²)(yt − xt ²)²] / 2τ ² }.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms




      Example (Noisy AR(1) continued)
      For a Gaussian random walk with scale ω small enough, the
      random walk never jumps to the other mode. But if the scale ω is
      sufficiently large, the Markov chain explores both modes and gives a
      satisfactory approximation of the target distribution.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms




      Markov chain based on a random walk with scale ω = .1.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms




      Markov chain based on a random walk with scale ω = .5.
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


MA(2)


                     xt = εt − ϑ1 εt−1 − ϑ2 εt−2

      Since the constraints on (ϑ1 , ϑ2 ) are well-defined, use a flat
      prior over the invertibility triangle.
      Simple representation of the likelihood

      library(mnormt)   # for the multivariate normal density dmnorm()
      # log-likelihood of the MA(2) series y, with unit innovation variance:
      # the n x n covariance is Toeplitz with autocovariances
      # gamma0 = 1 + theta1^2 + theta2^2, gamma1 = theta1*(1 + theta2),
      # gamma2 = theta2 (these match the xt = et + theta1 et-1 + theta2 et-2
      # sign convention)
      ma2like=function(theta){
      n=length(y)
      sigma = toeplitz(c(1 +theta[1]^2+theta[2]^2,
        theta[1]+theta[1]*theta[2],theta[2],rep(0,n-3)))
      dmnorm(y,rep(0,n),sigma,log=TRUE)
      }
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Basic RWHM for MA(2)

      Algorithm 1 RW-HM-MA(2) sampler
         set ω and ϑ(1)
         for i = 2 to T do
           generate ϑ̃j ∼ U (ϑj (i−1) − ω, ϑj (i−1) + ω)
           set p = 0 and ϑ(i) = ϑ(i−1)
           if ϑ̃ within the triangle then
              p = exp(ma2like(ϑ̃) − ma2like(ϑ(i−1) ))
           end if
           if U < p then
                     ˜
              ϑ(i) = ϑ
           end if
         end for
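
      A hedged R transcription of Algorithm 1, reusing ma2like() above (the
      observed series y is assumed; the triangle test below matches the +
      sign convention used in ma2like):

      # random-walk Metropolis-Hastings over the MA(2) triangle
      in_triangle <- function(th)
        (th[1] + th[2] > -1) && (th[1] - th[2] < 1) && (abs(th[2]) < 1)
      omega <- 0.2; T <- 1e4
      theta <- matrix(0, T, 2)              # start at (0,0), inside the triangle
      curll <- ma2like(theta[1,])
      for (i in 2:T) {
        prop <- theta[i-1,] + runif(2, -omega, omega)   # uniform random walk
        theta[i,] <- theta[i-1,]
        if (in_triangle(prop)) {
          propll <- ma2like(prop)
          if (runif(1) < exp(propll - curll)) {         # accept with prob. p
            theta[i,] <- prop; curll <- propll
          }
        }
      }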
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Outcome
      Result with a simulated sample of 100 points and ϑ1 = 0.6,
      ϑ2 = 0.2 and scale ω = 0.2
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Outcome
      Result with a simulated sample of 100 points and ϑ1 = 0.6,
      ϑ2 = 0.2 and scale ω = 0.5
From MCMC to ABC Methods
  The Metropolis-Hastings Algorithm
    Random-walk Metropolis-Hastings algorithms


Outcome
      Result with a simulated sample of 100 points and ϑ1 = 0.6,
      ϑ2 = 0.2 and scale ω = 2.0
From MCMC to ABC Methods
  The Gibbs Sampler




The Gibbs Sampler




     The Gibbs Sampler
        General Principles
        Slice sampling
        Convergence
From MCMC to ABC Methods
  The Gibbs Sampler
    General Principles


General Principles


     A very specific simulation algorithm based on the target
     distribution f :
        1. Uses the conditional densities f1 , . . . , fp from f
        2. Start with the random variable X = (X1 , . . . , Xp )
        3. Simulate from the conditional densities,

                           Xi |x1 , x2 , . . . , xi−1 , xi+1 , . . . , xp
                                 ∼ fi (xi |x1 , x2 , . . . , xi−1 , xi+1 , . . . , xp )

             for i = 1, 2, . . . , p.
From MCMC to ABC Methods
  The Gibbs Sampler
    General Principles




      Algorithm (Gibbs sampler)
      Given x(t) = (x1 (t) , . . . , xp (t) ), generate
      1. X1 (t+1) ∼ f1 (x1 |x2 (t) , . . . , xp (t) );
      2. X2 (t+1) ∼ f2 (x2 |x1 (t+1) , x3 (t) , . . . , xp (t) ),
                    ...
      p. Xp (t+1) ∼ fp (xp |x1 (t+1) , . . . , xp−1 (t+1) )


                                  X(t+1) → X ∼ f
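
      A minimal R sketch for an (assumed) bivariate normal target with
      correlation ρ, where both full conditionals are closed-form normals:

      # two-stage Gibbs sampler for N2(0, [[1, rho], [rho, 1]])
      rho <- 0.9; T <- 1e4
      x <- matrix(0, T, 2)
      for (t in 2:T) {
        x[t,1] <- rnorm(1, rho*x[t-1,2], sqrt(1 - rho^2))   # x1 | x2
        x[t,2] <- rnorm(1, rho*x[t,1],   sqrt(1 - rho^2))   # x2 | updated x1
      }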
From MCMC to ABC Methods
  The Gibbs Sampler
    General Principles


Properties


     The full conditionals densities f1 , . . . , fp are the only densities used
     for simulation. Thus, even in a high dimensional problem, all of
     the simulations may be univariate
      The Gibbs sampler is not reversible with respect to f . However,
      each of its p components is. Besides, it can be turned into a
      reversible sampler, either using the Random Scan Gibbs sampler
      or running instead the (double) sequence

                             f1 · · · fp−1 fp fp−1 · · · f1
From MCMC to ABC Methods
  The Gibbs Sampler
    General Principles


Limitations of the Gibbs sampler


     Formally, a special case of a sequence of 1-D M-H kernels, all with
     acceptance rate uniformly equal to 1.
     The Gibbs sampler
        1. limits the choice of instrumental distributions
        2. requires some knowledge of f
        3. is, by construction, multidimensional
        4. does not apply to problems where the number of parameters
           varies as the resulting chain is not irreducible.
From MCMC to ABC Methods
  The Gibbs Sampler
        General Principles


A wee mixture problem

      [Figure: Gibbs sample in the (µ1 , µ2 ) plane, Gibbs started at random]
From MCMC to ABC Methods
  The Gibbs Sampler
        General Principles


A wee mixture problem

      [Figures: Gibbs samples in the (µ1 , µ2 ) plane; left: Gibbs started
      at random, right: Gibbs stuck at the wrong mode]
From MCMC to ABC Methods
  The Gibbs Sampler
    Slice sampling


Slice sampler as generic Gibbs


      If f (θ) can be written as a product

                     ∏_{i=1}^k fi (θ),

      it can be completed as

                     ∏_{i=1}^k I{0 ≤ ωi ≤ fi (θ)} ,

      leading to the following Gibbs algorithm:
From MCMC to ABC Methods
  The Gibbs Sampler
    Slice sampling




      Algorithm (Slice sampler)
      Simulate
        1. ω1 (t+1) ∼ U[0, f1 (θ(t) )] ;
              ...
        k. ωk (t+1) ∼ U[0, fk (θ(t) )] ;
     k+1. θ(t+1) ∼ U_{A(t+1)} , with

                 A(t+1) = {y; fi (y) ≥ ωi (t+1) , i = 1, . . . , k}.
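
      A hedged R sketch with k = 1 and the (assumed) unnormalised target
      f (θ) = exp(−θ²/2), for which the slice {f ≥ ω} is an explicit interval:

      # slice sampler: omega | theta uniform under f, theta uniform on the slice
      f <- function(th) exp(-th^2/2)
      T <- 1e4; theta <- numeric(T)
      for (t in 2:T) {
        omega <- runif(1, 0, f(theta[t-1]))    # omega ~ U[0, f(theta)]
        bound <- sqrt(-2*log(omega))           # {f >= omega} = [-bound, bound]
        theta[t] <- runif(1, -bound, bound)    # theta ~ U on the slice
      }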
From MCMC to ABC Methods
  The Gibbs Sampler
    Slice sampling


Example of results with a truncated N (−3, 1) distribution

      [Figure: slice sampler sample path over the truncated support, shown
      after 2, 3, 4, 5, 10, 50 and 100 iterations]
From MCMC to ABC Methods
  The Gibbs Sampler
    Slice sampling


Good slices, tough slices




     The slice sampler usually enjoys good theoretical properties (like
     geometric ergodicity and even uniform ergodicity under bounded f
     and bounded X ).
     As k increases, the determination of the set A(t+1) may get
     increasingly complex.
From MCMC to ABC Methods
  The Gibbs Sampler
    Convergence


Properties of the Gibbs sampler


      Theorem (Convergence)
      For
                     (Y1 , Y2 , · · · , Yp ) ∼ g(y1 , . . . , yp ),
      if either
        (i) g (i) (yi ) > 0 for every i = 1, · · · , p, implies that
            g(y1 , . . . , yp ) > 0, where g (i) denotes the marginal distribution
            of Yi [positivity condition], or
       (ii) the transition kernel is absolutely continuous with respect to g,
      then the chain is irreducible and positive Harris recurrent.
From MCMC to ABC Methods
  The Gibbs Sampler
    Convergence


Properties of the Gibbs sampler (2)

      Consequences
      (i) If ∫ h(y)g(y)dy < ∞, then

                 lim_{T→∞} (1/T) Σ_{t=1}^T h(Y (t) ) = ∫ h(y)g(y)dy     a.e. g.

     (ii) If, in addition, (Y (t) ) is aperiodic, then

                 lim_{n→∞} ‖ ∫ K n (y, ·)µ(dy) − g ‖TV = 0

           for every initial distribution µ.
From MCMC to ABC Methods
  The Gibbs Sampler
    Convergence


Hammersley-Clifford theorem


      An illustration that conditionals determine the joint distribution
      Theorem
      If the joint density g(y1 , y2 ) has conditional distributions
      g1 (y1 |y2 ) and g2 (y2 |y1 ), then

                 g(y1 , y2 ) = g2 (y2 |y1 ) / ∫ [g2 (v|y1 )/g1 (y1 |v)] dv .

                                  [Hammersley & Clifford, circa 1970]
From MCMC to ABC Methods
  The Gibbs Sampler
    Convergence


General HC decomposition




      Under the positivity condition, the joint distribution g satisfies

           g(y1 , . . . , yp ) ∝ ∏_{j=1}^p [ gℓj (yℓj | yℓ1 , . . . , yℓj−1 , y′ℓj+1 , . . . , y′ℓp )
                                          / gℓj (y′ℓj | yℓ1 , . . . , yℓj−1 , y′ℓj+1 , . . . , y′ℓp ) ]

      for every permutation ℓ on {1, 2, . . . , p} and every y′ ∈ Y .
From MCMC to ABC Methods
  Approximate Bayesian computation




Approximate Bayesian computation

     Computational issues in Bayesian statistics

     The Metropolis-Hastings Algorithm

     The Gibbs Sampler

     Approximate Bayesian computation
        ABC basics
        Alphabet soup
        Calibration of ABC

     ABC for model choice
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Intractable likelihoods



      Cases when the likelihood function f (y|θ) is unavailable and when
      the completion step

                 f (y|θ) = ∫Z f (y, z|θ) dz

      is impossible or too costly because of the dimension of z
                               c MCMC cannot be implemented!
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Illustrations



      Example
      Stochastic volatility model: for t = 1, . . . , T,

                 yt = exp(zt ) εt ,     zt = a + b zt−1 + σ ηt ,

      T very large makes it difficult to include z within the simulated
      parameters

      [Figure: highest weight trajectories]
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Illustrations



      Example
      Potts model: if y takes values on a grid Y of size k n and

                 f (y|θ) ∝ exp{ θ Σ_{l∼i} I{yl = yi} }

      where l∼i denotes a neighbourhood relation, n moderately large
      prohibits the computation of the normalising constant
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Illustrations




      Example
      Inference on CMB: in cosmology, study of the Cosmic Microwave
      Background via likelihoods immensely slow to compute (e.g.
      WMAP, Planck), because of numerically costly spectral transforms
      [Data is a Fortran program]
                                      [Kilbinger et al., 2010, MNRAS]
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Illustrations


      Example
      Phylogenetic tree: in population genetics, reconstitution of a
      common ancestor from a sample of genes via a phylogenetic tree
      that is close to impossible to integrate out
      [100 processor days with 4 parameters]
                                    [Cornuet et al., 2009, Bioinformatics]
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


The ABC method

     Bayesian setting: target is π(θ)f (x|θ)
     When likelihood f (x|θ) not in closed form, likelihood-free rejection
     technique:
      ABC algorithm
      For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly
      simulating
                     θ′ ∼ π(θ) ,   z ∼ f (z|θ′ ) ,
      until the auxiliary variable z is equal to the observed value, z = y.

                                                     [Tavaré et al., 1997]
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Why does it work?!



      The proof is trivial:

           f (θi ) ∝ Σ_{z∈D} π(θi )f (z|θi )Iy (z)
                   ∝ π(θi )f (y|θi )
                   = π(θi |y) .

                                                    [Accept–Reject 101]
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


A as approximative


      When y is a continuous random variable, equality z = y is replaced
      with a tolerance condition,

                     ρ(y, z) ≤ ε

      where ρ is a distance
      Output distributed from

                 π(θ) Pθ {ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


ABC algorithm


      Algorithm 2 Likelihood-free rejection sampler
        for i = 1 to N do
          repeat
             generate θ′ from the prior distribution π(·)
             generate z from the likelihood f (·|θ′ )
          until ρ{η(z), η(y)} ≤ ε
          set θi = θ′
        end for

      where η(y) defines a (not necessarily sufficient) statistic
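
      A hedged R sketch of this sampler for an (assumed) toy problem:
      yi ∼ N (θ, 1) with prior θ ∼ N (0, 10), η the sample mean and ρ the
      absolute difference:

      # likelihood-free rejection sampler
      y <- rnorm(30, 1.5)                   # (assumed) observed sample
      eps <- 0.05; N <- 1e3
      post <- numeric(N)
      for (i in 1:N) {
        repeat {
          thp <- rnorm(1, 0, sqrt(10))                # theta' ~ prior
          z <- rnorm(30, thp)                         # z ~ f(.|theta')
          if (abs(mean(z) - mean(y)) <= eps) break    # rho{eta(z),eta(y)} <= eps
        }
        post[i] <- thp
      }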
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Output

      The likelihood-free algorithm samples from the marginal in z of:

                 πε (θ, z|y) = π(θ)f (z|θ)I{Aε,y}(z) / ∫_{Aε,y ×Θ} π(θ)f (z|θ)dzdθ ,

      where Aε,y = {z ∈ D|ρ(η(z), η(y)) < ε}.
      The idea behind ABC is that the summary statistics coupled with a
      small tolerance should provide a good approximation of the
      posterior distribution:

                 πε (θ|y) = ∫ πε (θ, z|y)dz ≈ π(θ|y) .
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


MA example


      Back to the MA(q) model

                 xt = εt + Σ_{i=1}^q ϑi εt−i

      Simple prior: uniform over the inverse [real and complex] roots in

                 Q(u) = 1 − Σ_{i=1}^q ϑi u^i

      under the identifiability conditions
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


MA example



      Back to the MA(q) model

                 xt = εt + Σ_{i=1}^q ϑi εt−i

      Simple prior: uniform prior over the identifiability zone, e.g.
      triangle for MA(2)
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


MA example (2)
      ABC algorithm thus made of
        1. picking a new value (ϑ1 , ϑ2 ) in the triangle
        2. generating an iid sequence (εt )−q<t≤T
        3. producing a simulated series (x′t )1≤t≤T
      Distance: basic distance between the series

                 ρ((x′t )1≤t≤T , (xt )1≤t≤T ) = Σ_{t=1}^T (x′t − xt )²

      or distance between summary statistics like the q autocorrelations

                 τj = Σ_{t=j+1}^T xt xt−j
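
      A hedged R sketch of these three steps with the (τ1 , τ2 ) distance,
      keeping the closest simulations; the observed series y (as in ma2like)
      and the in_triangle() test from the MCMC part are assumed:

      # ABC rejection for MA(2) with autocovariance-type summaries
      sim_ma2 <- function(th, T) {
        eps <- rnorm(T + 2)
        eps[3:(T+2)] + th[1]*eps[2:(T+1)] + th[2]*eps[1:T]
      }
      tau <- function(x) { T <- length(x)
        c(sum(x[2:T]*x[1:(T-1)]), sum(x[3:T]*x[1:(T-2)])) }
      Tobs <- length(y); tobs <- tau(y); M <- 1e5
      sims <- matrix(NA, M, 3)
      for (m in 1:M) {
        repeat { th <- c(runif(1,-2,2), runif(1,-1,1))   # flat prior on the
                 if (in_triangle(th)) break }            # triangle, by rejection
        sims[m,] <- c(th, sum((tau(sim_ma2(th, Tobs)) - tobs)^2))
      }
      # epsilon taken as the 0.1% quantile of the simulated distances
      keep <- sims[,3] <= quantile(sims[,3], 0.001)
      posterior_sample <- sims[keep, 1:2]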
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


Comparison of distance impact




      Evaluation of the tolerance ε on the ABC sample against both
      distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
From MCMC to ABC Methods
  Approximate Bayesian computation
    ABC basics


ABC advances

      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x’s within the vicinity of y...
           [Marjoram et al, 2003; Bortot et al., 2007; Sisson et al., 2007]

      ...or by viewing the problem as a conditional density estimation
      and by developing techniques to allow for larger ε
                                                 [Beaumont et al., 2002]

      ...or even by including ε in the inferential framework [ABCµ ]
                                                  [Ratmann et al., 2009]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABC-NP

     Better usage of [prior] simulations by adjustment: instead of
     throwing away θ′ such that ρ(η(z), η(y)) > ε, replace θ’s with
     locally regressed

                 θ∗ = θ′ − {η(z) − η(y)}T β̂

                                           [Csilléry et al., TEE, 2010]

      where β̂ is obtained by [NP] weighted least square regression on
      (η(z) − η(y)) with weights

                 Kδ {ρ(η(z), η(y))}

                                    [Beaumont et al., 2002, Genetics]
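
      A hedged R sketch of this adjustment, assuming a matrix th of
      accepted parameters, a matrix eta_z of the matching summaries, the
      observed summaries eta_y, and an Epanechnikov kernel for Kδ :

      # local-linear regression adjustment (ABC-NP)
      abc_adjust <- function(th, eta_z, eta_y, delta) {
        d <- sweep(eta_z, 2, eta_y)              # eta(z) - eta(y)
        r <- sqrt(rowSums(d^2))                  # rho(eta(z), eta(y))
        w <- pmax(1 - (r/delta)^2, 0)            # Epanechnikov weights
        fit <- lm(th ~ d, weights = w)           # weighted least squares
        th - d %*% coef(fit)[-1, , drop = FALSE] # theta* = theta - d beta_hat
      }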
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABC-MCMC


      Markov chain (θ(t) ) created via the transition function

        θ(t+1) = θ′ ∼ Kω (θ′ |θ(t) )   if x ∼ f (x|θ′ ) is such that x = y
                                       and u ∼ U (0, 1) ≤ π(θ′ )Kω (θ(t) |θ′ ) / π(θ(t) )Kω (θ′ |θ(t) ) ,
                 θ(t)                  otherwise,

     has the posterior π(θ|y) as stationary distribution
                                                   [Marjoram et al, 2003]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABC-MCMC (2)

      Algorithm 3 Likelihood-free MCMC sampler
          Use Algorithm 2 to get (θ(0) , z(0) )
          for t = 1 to N do
            Generate θ′ from Kω (·|θ(t−1) ),
            Generate z′ from the likelihood f (·|θ′ ),
            Generate u from U[0,1] ,
            if u ≤ [π(θ′ )Kω (θ(t−1) |θ′ ) / π(θ(t−1) )Kω (θ′ |θ(t−1) )] I{Aε,y}(z′ ) then
               set (θ(t) , z(t) ) = (θ′ , z′ )
            else
               (θ(t) , z(t) ) = (θ(t−1) , z(t−1) ),
            end if
          end for
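
      A hedged R sketch of Algorithm 3 for the same (assumed) normal-mean
      toy problem as above; Kω is a symmetric Gaussian random walk, so the
      Kω ratio cancels from the acceptance bound:

      # ABC-MCMC: accept if the pseudo-data is close enough and the prior
      # ratio test passes (y and eps are reused from the rejection sketch)
      T <- 1e4; om <- 0.5
      th <- numeric(T); th[1] <- mean(y)        # rough starting value
      for (t in 2:T) {
        thp <- rnorm(1, th[t-1], om)            # theta' ~ K_omega(.|theta)
        z <- rnorm(length(y), thp)              # z' ~ f(.|theta')
        mh <- dnorm(thp, 0, sqrt(10))/dnorm(th[t-1], 0, sqrt(10))  # prior ratio
        th[t] <- if (abs(mean(z) - mean(y)) <= eps && runif(1) <= mh)
                   thp else th[t-1]
      }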
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Why does it work?

      Acceptance probability that does not involve the calculation of the
      likelihood and

        πε (θ′ , z′ |y) / πε (θ(t−1) , z(t−1) |y)
              × Kω (θ(t−1) |θ′ )f (z(t−1) |θ(t−1) ) / Kω (θ′ |θ(t−1) )f (z′ |θ′ )
          = [π(θ′ ) f (z′ |θ′ ) I{Aε,y}(z′ ) / π(θ(t−1) ) f (z(t−1) |θ(t−1) ) I{Aε,y}(z(t−1) )]
              × [Kω (θ(t−1) |θ′ ) f (z(t−1) |θ(t−1) ) / Kω (θ′ |θ(t−1) ) f (z′ |θ′ )]
          = [π(θ′ )Kω (θ(t−1) |θ′ ) / π(θ(t−1) )Kω (θ′ |θ(t−1) )] I{Aε,y}(z′ ) .
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABCµ


                       [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]

      Use of a joint density

                 f (θ, ε|y) ∝ ξ(ε|y, θ) × πθ (θ) × πε (ε)

      where y is the data, and ξ(ε|y, θ) is the prior predictive density of
      ρ(η(z), η(y)) given θ and y when z ∼ f (z|θ)
      Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel
      approximation.
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABCµ details

      Multidimensional distances ρk (k = 1, . . . , K) and errors
      εk = ρk (ηk (z), ηk (y)), with

        εk ∼ ξk (ε|y, θ) ≈ ξ̂k (ε|y, θ) = (1/Bhk ) Σ_b K[{εk − ρk (ηk (zb ), ηk (y))}/hk ]

      then used in replacing ξ(ε|y, θ) with mink ξ̂k (ε|y, θ)
      ABCµ involves acceptance probability

           [π(θ′ , ε′ ) q(θ′ , θ)q(ε′ , ε) mink ξ̂k (ε′ |y, θ′ )]
              / [π(θ, ε) q(θ, θ′ )q(ε, ε′ ) mink ξ̂k (ε|y, θ)]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABCµ multiple errors




                                     [ c Ratmann et al., PNAS, 2009]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABCµ for model choice




                                     [ c Ratmann et al., PNAS, 2009]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Questions about ABCµ


      For each model under comparison, marginal posterior on ε used to
      assess the fit of the model (HPD includes 0 or not).
              Is the data informative about ε? [Identifiability]
              How is the prior π(ε) impacting the comparison?
              How is using both ξ(ε|x0 , θ) and πε (ε) compatible with a
              standard probability model? [remindful of Wilkinson]
              Where is the penalisation for complexity in the model
              comparison?
                                      [X, Mengersen & Chen, 2010, PNAS]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


A PMC version

      Generate a sample at iteration t by

                 π̂t (θ(t) ) ∝ Σ_{j=1}^N ωj (t−1) Kt (θ(t) |θj (t−1) )

      modulo acceptance of the associated xt , and use an importance
      weight associated with an accepted simulation θi (t)

                 ωi (t) ∝ π(θi (t) ) / π̂t (θi (t) ) .

       c Still likelihood free
                                                 [Beaumont et al., 2009]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


The ABC-PMC algorithm
      Given a decreasing sequence of approximation levels ε1 ≥ . . . ≥ εT ,

        1. At iteration t = 1,
             For i = 1, ..., N
               Simulate θi (1) ∼ π(θ) and x ∼ f (x|θi (1) ) until ρ(x, y) < ε1
               Set ωi (1) = 1/N
             Take τ ² as twice the empirical variance of the θi (1) ’s
        2. At iteration 2 ≤ t ≤ T ,
             For i = 1, ..., N , repeat
               Pick θi* from the θj (t−1) ’s with probabilities ωj (t−1)
               generate θi (t) |θi* ∼ N (θi* , σt ²) and x ∼ f (x|θi (t) )
             until ρ(x, y) < εt
             Set ωi (t) ∝ π(θi (t) ) / Σ_{j=1}^N ωj (t−1) ϕ(σt ⁻¹ {θi (t) − θj (t−1) })
             Take τ ²(t+1) as twice the weighted empirical variance of the θi (t) ’s
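
      A hedged R sketch of this scheme for the (assumed) normal-mean toy
      problem used earlier (observed sample y, Gaussian kernel moves):

      # ABC-PMC with a decreasing tolerance sequence
      N <- 1e3; eps <- c(1, 0.5, 0.2, 0.1, 0.05)
      th <- w <- numeric(N)
      for (i in 1:N) {                          # t = 1: plain ABC rejection
        repeat { th[i] <- rnorm(1, 0, sqrt(10))
                 if (abs(mean(rnorm(30, th[i])) - mean(y)) < eps[1]) break }
        w[i] <- 1/N
      }
      tau2 <- 2*var(th)
      for (t in 2:length(eps)) {
        thn <- wn <- numeric(N)
        for (i in 1:N) {
          repeat {                              # move a picked particle until accepted
            thn[i] <- rnorm(1, sample(th, 1, prob = w), sqrt(tau2))
            if (abs(mean(rnorm(30, thn[i])) - mean(y)) < eps[t]) break
          }
          wn[i] <- dnorm(thn[i], 0, sqrt(10))/          # prior over kernel mixture
                   sum(w*dnorm(thn[i], th, sqrt(tau2)))
        }
        w <- wn/sum(wn); th <- thn
        tau2 <- 2*sum(w*(th - sum(w*th))^2)     # twice the weighted variance
      }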
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Sequential Monte Carlo

      SMC is a simulation technique to approximate a sequence of
      related probability distributions πn with π0 “easy” and πT as
      target.
      Iterated IS: particles moved from time n − 1 to time n via kernel Kn
      and use of a sequence of extended targets π̃n

                 π̃n (z0:n ) = πn (zn ) ∏_{j=0}^{n−1} Lj (zj+1 , zj )

      where the Lj ’s are backward Markov kernels [check that πn (zn ) is
      a marginal]
                            [Del Moral, Doucet & Jasra, Series B, 2006]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Sequential Monte Carlo (2)

     Algorithm 4 SMC sampler
        sample z_i^{(0)} \sim \gamma_0(z) (i = 1, . . . , N)
        compute weights w_i^{(0)} = \pi_0(z_i^{(0)}) / \gamma_0(z_i^{(0)})
        for t = 1 to T do
           if ESS(w^{(t-1)}) < N_T then
              resample N particles z^{(t-1)} and set weights to 1
           end if
           generate z_i^{(t)} \sim K_t(z_i^{(t-1)}, \cdot) and set weights to

              w_i^{(t)} = w_i^{(t-1)} \frac{\pi_t(z_i^{(t)}) \, L_{t-1}(z_i^{(t)}, z_i^{(t-1)})}{\pi_{t-1}(z_i^{(t-1)}) \, K_t(z_i^{(t-1)}, z_i^{(t)})}

        end for
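
     A schematic Python version of Algorithm 4, under the simplifying assumption
     that K_t leaves \pi_t invariant and the backward kernel is its time reversal,
     so the incremental weight reduces to \pi_t/\pi_{t-1}; log_pi and K_move are
     hypothetical user inputs:

```python
import numpy as np

def smc_sampler(log_pi, K_move, T, N=1000, ess_frac=0.5):
    z = np.random.randn(N)                      # gamma_0 = N(0,1) proposal
    logw = log_pi(0, z) + 0.5 * z ** 2          # log{pi_0/gamma_0}, up to a constant
    for t in range(1, T + 1):
        w = np.exp(logw - logw.max()); w /= w.sum()
        if 1.0 / np.sum(w ** 2) < ess_frac * N:  # ESS(w) below threshold N_T
            z = z[np.random.choice(N, N, p=w)]   # resample, reset weights
            logw = np.zeros(N)
        # incremental weight under the time-reversal backward kernel
        logw += log_pi(t, z) - log_pi(t - 1, z)
        z = K_move(t, z)                         # one pi_t-invariant MCMC move
    w = np.exp(logw - logw.max()); w /= w.sum()
    return z, w

# toy usage: Gaussians N(3t/T, 1) moved by random-walk Metropolis
T = 10
log_pi = lambda t, z: -0.5 * (z - 3.0 * t / T) ** 2
def rw_move(t, z, step=0.5):
    prop = z + step * np.random.randn(z.size)
    acc = np.log(np.random.rand(z.size)) < log_pi(t, prop) - log_pi(t, z)
    return np.where(acc, prop, z)
z, w = smc_sampler(log_pi, rw_move, T)
```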
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABC-SMC
                                                  [Del Moral, Doucet & Jasra, 2009]

     True derivation of an SMC-ABC algorithm
     Use of a kernel K_n associated with target \pi_{\epsilon_n} and derivation of the
     backward kernel

        L_{n-1}(z, z') = \frac{\pi_{\epsilon_n}(z') K_n(z', z)}{\pi_{\epsilon_n}(z)}

     Update of the weights

        w_{in} \propto w_{i(n-1)} \frac{\sum_{m=1}^{M} \mathbb{I}_{A_{\epsilon_n}}(x_{in}^m)}{\sum_{m=1}^{M} \mathbb{I}_{A_{\epsilon_{n-1}}}(x_{i(n-1)}^m)}

     when x_{in}^m \sim K(x_{i(n-1)}, \cdot)
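
     A sketch of this weight update in Python, assuming (hypothetical array
     layout) that dist[i, m] stores \rho(\eta(x_i^m), \eta(y)) for the M
     pseudo-datasets attached to particle i:

```python
import numpy as np

def abc_smc_reweight(prev_w, dist, eps_prev, eps_new):
    # ratio of the proportions of pseudo-datasets surviving the new
    # tolerance over those surviving the previous one, per particle
    alive_new = (dist < eps_new).sum(axis=1)
    alive_old = (dist < eps_prev).sum(axis=1)
    w = np.where(alive_old > 0,
                 prev_w * alive_new / np.maximum(alive_old, 1), 0.0)
    return w / w.sum()
```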
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


ABC-SMCM



     Modification: make M repeated simulations of the pseudo-data z
     given the parameter, rather than a single [M = 1]
     simulation, leading to a weight proportional to the number of
     accepted z_i's,

        \omega(\theta) = \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}_{\rho(\eta(y), \eta(z_i)) < \epsilon}

     [the limit in M means exact simulation from the (tempered) target]
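
     A one-function Python sketch of this weight, with simulate, eta and rho
     standing for hypothetical user-supplied model, summary and distance:

```python
def smcm_weight(theta, y_obs, simulate, eta, rho, eps, M=25):
    # omega(theta) = (1/M) #{i : rho(eta(y), eta(z_i)) < eps}
    hits = sum(rho(eta(simulate(theta)), eta(y_obs)) < eps for _ in range(M))
    return hits / M
```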
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Properties of ABC-SMC



     The ABC-SMC method properly uses a backward kernel L(z, z') to
     simplify the importance weight and to remove the dependence on
     the unknown likelihood from this weight. The update of the importance
     weights reduces to the ratio of the proportions of surviving
     particles
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Properties of ABC-SMC



     The ABC-SMC method properly uses a backward kernel L(z, z') to
     simplify the importance weight and to remove the dependence on
     the unknown likelihood from this weight. The update of the importance
     weights reduces to the ratio of the proportions of surviving
     particles
     Adaptivity in the ABC-SMC algorithm is only found in the on-line
     construction of the thresholds \epsilon_t, decreasing slowly enough to keep a
     large number of accepted transitions
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Semi-automatic ABC


     Fearnhead and Prangle (2010) study ABC and the selection of the
     summary statistic, in close proximity to Wilkinson's proposal.
     ABC is then considered from a purely inferential viewpoint and
     calibrated for estimation purposes.
     Use of a randomised (or 'noisy') version of the summary statistics

        \tilde{\eta}(y) = \eta(y) + \tau \epsilon

     Derivation of a well-calibrated version of ABC, i.e. an algorithm
     that gives proper predictions for the distribution associated with
     this randomised summary statistic.
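
     A minimal Python sketch of rejection ABC on the jittered summary; the
     jitter is drawn once and kept fixed across all proposals, and prior_sample,
     simulate, eta, tau and eps are hypothetical user choices:

```python
import numpy as np

def noisy_abc_sample(y_obs, prior_sample, simulate, eta, tau, eps, T=10_000):
    # jitter the observed summary once: eta_tilde(y) = eta(y) + tau * epsilon
    s_obs = np.atleast_1d(eta(y_obs))
    s_noisy = s_obs + tau * np.random.standard_normal(s_obs.shape)
    accepted = []
    for _ in range(T):
        theta = prior_sample()
        if np.linalg.norm(np.atleast_1d(eta(simulate(theta))) - s_noisy) < eps:
            accepted.append(theta)
    return accepted
```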
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Summary statistics




     Optimality of the posterior expectations of the parameters of
     interest as summary statistics!
From MCMC to ABC Methods
  Approximate Bayesian computation
    Alphabet soup


Summary statistics




     Optimality of the posterior expectations of the parameters of
     interest as summary statistics!
     Use of the standard quadratic loss function

        (\theta - \theta_0)^{\mathrm{T}} A (\theta - \theta_0) .
From MCMC to ABC Methods
  Approximate Bayesian computation
    Calibration of ABC


Which summary?


     Fundamental difficulty of the choice of the summary statistic when
     there is no non-trivial sufficient statistic [except when done by the
     experimenters in the field]
From MCMC to ABC Methods
  Approximate Bayesian computation
    Calibration of ABC


Which summary?


     Fundamental difficulty of the choice of the summary statistic when
     there is no non-trivial sufficient statistic [except when done by the
     experimenters in the field]
     Starting from a large collection of available summary statistics,
     Joyce and Marjoram (2008) consider their sequential inclusion into
     the ABC target, with a stopping rule based on a likelihood ratio
     test.
From MCMC to ABC Methods
  Approximate Bayesian computation
    Calibration of ABC


Which summary?


     Fundamental difficulty of the choice of the summary statistic when
     there is no non-trivial sufficient statistic [except when done by the
     experimenters in the field]
     Starting from a large collection of available summary statistics,
     Joyce and Marjoram (2008) consider their sequential inclusion into
     the ABC target, with a stopping rule based on a likelihood ratio
     test.
             Does not take into account the sequential nature of the tests
             Depends on parameterisation
             Order of inclusion matters.
From MCMC to ABC Methods
  ABC for model choice




ABC for model choice

     Computational issues in Bayesian statistics

     The Metropolis-Hastings Algorithm

     The Gibbs Sampler

     Approximate Bayesian computation

     ABC for model choice
       Model choice
       Gibbs random fields
       Model choice via ABC
       Illustrations
       Generic ABC model choice
From MCMC to ABC Methods
  ABC for model choice
    Model choice


Bayesian model choice



     Several models M1 , M2 , . . . are considered simultaneously for a
     dataset y and the model index M is part of the inference.
     Use of a prior distribution π(M = m), plus a prior distribution on
     the parameter conditional on the value m of the model index,
     πm (θ m )
     Goal is to derive the posterior distribution of M , a challenging
     computational target when models are complex.
From MCMC to ABC Methods
  ABC for model choice
    Model choice


Generic ABC for model choice



     Algorithm 5 Likelihood-free model choice sampler (ABC-MC)
       for t = 1 to T do
         repeat
            Generate m from the prior π(M = m)
            Generate θ m from the prior πm (θ m )
            Generate z from the model fm (z|θ m )
          until ρ{η(z), η(y)} < ε
         Set m(t) = m and θ (t) = θ m
       end for
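
     A direct Python transcription of Algorithm 5, with prior_m a vector of
     prior model probabilities and prior_theta[m], simulate[m] hypothetical
     per-model prior samplers and simulators:

```python
import numpy as np

def abc_model_choice(y_obs, prior_m, prior_theta, simulate, eta, rho, eps, T=1000):
    s_obs = eta(y_obs)
    draws = []
    for _ in range(T):
        while True:                 # repeat until acceptance
            m = np.random.choice(len(prior_m), p=prior_m)
            theta = prior_theta[m]()
            if rho(eta(simulate[m](theta)), s_obs) < eps:
                draws.append((m, theta))
                break
    return draws

# P(M = m | y) is then estimated by the frequency of m among the draws,
# e.g. np.mean([m == 0 for m, _ in draws])
```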
From MCMC to ABC Methods
  ABC for model choice
    Model choice


ABC estimates
     Posterior probability π(M = m|y) approximated by the frequency
     of acceptances from model m,

        \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}_{m^{(t)} = m} .

     Issues with implementation:
             should tolerances \epsilon be the same for all models?
             should summary statistics vary across models (incl. their
             dimension)?
             should the distance measure ρ vary as well?
From MCMC to ABC Methods
  ABC for model choice
    Model choice


ABC estimates
     Posterior probability π(M = m|y) approximated by the frequency
     of acceptances from model m,

        \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}_{m^{(t)} = m} .

     Issues with implementation:
             should tolerances \epsilon be the same for all models?
             should summary statistics vary across models (incl. their
             dimension)?
             should the distance measure ρ vary as well?
     Extension to a weighted polychotomous logistic regression estimate
     of π(M = m|y), with non-parametric kernel weights
                                       [Cornuet et al., DIYABC, 2009]
From MCMC to ABC Methods
  ABC for model choice
    Model choice


The Great ABC controversy
     On-going controversy in phylogeographic genetics about the validity
     of using ABC for testing




   Against: Templeton, 2008,
   2009, 2010a, 2010b, 2010c
   argues that nested hypotheses
   cannot have higher probabilities
   than nesting hypotheses (!)
From MCMC to ABC Methods
  ABC for model choice
    Model choice


The Great ABC controversy



     On-going controversy in phylogeographic genetics about the validity
     of using ABC for testing

   Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that
   nested hypotheses cannot have higher probabilities than nesting
   hypotheses (!)

   Replies: Fagundes et al., 2008, Beaumont et al., 2010, Berger et
   al., 2010, Csilléry et al., 2010 point out that the criticisms are
   addressed at [Bayesian] model-based inference and have nothing to
   do with ABC...
From MCMC to ABC Methods
  ABC for model choice
    Gibbs random fields


Gibbs random fields


     Gibbs distribution
     The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
     the graph G if

        f(y) = \frac{1}{Z} \exp\Big\{ - \sum_{c \in C} V_c(y_c) \Big\} ,

     where Z is the normalising constant, C is the set of cliques of G,
     and V_c is any function, also called a potential;
     U(y) = \sum_{c \in C} V_c(y_c) is the energy function
From MCMC to ABC Methods
  ABC for model choice
    Gibbs random fields


Gibbs random fields


     Gibbs distribution
     The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
     the graph G if

        f(y) = \frac{1}{Z} \exp\Big\{ - \sum_{c \in C} V_c(y_c) \Big\} ,

     where Z is the normalising constant, C is the set of cliques of G,
     and V_c is any function, also called a potential;
     U(y) = \sum_{c \in C} V_c(y_c) is the energy function

      c Z is usually unavailable in closed form
From MCMC to ABC Methods
  ABC for model choice
    Gibbs random fields


Potts model
     Potts model
     V_c(y) is of the form

        V_c(y) = \theta S(y) = \theta \sum_{l \sim i} \delta_{y_l = y_i}

     where l \sim i denotes a neighbourhood structure
From MCMC to ABC Methods
  ABC for model choice
    Gibbs random fields


Potts model
     Potts model
     V_c(y) is of the form

        V_c(y) = \theta S(y) = \theta \sum_{l \sim i} \delta_{y_l = y_i}

     where l \sim i denotes a neighbourhood structure

     In most realistic settings, the summation

        Z_\theta = \sum_{x \in \mathcal{X}} \exp\{\theta^{\mathrm{T}} S(x)\}

     involves too many terms to be manageable and numerical
     approximations cannot always be trusted
                            [Cucala, Marin, CPR & Titterington, 2009]
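
     A quick Python check of why Z_\theta is intractable: the sufficient
     statistic S(x) is trivial to compute, while exhaustive summation over
     \mathcal{X} is only feasible on toy grids (a 4-neighbour 2D grid is
     assumed here):

```python
import numpy as np
from itertools import product

def potts_stat(x):
    # S(x): number of identical nearest-neighbour pairs on a 2D grid
    return int((x[:, :-1] == x[:, 1:]).sum() + (x[:-1, :] == x[1:, :]).sum())

def potts_Z(theta, shape=(3, 3), K=2):
    # exact Z_theta by brute force: K**9 = 512 terms on a 2-colour 3x3 grid,
    # versus K**10000 on a modest 100 x 100 image
    return sum(np.exp(theta * potts_stat(np.array(cfg).reshape(shape)))
               for cfg in product(range(K), repeat=shape[0] * shape[1]))
```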
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Bayesian Model Choice



     Comparing a model with potential S0 taking values in R^{p_0} versus a
     model with potential S1 taking values in R^{p_1} can be done through
     the Bayes factor corresponding to the priors π0 and π1 on each
     parameter space

        B_{m_0/m_1}(x) = \frac{\int \exp\{\theta_0^{\mathrm{T}} S_0(x)\} / Z_{\theta_0,0} \; \pi_0(d\theta_0)}{\int \exp\{\theta_1^{\mathrm{T}} S_1(x)\} / Z_{\theta_1,1} \; \pi_1(d\theta_1)}
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Bayesian Model Choice



     Comparing a model with potential S0 taking values in R^{p_0} versus a
     model with potential S1 taking values in R^{p_1} can be done through
     the Bayes factor corresponding to the priors π0 and π1 on each
     parameter space

        B_{m_0/m_1}(x) = \frac{\int \exp\{\theta_0^{\mathrm{T}} S_0(x)\} / Z_{\theta_0,0} \; \pi_0(d\theta_0)}{\int \exp\{\theta_1^{\mathrm{T}} S_1(x)\} / Z_{\theta_1,1} \; \pi_1(d\theta_1)}

     Use of Jeffreys’ scale to select most appropriate model
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Neighbourhood relations



     Choice to be made between M neighbourhood relations

        i \overset{m}{\sim} i'   (0 \leq m \leq M - 1)

     with

        S_m(x) = \sum_{i \overset{m}{\sim} i'} \mathbb{I}_{x_i = x_{i'}}

     driven by the posterior probabilities of the models.
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Model index



     Formalisation via a model index M that appears as a new
     parameter with prior distribution π(M = m) and
     π(θ|M = m) = πm (θm )
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Model index



     Formalisation via a model index M that appears as a new
     parameter with prior distribution π(M = m) and
     π(θ|M = m) = πm (θm )
     Computational target:

        P(M = m|x) \propto \int_{\Theta_m} f_m(x|\theta_m) \, \pi_m(\theta_m) \, d\theta_m \; \pi(M = m) ,
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Sufficient statistics
     By definition, if S(x) is a sufficient statistic for the joint parameters
     (M, θ0 , . . . , θM −1 ), then
                           P(M = m|x) = P(M = m|S(x)) .
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Sufficient statistics
     By definition, if S(x) is a sufficient statistic for the joint parameters
     (M, θ0 , . . . , θM −1 ), then
                           P(M = m|x) = P(M = m|S(x)) .
     For each model m, own sufficient statistic Sm (·) and
     S(·) = (S0 (·), . . . , SM −1 (·)) also sufficient.
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


Sufficient statistics
     By definition, if S(x) is a sufficient statistic for the joint parameters
     (M, θ0 , . . . , θM −1 ), then
                           P(M = m|x) = P(M = m|S(x)) .
     For each model m, own sufficient statistic Sm (·) and
     S(·) = (S0 (·), . . . , SM −1 (·)) also sufficient.
     For Gibbs random fields,

        x|M = m \sim f_m(x|\theta_m) = f_m^1(x|S(x)) \, f_m^2(S(x)|\theta_m)
                                     = \frac{1}{n(S(x))} f_m^2(S(x)|\theta_m)

     where

        n(S(x)) = \#\{\tilde{x} \in \mathcal{X} : S(\tilde{x}) = S(x)\}
      c S(x) is therefore also sufficient for the joint parameters
                                    [Specific to Gibbs random fields!]
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


ABC model choice Algorithm



     ABC-MC
             Generate m* from the prior π(M = m).
             Generate θ*_{m*} from the prior π_{m*}(·).
             Generate x* from the model f_{m*}(·|θ*_{m*}).
             Compute the distance ρ(S(x0), S(x*)).
             Accept (θ*_{m*}, m*) if ρ(S(x0), S(x*)) < ε.

      Note: when ε = 0 the algorithm is exact
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


ABC approximation to the Bayes factor

     Frequency ratio:

        \widehat{BF}_{m_0/m_1}(x^0) = \frac{\hat{P}(M = m_0 | x^0)}{\hat{P}(M = m_1 | x^0)} \times \frac{\pi(M = m_1)}{\pi(M = m_0)}
                                    = \frac{\#\{m^{i*} = m_0\}}{\#\{m^{i*} = m_1\}} \times \frac{\pi(M = m_1)}{\pi(M = m_0)} ,
From MCMC to ABC Methods
  ABC for model choice
    Model choice via ABC


ABC approximation to the Bayes factor

     Frequency ratio:

        \widehat{BF}_{m_0/m_1}(x^0) = \frac{\hat{P}(M = m_0 | x^0)}{\hat{P}(M = m_1 | x^0)} \times \frac{\pi(M = m_1)}{\pi(M = m_0)}
                                    = \frac{\#\{m^{i*} = m_0\}}{\#\{m^{i*} = m_1\}} \times \frac{\pi(M = m_1)}{\pi(M = m_0)} ,

     replaced with

        \widehat{BF}_{m_0/m_1}(x^0) = \frac{1 + \#\{m^{i*} = m_0\}}{1 + \#\{m^{i*} = m_1\}} \times \frac{\pi(M = m_1)}{\pi(M = m_0)}

     to avoid indeterminacy (also a Bayes estimate).
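
     The corrected estimate in Python, taking the list of accepted model
     indices from the ABC-MC sampler above (function name is illustrative):

```python
import numpy as np

def abc_bf(models, pi0, pi1, m0=0, m1=1):
    # frequency-based BF estimate with the +1 correction in both counts
    n0 = int(np.sum(np.asarray(models) == m0))
    n1 = int(np.sum(np.asarray(models) == m1))
    return (1 + n0) / (1 + n1) * (pi1 / pi0)
```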
From MCMC to ABC Methods
  ABC for model choice
    Illustrations


Toy example


     iid Bernoulli model versus two-state first-order Markov chain, i.e.

        f_0(x|\theta_0) = \exp\Big( \theta_0 \sum_{i=1}^{n} \mathbb{I}_{x_i = 1} \Big) \Big/ \{1 + \exp(\theta_0)\}^n ,

     versus

        f_1(x|\theta_1) = \frac{1}{2} \exp\Big( \theta_1 \sum_{i=2}^{n} \mathbb{I}_{x_i = x_{i-1}} \Big) \Big/ \{1 + \exp(\theta_1)\}^{n-1} ,

     with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase
     transition” boundaries).
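
     A Python sketch of the two simulators and their sufficient statistics
     (function names are illustrative; both densities above imply the logistic
     success/stay probabilities used below):

```python
import numpy as np

def summaries(x):
    # S0 = #{x_i = 1} (sufficient for the Bernoulli model)
    # S1 = #{x_i = x_{i-1}} (sufficient for the Markov model)
    x = np.asarray(x)
    return int((x == 1).sum()), int((x[1:] == x[:-1]).sum())

def sim_m0(theta0, n):
    # iid Bernoulli with P(x_i = 1) = e^theta0 / (1 + e^theta0)
    return (np.random.rand(n) < np.exp(theta0) / (1 + np.exp(theta0))).astype(int)

def sim_m1(theta1, n):
    # uniform start (the 1/2 factor), then P(x_i = x_{i-1}) = e^theta1/(1+e^theta1)
    p_stay = np.exp(theta1) / (1 + np.exp(theta1))
    x = np.empty(n, dtype=int)
    x[0] = np.random.randint(2)
    for i in range(1, n):
        x[i] = x[i - 1] if np.random.rand() < p_stay else 1 - x[i - 1]
    return x
```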
From MCMC to ABC Methods
  ABC for model choice
    Illustrations


Toy example (2)




     (left) Comparison of the true BF_{m_0/m_1}(x^0) with \widehat{BF}_{m_0/m_1}(x^0)
     (in logs) over 2,000 simulations and 4 \times 10^6 proposals from the
     prior. (right) Same when using a tolerance ε corresponding to the
     1% quantile on the distances.
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Back to sufficiency




     If η1 (x) is a sufficient statistic for model m = 1 and parameter θ1 and
     η2 (x) is a sufficient statistic for model m = 2 and parameter θ2 ,
     (η1 (x), η2 (x)) is not always sufficient for (m, θm )
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Back to sufficiency




     If η1 (x) is a sufficient statistic for model m = 1 and parameter θ1 and
     η2 (x) is a sufficient statistic for model m = 2 and parameter θ2 ,
     (η1 (x), η2 (x)) is not always sufficient for (m, θm )
      c Potential loss of information at the testing level
                                 [X, Cornuet, Marin, and Pillai, 2011]
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Limiting behaviour of B12 (T → ∞)

     ABC approximation

        B_{12}(y) = \frac{\sum_{t=1}^{T} \mathbb{I}_{m^t = 1} \, \mathbb{I}_{\rho\{\eta(z^t), \eta(y)\} \leq \epsilon}}{\sum_{t=1}^{T} \mathbb{I}_{m^t = 2} \, \mathbb{I}_{\rho\{\eta(z^t), \eta(y)\} \leq \epsilon}} ,

     where the (m^t , z^t )'s are simulated from the (joint) prior
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Limiting behaviour of B12 (T → ∞)

     ABC approximation

        B_{12}(y) = \frac{\sum_{t=1}^{T} \mathbb{I}_{m^t = 1} \, \mathbb{I}_{\rho\{\eta(z^t), \eta(y)\} \leq \epsilon}}{\sum_{t=1}^{T} \mathbb{I}_{m^t = 2} \, \mathbb{I}_{\rho\{\eta(z^t), \eta(y)\} \leq \epsilon}} ,

     where the (m^t , z^t )'s are simulated from the (joint) prior
     As T goes to infinity, the limit is

        B_{12}^\epsilon(y) = \frac{\int \mathbb{I}_{\rho\{\eta(z), \eta(y)\} \leq \epsilon} \, \pi_1(\theta_1) f_1(z|\theta_1) \, dz \, d\theta_1}{\int \mathbb{I}_{\rho\{\eta(z), \eta(y)\} \leq \epsilon} \, \pi_2(\theta_2) f_2(z|\theta_2) \, dz \, d\theta_2}
                         = \frac{\int \mathbb{I}_{\rho\{\eta, \eta(y)\} \leq \epsilon} \, \pi_1(\theta_1) f_1^\eta(\eta|\theta_1) \, d\eta \, d\theta_1}{\int \mathbb{I}_{\rho\{\eta, \eta(y)\} \leq \epsilon} \, \pi_2(\theta_2) f_2^\eta(\eta|\theta_2) \, d\eta \, d\theta_2} ,

     where f_1^\eta(\eta|\theta_1) and f_2^\eta(\eta|\theta_2) are the distributions of \eta(z)
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Limiting behaviour of B12 ( → 0)




     When \epsilon goes to zero,

        B_{12}^\eta(y) = \frac{\int \pi_1(\theta_1) f_1^\eta(\eta(y)|\theta_1) \, d\theta_1}{\int \pi_2(\theta_2) f_2^\eta(\eta(y)|\theta_2) \, d\theta_2} ,
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Limiting behaviour of B12 ( → 0)




     When \epsilon goes to zero,

        B_{12}^\eta(y) = \frac{\int \pi_1(\theta_1) f_1^\eta(\eta(y)|\theta_1) \, d\theta_1}{\int \pi_2(\theta_2) f_2^\eta(\eta(y)|\theta_2) \, d\theta_2} ,

                  the Bayes factor based on the sole observation of η(y)
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Limiting behaviour of B12 (under sufficiency)

     If η(y) is a sufficient statistic for both models,

        f_i(y|\theta_i) = g_i(y) \, f_i^\eta(\eta(y)|\theta_i)

     Thus

        B_{12}(y) = \frac{\int_{\Theta_1} \pi(\theta_1) g_1(y) f_1^\eta(\eta(y)|\theta_1) \, d\theta_1}{\int_{\Theta_2} \pi(\theta_2) g_2(y) f_2^\eta(\eta(y)|\theta_2) \, d\theta_2}
                  = \frac{g_1(y)}{g_2(y)} \, \frac{\int \pi_1(\theta_1) f_1^\eta(\eta(y)|\theta_1) \, d\theta_1}{\int \pi_2(\theta_2) f_2^\eta(\eta(y)|\theta_2) \, d\theta_2}
                  = \frac{g_1(y)}{g_2(y)} \, B_{12}^\eta(y) .

                                         [Didelot, Everitt, Johansen & Lawson, 2011]
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Limiting behaviour of B12 (under sufficiency)

     If η(y) is a sufficient statistic for both models,

        f_i(y|\theta_i) = g_i(y) \, f_i^\eta(\eta(y)|\theta_i)

     Thus

        B_{12}(y) = \frac{\int_{\Theta_1} \pi(\theta_1) g_1(y) f_1^\eta(\eta(y)|\theta_1) \, d\theta_1}{\int_{\Theta_2} \pi(\theta_2) g_2(y) f_2^\eta(\eta(y)|\theta_2) \, d\theta_2}
                  = \frac{g_1(y)}{g_2(y)} \, \frac{\int \pi_1(\theta_1) f_1^\eta(\eta(y)|\theta_1) \, d\theta_1}{\int \pi_2(\theta_2) f_2^\eta(\eta(y)|\theta_2) \, d\theta_2}
                  = \frac{g_1(y)}{g_2(y)} \, B_{12}^\eta(y) .

                                         [Didelot, Everitt, Johansen & Lawson, 2011]
                  c No discrepancy only under cross-model sufficiency
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Poisson/geometric example

     Sample

        x = (x_1, \ldots, x_n)

     from either a Poisson P(λ) or from a geometric G(p). Then

        S = \sum_{i=1}^{n} x_i = \eta(x)

     is a sufficient statistic for either model, but not simultaneously.
     Discrepancy ratio

        \frac{g_1(x)}{g_2(x)} = \frac{S! \, n^{-S} \big/ \prod_i x_i!}{1 \Big/ \binom{n+S-1}{S}}
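
     A small Python check of this ratio, computed on the log scale to avoid
     overflow (function name is illustrative):

```python
import numpy as np
from math import lgamma

def log_g_ratio(x):
    # log{g1(x)/g2(x)} = log{ S! n^{-S} C(n+S-1, S) / prod_i x_i! },  S = sum_i x_i
    x = np.asarray(x)
    n, S = len(x), int(x.sum())
    log_binom = lgamma(n + S) - lgamma(S + 1) - lgamma(n)   # log C(n+S-1, S)
    return (lgamma(S + 1) - S * np.log(n) + log_binom
            - sum(lgamma(int(xi) + 1) for xi in x))
```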
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Poisson/geometric discrepancy

     Range of B_{12}(x) versus B_{12}^\eta(x): the values produced have
     nothing in common.
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Formal recovery



     Creating an encompassing exponential family

        f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^{\mathrm{T}} \eta_1(x) + \theta_2^{\mathrm{T}} \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}

     leads to a sufficient statistic (\eta_1(x), \eta_2(x), t_1(x), t_2(x))
                            [Didelot, Everitt, Johansen & Lawson, 2011]
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Formal recovery



     Creating an encompassing exponential family

        f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^{\mathrm{T}} \eta_1(x) + \theta_2^{\mathrm{T}} \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}

     leads to a sufficient statistic (\eta_1(x), \eta_2(x), t_1(x), t_2(x))
                            [Didelot, Everitt, Johansen & Lawson, 2011]

     In the Poisson/geometric case, if \prod_i x_i! is added to S, there is no
     discrepancy
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Formal recovery



     Creating an encompassing exponential family

        f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^{\mathrm{T}} \eta_1(x) + \theta_2^{\mathrm{T}} \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}

     leads to a sufficient statistic (\eta_1(x), \eta_2(x), t_1(x), t_2(x))
                            [Didelot, Everitt, Johansen & Lawson, 2011]

     Only applies in genuine sufficiency settings...
     c Inability to evaluate the loss brought by summary statistics
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Meaning of the ABC-Bayes factor




     In the Poisson/geometric case, if E[x_i] = \theta_0 > 0,

        \lim_{n \to \infty} B_{12}^\eta(y) = \frac{(\theta_0 + 1)^2}{\theta_0} \, e^{-\theta_0}
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


MA(q) divergence




     Evolution [against \epsilon] of the ABC Bayes factor, in terms of frequencies of
     visits to models MA(1) (left) and MA(2) (right), when \epsilon is equal to the
     10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample
     of 50 points from a MA(2) model with θ1 = 0.6, θ2 = 0.2. True Bayes factor
     equal to 17.71.
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


MA(q) divergence




     Evolution [against \epsilon] of the ABC Bayes factor, in terms of frequencies of
     visits to models MA(1) (left) and MA(2) (right), when \epsilon is equal to the
     10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample
     of 50 points from a MA(1) model with θ1 = 0.6. True Bayes factor B21
     equal to .004.
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


A population genetics evaluation


     Population genetics example with
             3 populations
             2 scenarios
             15 individuals
             5 loci
             single mutation parameter
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


A population genetics evaluation


     Population genetics example with
             3 populations
             2 scenarios
             15 individuals
             5 loci
             single mutation parameter
             24 summary statistics
             2 million ABC proposals
             importance [tree] sampling alternative
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Stability of importance sampling
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Comparison with ABC
     Use of 24 summary statistics and DIY-ABC logistic correction
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Comparison with ABC
     Use of 15 summary statistics and DIY-ABC logistic correction
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


Comparison with ABC
     Use of 24 summary statistics and DIY-ABC logistic correction
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


The only safe cases



     Besides specific models like Gibbs random fields,
     using distances over the data itself escapes the discrepancy...
                              [Toni & Stumpf, 2010; Sousa et al., 2009]
From MCMC to ABC Methods
  ABC for model choice
    Generic ABC model choice


The only safe cases



     Besides specific models like Gibbs random fields,
     using distances over the data itself escapes the discrepancy...
                              [Toni & Stumpf, 2010; Sousa et al., 2009]

     ...and so does the use of more informal model fitting measures
                                           [Ratmann et al., 2009, 2011]

Weitere ähnliche Inhalte

Was ist angesagt?

no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithmno U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithmChristian Robert
 
ABC-Xian
ABC-XianABC-Xian
ABC-XianDeb Roy
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
 
ABC: How Bayesian can it be?
ABC: How Bayesian can it be?ABC: How Bayesian can it be?
ABC: How Bayesian can it be?Christian Robert
 
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...Carlo Lancia
 
fauvel_igarss.pdf
fauvel_igarss.pdffauvel_igarss.pdf
fauvel_igarss.pdfgrssieee
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 
Micro to macro passage in traffic models including multi-anticipation effect
Micro to macro passage in traffic models including multi-anticipation effectMicro to macro passage in traffic models including multi-anticipation effect
Micro to macro passage in traffic models including multi-anticipation effectGuillaume Costeseque
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...Vissarion Fisikopoulos
 
Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011
Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011
Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011Christian Robert
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographywtyru1989
 
Jyokyo-kai-20120605
Jyokyo-kai-20120605Jyokyo-kai-20120605
Jyokyo-kai-20120605ketanaka
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 

Was ist angesagt? (19)

Jere Koskela slides
Jere Koskela slidesJere Koskela slides
Jere Koskela slides
 
Chris Sherlock's slides
Chris Sherlock's slidesChris Sherlock's slides
Chris Sherlock's slides
 
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithmno U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
 
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo...
 
ABC-Xian
ABC-XianABC-Xian
ABC-Xian
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 
Ben Gal
Ben Gal Ben Gal
Ben Gal
 
ABC: How Bayesian can it be?
ABC: How Bayesian can it be?ABC: How Bayesian can it be?
ABC: How Bayesian can it be?
 
cswiercz-general-presentation
cswiercz-general-presentationcswiercz-general-presentation
cswiercz-general-presentation
 
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw...
 
fauvel_igarss.pdf
fauvel_igarss.pdffauvel_igarss.pdf
fauvel_igarss.pdf
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Micro to macro passage in traffic models including multi-anticipation effect
Micro to macro passage in traffic models including multi-anticipation effectMicro to macro passage in traffic models including multi-anticipation effect
Micro to macro passage in traffic models including multi-anticipation effect
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...
 
Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011
Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011
Discussion of Fearnhead and Prangle, RSS&lt; Dec. 14, 2011
 
On complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptographyOn complementarity in qec and quantum cryptography
On complementarity in qec and quantum cryptography
 
Jyokyo-kai-20120605
Jyokyo-kai-20120605Jyokyo-kai-20120605
Jyokyo-kai-20120605
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 

Ähnlich wie Shanghai tutorial

Xian's abc
Xian's abcXian's abc
Xian's abcDeb Roy
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
RSS Read Paper by Mark Girolami
RSS Read Paper by Mark GirolamiRSS Read Paper by Mark Girolami
RSS Read Paper by Mark GirolamiChristian Robert
 
Mcmc & lkd free I
Mcmc & lkd free IMcmc & lkd free I
Mcmc & lkd free IDeb Roy
 
Monash University short course, part I
Monash University short course, part IMonash University short course, part I
Monash University short course, part IChristian Robert
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloJeremyHeng10
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themPierre Jacob
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math ColloquiumChristian Robert
 
Vanilla rao blackwellisation
Vanilla rao blackwellisationVanilla rao blackwellisation
Vanilla rao blackwellisationDeb Roy
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsUmberto Picchini
 
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...BigMC
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelMatt Moores
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplingsPierre Jacob
 

Ähnlich wie Shanghai tutorial (20)

Xian's abc
Xian's abcXian's abc
Xian's abc
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
RSS Read Paper by Mark Girolami
RSS Read Paper by Mark GirolamiRSS Read Paper by Mark Girolami
RSS Read Paper by Mark Girolami
 
Hastings 1970
Hastings 1970Hastings 1970
Hastings 1970
 
Mcmc & lkd free I
Mcmc & lkd free IMcmc & lkd free I
Mcmc & lkd free I
 
Monash University short course, part I
Monash University short course, part IMonash University short course, part I
Monash University short course, part I
 
HMC and NUTS
HMC and NUTSHMC and NUTS
HMC and NUTS
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing them
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Montpellier Math Colloquium
Montpellier Math ColloquiumMontpellier Math Colloquium
Montpellier Math Colloquium
 
Trondheim, LGM2012
Trondheim, LGM2012Trondheim, LGM2012
Trondheim, LGM2012
 
Vanilla rao blackwellisation
Vanilla rao blackwellisationVanilla rao blackwellisation
Vanilla rao blackwellisation
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space models
 
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts model
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 

Mehr von Christian Robert

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceChristian Robert
 

Mehr von Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 

Kürzlich hochgeladen

4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Comparative Literature in India by Amiya dev.pptx
Comparative Literature in India by Amiya dev.pptxComparative Literature in India by Amiya dev.pptx
Comparative Literature in India by Amiya dev.pptxAvaniJani1
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroomSamsung Business USA
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...Nguyen Thanh Tu Collection
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Osopher
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxryandux83rd
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxMadhavi Dharankar
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 

Kürzlich hochgeladen (20)

INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Comparative Literature in India by Amiya dev.pptx
Comparative Literature in India by Amiya dev.pptxComparative Literature in India by Amiya dev.pptx
Comparative Literature in India by Amiya dev.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptx
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Chi-Square Test Non Parametric Test Categorical Variable
Chi-Square Test Non Parametric Test Categorical VariableChi-Square Test Non Parametric Test Categorical Variable
Chi-Square Test Non Parametric Test Categorical Variable
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 

Shanghai tutorial

  • 1. From MCMC to ABC Methods From MCMC to ABC Methods Christian P. Robert Universit´ Paris-Dauphine, IuF, & CREST e http://www.ceremade.dauphine.fr/~xian O’Bayes 11, Shanghai, June 10, 2011
  • 2. From MCMC to ABC Methods Outline Computational issues in Bayesian statistics The Metropolis-Hastings Algorithm The Gibbs Sampler Approximate Bayesian computation ABC for model choice
  • 3-7. From MCMC to ABC Methods. Computational issues in Bayesian statistics. A typology of Bayes computational problems: (i) use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models; (ii) use of a complex sampling model with an intractable likelihood, as for instance in some latent variable or graphical models or in inverse problems; (iii) use of a huge dataset; (iv) use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample); (v) use of a particular inferential procedure, as for instance Bayes factors $B^\pi_{01}(x) = \dfrac{P(\theta \in \Theta_0 \mid x)}{P(\theta \in \Theta_1 \mid x)} \Big/ \dfrac{\pi(\theta \in \Theta_0)}{\pi(\theta \in \Theta_1)}$.
  • 8. From MCMC to ABC Methods The Metropolis-Hastings Algorithm The Metropolis-Hastings Algorithm Computational issues in Bayesian statistics The Metropolis-Hastings Algorithm Monte Carlo basics Monte Carlo Methods based on Markov Chains The Metropolis–Hastings algorithm Random-walk Metropolis-Hastings algorithms The Gibbs Sampler Approximate Bayesian computation ABC for model choice
  • 9. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo basics. General purpose: given a density $\pi \propto \tilde\pi$ known up to a normalizing constant, and an integrable function $h$, compute $\Pi(h) = \int h(x)\pi(x)\mu(dx) = \dfrac{\int h(x)\tilde\pi(x)\mu(dx)}{\int \tilde\pi(x)\mu(dx)}$ when $\int h(x)\tilde\pi(x)\mu(dx)$ is intractable.
  • 10-11. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo basics. Monte Carlo 101: generate an iid sample $x_1, \ldots, x_N$ from $\pi$ and estimate $\Pi(h)$ by $\hat\Pi^{MC}_N(h) = N^{-1}\sum_{i=1}^N h(x_i)$. LLN: $\hat\Pi^{MC}_N(h) \xrightarrow{\text{as}} \Pi(h)$. If $\Pi(h^2) = \int h^2(x)\pi(x)\mu(dx) < \infty$, CLT: $\sqrt{N}\,\big(\hat\Pi^{MC}_N(h) - \Pi(h)\big) \xrightarrow{L} N\big(0, \Pi\{[h - \Pi(h)]^2\}\big)$. Caveat announcing MCMC: it is often impossible or inefficient to simulate directly from $\Pi$.
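As a concrete illustration of the Monte Carlo 101 estimator (an added sketch, not from the original slides), a minimal R example with $\pi$ the standard normal and $h(x) = x^2$, so that $\Pi(h) = 1$:

# Monte Carlo 101 sketch: pi = N(0,1), h(x) = x^2, so Pi(h) = 1
N <- 1e5
x <- rnorm(N)                  # iid sample from pi
h <- x^2
est <- mean(h)                 # hat Pi^MC_N(h); LLN: converges to 1
se <- sd(h) / sqrt(N)          # CLT-based standard error
c(estimate = est, std.error = se)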
  • 12-13. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo basics. Importance sampling: for $Q$ a proposal distribution such that $Q(dx) = q(x)\mu(dx)$, the alternative representation $\Pi(h) = \int h(x)\{\pi/q\}(x)\, q(x)\mu(dx)$ holds. Principle of importance: generate an iid sample $x_1, \ldots, x_N \sim Q$ and estimate $\Pi(h)$ by $\hat\Pi^{IS}_{Q,N}(h) = N^{-1}\sum_{i=1}^N h(x_i)\{\pi/q\}(x_i)$.
  • 14-15. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo basics. Properties of importance: LLN: $\hat\Pi^{IS}_{Q,N}(h) \xrightarrow{\text{as}} \Pi(h)$, and if $Q((h\pi/q)^2) < \infty$, CLT: $\sqrt{N}\big(\hat\Pi^{IS}_{Q,N}(h) - \Pi(h)\big) \xrightarrow{L} N\big(0, Q\{(h\pi/q - \Pi(h))^2\}\big)$. Caveat: if the normalizing constant of $\pi$ is unknown, $\hat\Pi^{IS}_{Q,N}$ cannot be used. Generic problem in Bayesian statistics: $\pi(\theta|x) \propto f(x|\theta)\pi(\theta)$.
  • 16-18. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo basics. Self-normalised importance sampling. Self-normalized version: $\hat\Pi^{SNIS}_{Q,N}(h) = \Big\{\sum_{i=1}^N \{\pi/q\}(x_i)\Big\}^{-1} \sum_{i=1}^N h(x_i)\{\pi/q\}(x_i)$. LLN: $\hat\Pi^{SNIS}_{Q,N}(h) \xrightarrow{\text{as}} \Pi(h)$, and if $\Pi((1+h^2)(\pi/q)) < \infty$, CLT: $\sqrt{N}\big(\hat\Pi^{SNIS}_{Q,N}(h) - \Pi(h)\big) \xrightarrow{L} N\big(0, \Pi\{[(\pi/q)(h - \Pi(h))]^2\}\big)$. The quality of the SNIS approximation depends on the choice of $Q$.
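A minimal SNIS sketch in R (an added illustration): the target is known only up to a constant, $\tilde\pi(x) = \exp(-x^2/2)$, the proposal $Q$ is a standard Cauchy, and $h(x) = x^2$, so the estimate should approach 1:

# SNIS sketch: pi.tilde(x) = exp(-x^2/2) (N(0,1) missing its constant),
# proposal Q = standard Cauchy, h(x) = x^2
N <- 1e5
x <- rcauchy(N)
w <- exp(-x^2 / 2) / dcauchy(x)    # unnormalised weights {pi/q}(x_i)
snis <- sum(w * x^2) / sum(w)      # self-normalised estimate of Pi(h) = 1
snis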
  • 19-20. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo Methods based on Markov Chains. Running Monte Carlo via Markov Chains (MCMC): it is not necessary to use a sample from the distribution $f$ to approximate the integral $I = \int h(x) f(x)\, dx$. We can obtain $X_1, \ldots, X_n \sim f$ (approximately) without directly simulating from $f$, using an ergodic Markov chain with stationary distribution $f$.
  • 21-22. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Monte Carlo Methods based on Markov Chains. Running Monte Carlo via Markov Chains (2). Idea: for an arbitrary starting value $x^{(0)}$, an ergodic chain $(X^{(t)})$ is generated using a transition kernel with stationary distribution $f$. This ensures the convergence in distribution of $(X^{(t)})$ to a random variable from $f$: for a "large enough" $T_0$, $X^{(T_0)}$ can be considered as distributed from $f$. This produces a dependent sample $X^{(T_0)}, X^{(T_0+1)}, \ldots$, generated from $f$, sufficient for most approximation purposes.
  • 23. From MCMC to ABC Methods The Metropolis-Hastings Algorithm The Metropolis–Hastings algorithm The Metropolis–Hastings algorithm Basics The algorithm uses the target density f and a conditional density q(y|x) called the instrumental (or proposal) distribution
  • 24. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. The Metropolis–Hastings algorithm. The MH algorithm. Algorithm (Metropolis–Hastings): given $x^{(t)}$, 1. generate $Y_t \sim q(y|x^{(t)})$; 2. take $X^{(t+1)} = Y_t$ with probability $\rho(x^{(t)}, Y_t)$ and $X^{(t+1)} = x^{(t)}$ with probability $1 - \rho(x^{(t)}, Y_t)$, where $\rho(x, y) = \min\left\{\dfrac{f(y)}{f(x)}\,\dfrac{q(x|y)}{q(y|x)},\, 1\right\}$.
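A minimal R sketch of this algorithm (an added illustration; the Gamma target and exponential proposal are arbitrary choices). Since here $q$ does not depend on the current state, this is an independence M-H sampler, and $\rho(x, y)$ takes the general form above:

# Independence M-H sketch: target f = Ga(2,1) up to a constant,
# instrumental density q(y|x) = Exp(1), free of x
f <- function(x) ifelse(x > 0, x * exp(-x), 0)   # unnormalised Ga(2,1)
q <- function(y) dexp(y, 1)
T <- 1e4
x <- numeric(T); x[1] <- 1
for (t in 1:(T - 1)) {
  y <- rexp(1, 1)                                # Y_t ~ q(y|x^(t))
  rho <- min(1, f(y) * q(x[t]) / (f(x[t]) * q(y)))
  x[t + 1] <- if (runif(1) < rho) y else x[t]
}
mean(x)   # approaches E[Ga(2,1)] = 2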
  • 25. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. The Metropolis–Hastings algorithm. Features: independent of normalizing constants for both $f$ and $q(\cdot|x)$ (i.e., of those constants independent of $x$); never moves to values with $f(y) = 0$; the chain $(x^{(t)})_t$ may take the same value several times in a row, even though $f$ is a density wrt Lebesgue measure; the sequence $(y_t)_t$ is usually not a Markov chain.
  • 26-27. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. The Metropolis–Hastings algorithm. Convergence properties. Under irreducibility: 1. the M-H Markov chain is reversible, with invariant/stationary density $f$, since it satisfies the detailed balance condition $f(y)\, K(y, x) = f(x)\, K(x, y)$; 2. as $f$ is a probability measure, the chain is positive recurrent.
  • 28-30. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. The Metropolis–Hastings algorithm. Convergence properties (2). 4. If $q(y|x) > 0$ for every $(x, y)$, (2) the chain is irreducible. 5. For M-H, $f$-irreducibility implies Harris recurrence. 6. Thus, for M-H satisfying (1) and (2): (i) for $h$ with $E_f|h(X)| < \infty$, $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T h(X^{(t)}) = \int h(x)\, df(x)$ a.e. $f$; (ii) $\lim_{n\to\infty} \left\| \int K^n(x, \cdot)\mu(dx) - f \right\|_{TV} = 0$ for every initial distribution $\mu$, where $K^n(x, \cdot)$ denotes the kernel for $n$ transitions.
  • 31. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Random walk Metropolis–Hastings: use of a local perturbation as proposal, $Y_t = X^{(t)} + \varepsilon_t$, where $\varepsilon_t \sim g$, independent of $X^{(t)}$. The instrumental density is of the form $g(y - x)$, and the Markov chain is a random walk if we take $g$ to be symmetric, $g(x) = g(-x)$.
  • 32. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Algorithm (Random walk Metropolis): given $x^{(t)}$, 1. generate $Y_t \sim g(y - x^{(t)})$; 2. take $X^{(t+1)} = Y_t$ with probability $\min\left\{1, \dfrac{f(Y_t)}{f(x^{(t)})}\right\}$, and $X^{(t+1)} = x^{(t)}$ otherwise.
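A matching random-walk sketch in R (illustrative target and scale, not from the slides): only the ratio $f(Y_t)/f(x^{(t)})$ is needed, and varying omega lets one watch the acceptance-rate trade-off discussed next:

# Random walk Metropolis sketch: target f = N(0,1) kernel, g = N(0, omega^2)
f <- function(x) exp(-x^2 / 2)
omega <- 0.5
T <- 1e4
x <- numeric(T)
for (t in 1:(T - 1)) {
  y <- x[t] + rnorm(1, 0, omega)          # Y_t = x^(t) + eps_t, eps_t ~ g
  x[t + 1] <- if (runif(1) < f(y) / f(x[t])) y else x[t]
}
c(mean = mean(x), acceptance = mean(x[-1] != x[-T]))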
  • 33. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Optimizing the acceptance rate. Problem of the choice of the transition kernel from a practical point of view. Most common alternatives: 1. an instrumental density $g$ which approximates $f$, such that $f/g$ is bounded for uniform ergodicity to apply; 2. a random walk. In both cases, the choice of $g$ is critical.
  • 34-35. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Case of the random walk. Different approach to acceptance rates: a high acceptance rate does not indicate that the algorithm is moving correctly, since it indicates that the random walk is moving too slowly on the surface of $f$. If $x^{(t)}$ and $y_t$ are close, i.e. $f(x^{(t)}) \simeq f(y_t)$, then $y_t$ is accepted with probability $\min\left\{\dfrac{f(y_t)}{f(x^{(t)})}, 1\right\} \simeq 1$. For multimodal densities with well separated modes, the negative effect of limited moves on the surface of $f$ clearly shows.
  • 36-37. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Case of the random walk (2). If the average acceptance rate is low, the successive values of $f(y_t)$ tend to be small compared with $f(x^{(t)})$, which means that the random walk moves quickly on the surface of $f$, since it often reaches the "borders" of the support of $f$. In small dimensions, aim at an average acceptance rate of 50%; in large dimensions, at an average acceptance rate of 25%. [Gelman, Gilks and Roberts, 1995]
  • 38-39. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Example (Noisy AR(1)): hidden Markov chain from a regular AR(1) model, $x_{t+1} = \varphi x_t + \epsilon_{t+1}$, $\epsilon_t \sim N(0, \tau^2)$, and observables $y_t | x_t \sim N(x_t^2, \sigma^2)$. The distribution of $x_t$ given $x_{t-1}$, $x_{t+1}$ and $y_t$ is proportional to $\exp\left\{ -\dfrac{1}{2\tau^2}\left[ (x_t - \varphi x_{t-1})^2 + (x_{t+1} - \varphi x_t)^2 + \dfrac{\tau^2}{\sigma^2}(y_t - x_t^2)^2 \right]\right\}$.
  • 40. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Example (Noisy AR(1), continued): for a Gaussian random walk with scale $\omega$ small enough, the random walk never jumps to the other mode. But if the scale $\omega$ is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
  • 41. From MCMC to ABC Methods The Metropolis-Hastings Algorithm Random-walk Metropolis-Hastings algorithms Markov chain based on a random walk with scale ω = .1.
  • 42. From MCMC to ABC Methods The Metropolis-Hastings Algorithm Random-walk Metropolis-Hastings algorithms Markov chain based on a random walk with scale ω = .5.
  • 43. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. MA(2): $x_t = \epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2}$. Since the constraints on $(\vartheta_1, \vartheta_2)$ are well-defined, use of a flat prior over the identifiability triangle. Simple representation of the likelihood:
library(mnormt)
ma2like = function(theta){
  # y is the observed series; its likelihood is that of a zero-mean
  # Gaussian vector with the MA(2) Toeplitz covariance matrix
  n = length(y)
  sigma = toeplitz(c(1 + theta[1]^2 + theta[2]^2,
    theta[1] + theta[1]*theta[2], theta[2], rep(0, n-3)))
  dmnorm(y, rep(0, n), sigma, log=TRUE)
}
  • 44. From MCMC to ABC Methods. The Metropolis-Hastings Algorithm. Random-walk Metropolis-Hastings algorithms. Basic RWHM for MA(2). Algorithm 1 (RW-HM-MA(2) sampler): set $\omega$ and $\vartheta^{(1)}$; for $i = 2$ to $T$: generate $\tilde\vartheta_j \sim U(\vartheta_j^{(i-1)} - \omega,\, \vartheta_j^{(i-1)} + \omega)$; set $p = 0$ and $\vartheta^{(i)} = \vartheta^{(i-1)}$; if $\tilde\vartheta$ lies within the triangle, then $p = \exp\big(\text{ma2like}(\tilde\vartheta) - \text{ma2like}(\vartheta^{(i-1)})\big)$; if $U < p$, then $\vartheta^{(i)} = \tilde\vartheta$.
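A direct R transcription of Algorithm 1 (a sketch, not the original course code): it reuses ma2like() from the previous slide, assumes an observed series y in the workspace, and takes the usual MA(2) invertibility triangle as the prior support (an assumption):

# Sketch of the RW-HM-MA(2) sampler; y and ma2like() assumed available
in.triangle <- function(th)       # assumed MA(2) identifiability triangle
  (abs(th[2]) < 1) && (th[1] + th[2] > -1) && (th[1] - th[2] < 1)
omega <- 0.2; T <- 1e3
theta <- matrix(0, T, 2)          # start at theta^(1) = (0, 0)
for (i in 2:T) {
  prop <- runif(2, theta[i - 1, ] - omega, theta[i - 1, ] + omega)
  p <- 0
  if (in.triangle(prop))
    p <- exp(ma2like(prop) - ma2like(theta[i - 1, ]))
  theta[i, ] <- if (runif(1) < p) prop else theta[i - 1, ]
}
colMeans(theta)                   # posterior means of (theta1, theta2)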
  • 45. From MCMC to ABC Methods The Metropolis-Hastings Algorithm Random-walk Metropolis-Hastings algorithms Outcome Result with a simulated sample of 100 points and ϑ1 = 0.6, ϑ2 = 0.2 and scale ω = 0.2
  • 46. From MCMC to ABC Methods The Metropolis-Hastings Algorithm Random-walk Metropolis-Hastings algorithms Outcome Result with a simulated sample of 100 points and ϑ1 = 0.6, ϑ2 = 0.2 and scale ω = 0.5
  • 47. From MCMC to ABC Methods The Metropolis-Hastings Algorithm Random-walk Metropolis-Hastings algorithms Outcome Result with a simulated sample of 100 points and ϑ1 = 0.6, ϑ2 = 0.2 and scale ω = 2.0
  • 48. From MCMC to ABC Methods The Gibbs Sampler The Gibbs Sampler The Gibbs Sampler General Principles Slice sampling Convergence
  • 49-51. From MCMC to ABC Methods. The Gibbs Sampler. General Principles. A very specific simulation algorithm based on the target distribution $f$: 1. uses the conditional densities $f_1, \ldots, f_p$ from $f$; 2. starts with the random variable $X = (X_1, \ldots, X_p)$; 3. simulates from the conditional densities, $X_i | x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p \sim f_i(x_i | x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p)$ for $i = 1, 2, \ldots, p$.
  • 52. From MCMC to ABC Methods. The Gibbs Sampler. General Principles. Algorithm (Gibbs sampler): given $x^{(t)} = (x_1^{(t)}, \ldots, x_p^{(t)})$, generate 1. $X_1^{(t+1)} \sim f_1(x_1 | x_2^{(t)}, \ldots, x_p^{(t)})$; 2. $X_2^{(t+1)} \sim f_2(x_2 | x_1^{(t+1)}, x_3^{(t)}, \ldots, x_p^{(t)})$; ...; p. $X_p^{(t+1)} \sim f_p(x_p | x_1^{(t+1)}, \ldots, x_{p-1}^{(t+1)})$. Then $X^{(t+1)} \to X \sim f$.
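A minimal Gibbs sketch in R (an added illustration): for a bivariate normal with unit variances and correlation rho, both full conditionals are exactly normal, $X_1 | x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically:

# Two-stage Gibbs sampler for N2(0, [[1, rho], [rho, 1]])
rho <- 0.8; T <- 1e4
x <- matrix(0, T, 2)
for (t in 2:T) {
  x[t, 1] <- rnorm(1, rho * x[t - 1, 2], sqrt(1 - rho^2))  # f1(x1|x2)
  x[t, 2] <- rnorm(1, rho * x[t, 1], sqrt(1 - rho^2))      # f2(x2|x1)
}
cor(x[, 1], x[, 2])   # approaches rho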
  • 53-54. From MCMC to ABC Methods. The Gibbs Sampler. General Principles. Properties: the full conditional densities $f_1, \ldots, f_p$ are the only densities used for simulation; thus, even in a high dimensional problem, all of the simulations may be univariate. The Gibbs sampler is not reversible with respect to $f$; however, each of its $p$ components is. Besides, it can be turned into a reversible sampler, either using the random scan Gibbs sampler or running instead the (double) sequence $f_1 \cdots f_{p-1} f_p f_{p-1} \cdots f_1$.
  • 55-58. From MCMC to ABC Methods. The Gibbs Sampler. General Principles. Limitations of the Gibbs sampler. Formally, a special case of a sequence of 1-D M-H kernels, all with acceptance rate uniformly equal to 1. The Gibbs sampler 1. limits the choice of instrumental distributions; 2. requires some knowledge of $f$; 3. is, by construction, multidimensional; 4. does not apply to problems where the number of parameters varies, as the resulting chain is not irreducible.
  • 59-60. From MCMC to ABC Methods. The Gibbs Sampler. General Principles. A wee mixture problem. [Figures: Gibbs sample in the $(\mu_1, \mu_2)$ plane, started at random, and Gibbs stuck at the wrong mode.]
  • 61-62. From MCMC to ABC Methods. The Gibbs Sampler. Slice sampling. Slice sampler as generic Gibbs: if $f(\theta)$ can be written as a product $\prod_{i=1}^k f_i(\theta)$, it can be completed as $\prod_{i=1}^k \mathbb{I}_{0 \le \omega_i \le f_i(\theta)}$, leading to the following Gibbs algorithm:
  • 63. From MCMC to ABC Methods. The Gibbs Sampler. Slice sampling. Algorithm (Slice sampler): simulate 1. $\omega_1^{(t+1)} \sim U_{[0, f_1(\theta^{(t)})]}$; ...; k. $\omega_k^{(t+1)} \sim U_{[0, f_k(\theta^{(t)})]}$; k+1. $\theta^{(t+1)} \sim U_{A^{(t+1)}}$, with $A^{(t+1)} = \{y;\ f_i(y) \ge \omega_i^{(t+1)},\ i = 1, \ldots, k\}$.
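A one-term ($k = 1$) slice sampler sketch in R for the truncated $N(-3, 1)$ example of the next slides, where the slice $A^{(t+1)}$ is available in closed form (the restriction to $[0, 1]$ is an assumption matching the plots):

# Slice sampler for f1(theta) = exp(-(theta+3)^2/2) restricted to [0,1]
f1 <- function(theta) exp(-(theta + 3)^2 / 2)
T <- 1e4
theta <- numeric(T); theta[1] <- 0.5
for (t in 2:T) {
  omega <- runif(1, 0, f1(theta[t - 1]))       # omega ~ U[0, f1(theta^(t))]
  upper <- min(1, -3 + sqrt(-2 * log(omega)))  # A = {y in [0,1]: f1(y) >= omega}
  theta[t] <- runif(1, 0, upper)               # theta^(t+1) ~ U_A
}
hist(theta, breaks = 50)   # matches the truncated N(-3,1) shape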
  • 64-70. From MCMC to ABC Methods. The Gibbs Sampler. Slice sampling. Example of results with a truncated $N(-3, 1)$ distribution. [Figure sequence: slice sampler output after 2, 3, 4, 5, 10, 50, 100 iterations.]
  • 71. From MCMC to ABC Methods The Gibbs Sampler Slice sampling Good slices, tough slices The slice sampler usually enjoys good theoretical properties (like geometric ergodicity and even uniform ergodicity under bounded f and bounded X ). As k increases, the determination of the set A(t+1) may get increasingly complex.
  • 72. From MCMC to ABC Methods. The Gibbs Sampler. Convergence. Properties of the Gibbs sampler. Theorem (Convergence): for $(Y_1, Y_2, \ldots, Y_p) \sim g(y_1, \ldots, y_p)$, if either (i) [positivity condition] $g^{(i)}(y_i) > 0$ for every $i = 1, \ldots, p$ implies that $g(y_1, \ldots, y_p) > 0$, where $g^{(i)}$ denotes the marginal distribution of $Y_i$, or (ii) the transition kernel is absolutely continuous with respect to $g$, then the chain is irreducible and positive Harris recurrent.
  • 73. From MCMC to ABC Methods. The Gibbs Sampler. Convergence. Properties of the Gibbs sampler (2). Consequences: (i) if $\int h(y) g(y)\, dy < \infty$, then $\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^T h(Y^{(t)}) = \int h(y) g(y)\, dy$ a.e. $g$; (ii) if, in addition, $(Y^{(t)})$ is aperiodic, then $\lim_{n\to\infty} \left\| \int K^n(y, \cdot)\mu(dy) - g \right\|_{TV} = 0$ for every initial distribution $\mu$.
  • 74. From MCMC to ABC Methods. The Gibbs Sampler. Convergence. Hammersley-Clifford theorem. An illustration that conditionals determine the joint distribution. Theorem: if the joint density $g(y_1, y_2)$ has conditional distributions $g_1(y_1 | y_2)$ and $g_2(y_2 | y_1)$, then $g(y_1, y_2) = \dfrac{g_2(y_2 | y_1)}{\int g_2(v | y_1) / g_1(y_1 | v)\, dv}$. [Hammersley & Clifford, circa 1970]
  • 75. From MCMC to ABC Methods. The Gibbs Sampler. Convergence. General HC decomposition: under the positivity condition, the joint distribution $g$ satisfies $g(y_1, \ldots, y_p) \propto \prod_{j=1}^p \dfrac{g_{\ell_j}(y_{\ell_j} \mid y_{\ell_1}, \ldots, y_{\ell_{j-1}}, y'_{\ell_{j+1}}, \ldots, y'_{\ell_p})}{g_{\ell_j}(y'_{\ell_j} \mid y_{\ell_1}, \ldots, y_{\ell_{j-1}}, y'_{\ell_{j+1}}, \ldots, y'_{\ell_p})}$ for every permutation $\ell$ on $\{1, 2, \ldots, p\}$ and every $y' \in \mathcal{Y}$.
  • 76. From MCMC to ABC Methods Approximate Bayesian computation Approximate Bayesian computation Computational issues in Bayesian statistics The Metropolis-Hastings Algorithm The Gibbs Sampler Approximate Bayesian computation ABC basics Alphabet soup Calibration of ABC ABC for model choice
  • 77. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Intractable likelihoods: cases when the likelihood function $f(y|\theta)$ is unavailable and when the completion step $f(y|\theta) = \int_{\mathcal{Z}} f(y, z|\theta)\, dz$ is impossible or too costly because of the dimension of $z$. MCMC cannot be implemented!
  • 78. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Illustrations. Example (Stochastic volatility model): for $t = 1, \ldots, T$, $y_t = \exp(z_t)\epsilon_t$, $z_t = a + b z_{t-1} + \sigma \eta_t$; $T$ very large makes it difficult to include $z$ within the simulated parameters. [Figure: highest weight trajectories.]
  • 79. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Illustrations. Example (Potts model): if $y$ takes values on a grid $\mathcal{Y}$ of size $k^n$ and $f(y|\theta) \propto \exp\big\{\theta \sum_{l \sim i} \mathbb{I}_{y_l = y_i}\big\}$, where $l \sim i$ denotes a neighbourhood relation, $n$ moderately large prohibits the computation of the normalising constant.
  • 80. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Illustrations. Example (Inference on CMB): in cosmology, study of the Cosmic Microwave Background via likelihoods immensely slow to compute (e.g. WMAP, Planck), because of numerically costly spectral transforms. [Data is a Fortran program] [Kilbinger et al., 2010, MNRAS]
  • 81. From MCMC to ABC Methods Approximate Bayesian computation ABC basics Illustrations Example Phylogenetic tree: in population genetics, reconstitution of a common ancestor from a sample of genes via a phylogenetic tree that is close to impossible to integrate out [100 processor days with 4 parameters] [Cornuet et al., 2009, Bioinformatics]
  • 82-84. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. The ABC method. Bayesian setting: target is $\pi(\theta) f(x|\theta)$. When the likelihood $f(x|\theta)$ is not in closed form, likelihood-free rejection technique. ABC algorithm: for an observation $y \sim f(y|\theta)$, under the prior $\pi(\theta)$, keep jointly simulating $\theta' \sim \pi(\theta)$, $z \sim f(z|\theta')$, until the auxiliary variable $z$ is equal to the observed value, $z = y$. [Tavaré et al., 1997]
  • 85. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Why does it work?! The proof is trivial: $f(\theta_i) \propto \sum_{z \in D} \pi(\theta_i) f(z|\theta_i) \mathbb{I}_y(z) \propto \pi(\theta_i) f(y|\theta_i) = \pi(\theta_i | y)$. [Accept–Reject 101]
  • 86-87. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. A as approximative: when $y$ is a continuous random variable, the equality $z = y$ is replaced with a tolerance condition $\rho(y, z) \le \epsilon$, where $\rho$ is a distance. Output distributed from $\pi(\theta)\, P_\theta\{\rho(y, z) < \epsilon\} \propto \pi(\theta \mid \rho(y, z) < \epsilon)$.
  • 88. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. ABC algorithm. Algorithm 2 (Likelihood-free rejection sampler): for $i = 1$ to $N$: repeat: generate $\theta'$ from the prior distribution $\pi(\cdot)$; generate $z$ from the likelihood $f(\cdot|\theta')$; until $\rho\{\eta(z), \eta(y)\} \le \epsilon$; set $\theta_i = \theta'$. Here $\eta(y)$ defines a (not necessarily sufficient) statistic.
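A toy R rendering of Algorithm 2 (the normal-mean setting, prior, summary and tolerance are all illustrative assumptions): data $y \sim N(\theta, 1)$, prior $\theta \sim N(0, 10)$, $\eta$ = sample mean, $\rho$ = absolute difference:

# Toy likelihood-free rejection sampler (Algorithm 2)
set.seed(1)
n <- 50; y <- rnorm(n, mean = 2)        # pseudo-observed data, true theta = 2
eps <- 0.05; N <- 1e3
post <- numeric(N)
for (i in 1:N) {
  repeat {
    theta <- rnorm(1, 0, sqrt(10))      # theta' ~ pi(.)
    z <- rnorm(n, theta)                # z ~ f(.|theta')
    if (abs(mean(z) - mean(y)) <= eps) break
  }
  post[i] <- theta
}
c(mean(post), sd(post))                 # ABC approximation of pi(theta|y)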
  • 89-90. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Output: the likelihood-free algorithm samples from the marginal in $z$ of $\pi_\epsilon(\theta, z | y) = \dfrac{\pi(\theta) f(z|\theta) \mathbb{I}_{A_{\epsilon,y}}(z)}{\int_{A_{\epsilon,y} \times \Theta} \pi(\theta) f(z|\theta)\, dz\, d\theta}$, where $A_{\epsilon,y} = \{z \in D \mid \rho(\eta(z), \eta(y)) < \epsilon\}$. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: $\pi_\epsilon(\theta|y) = \int \pi_\epsilon(\theta, z|y)\, dz \approx \pi(\theta|y)$.
  • 91-92. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. MA example. Back to the MA(q) model $x_t = \epsilon_t + \sum_{i=1}^q \vartheta_i \epsilon_{t-i}$. Simple prior: uniform over the inverse [real and complex] roots in $Q(u) = 1 - \sum_{i=1}^q \vartheta_i u^i$, under the identifiability conditions; e.g., uniform prior over the identifiability zone, the triangle for MA(2).
  • 93-94. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. MA example (2). ABC algorithm thus made of: 1. picking a new value $(\vartheta_1, \vartheta_2)$ in the triangle; 2. generating an iid sequence $(\epsilon_t)_{-q < t \le T}$; 3. producing a simulated series $(x'_t)_{1 \le t \le T}$. Distance: basic distance between the series, $\rho\big((x_t)_{1 \le t \le T}, (x'_t)_{1 \le t \le T}\big) = \sum_{t=1}^T (x_t - x'_t)^2$, or distance between summary statistics like the $q$ autocorrelations $\tau_j = \sum_{t=j+1}^T x_t x_{t-j}$.
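Putting the three steps and the autocorrelation distance together, a compact R sketch (the 1% quantile as tolerance is an illustrative choice):

# ABC for MA(2) with distance on the autocorrelations (tau_1, tau_2)
set.seed(2)
T <- 100
e <- rnorm(T + 2)
y <- e[3:(T + 2)] + 0.6 * e[2:(T + 1)] + 0.2 * e[1:T]   # pseudo-data
tau <- function(x) c(sum(x[-1] * x[-T]), sum(x[-(1:2)] * x[-((T - 1):T)]))
N <- 1e4
sims <- matrix(NA, N, 3)
for (i in 1:N) {
  repeat {  # uniform prior over the MA(2) identifiability triangle
    th <- runif(2, c(-2, -1), c(2, 1))
    if (th[1] + th[2] > -1 && th[1] - th[2] < 1) break
  }
  e <- rnorm(T + 2)
  z <- e[3:(T + 2)] + th[1] * e[2:(T + 1)] + th[2] * e[1:T]
  sims[i, ] <- c(th, sum((tau(z) - tau(y))^2))
}
keep <- sims[, 3] <= quantile(sims[, 3], 0.01)   # epsilon = 1% quantile
colMeans(sims[keep, 1:2])                        # ABC posterior means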
  • 95-97. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. Comparison of distance impact: evaluation of the tolerance on the ABC sample against both distances ($\epsilon$ = 100%, 10%, 1%, 0.1% quantiles) for an MA(2) model. [Three figure slides.]
  • 98-101. From MCMC to ABC Methods. Approximate Bayesian computation. ABC basics. ABC advances: simulating from the prior is often poor in efficiency. Either modify the proposal distribution on $\theta$ to increase the density of $x$'s within the vicinity of $y$... [Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007] ...or view the problem as conditional density estimation and develop techniques to allow for larger $\epsilon$ [Beaumont et al., 2002] ...or even include $\epsilon$ in the inferential framework [ABC$\mu$] [Ratmann et al., 2009]
  • 102. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC-NP: better usage of [prior] simulations by adjustment: instead of throwing away $\theta'$ such that $\rho(\eta(z), \eta(y)) > \epsilon$, replace the $\theta$'s with locally regressed $\theta^* = \theta - \{\eta(z) - \eta(y)\}^T \hat\beta$ [Csilléry et al., TEE, 2010], where $\hat\beta$ is obtained by [NP] weighted least squares regression on $(\eta(z) - \eta(y))$ with weights $K_\delta\{\rho(\eta(z), \eta(y))\}$ [Beaumont et al., 2002, Genetics]
  • 103-104. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC-MCMC: Markov chain $(\theta^{(t)})$ created via the transition function: $\theta^{(t+1)} = \theta'$ if $\theta' \sim K_\omega(\theta'|\theta^{(t)})$, $x \sim f(x|\theta')$ is such that $x = y$, and $u \sim U(0, 1) \le \dfrac{\pi(\theta') K_\omega(\theta^{(t)}|\theta')}{\pi(\theta^{(t)}) K_\omega(\theta'|\theta^{(t)})}$; $\theta^{(t+1)} = \theta^{(t)}$ otherwise. This chain has the posterior $\pi(\theta|y)$ as stationary distribution. [Marjoram et al., 2003]
  • 105. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC-MCMC (2). Algorithm 3 (Likelihood-free MCMC sampler): use Algorithm 2 to get $(\theta^{(0)}, z^{(0)})$; for $t = 1$ to $N$: generate $\theta'$ from $K_\omega(\cdot|\theta^{(t-1)})$; generate $z'$ from the likelihood $f(\cdot|\theta')$; generate $u$ from $U_{[0,1]}$; if $u \le \dfrac{\pi(\theta') K_\omega(\theta^{(t-1)}|\theta')}{\pi(\theta^{(t-1)}) K_\omega(\theta'|\theta^{(t-1)})}\, \mathbb{I}_{A_{\epsilon,y}}(z')$, set $(\theta^{(t)}, z^{(t)}) = (\theta', z')$; else $(\theta^{(t)}, z^{(t)}) = (\theta^{(t-1)}, z^{(t-1)})$.
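The same toy normal-mean setting as above, in Algorithm 3 form (an illustrative sketch; since the random walk $K_\omega$ is symmetric, the kernel ratio cancels):

# Toy ABC-MCMC (Algorithm 3): prior N(0,10), symmetric K_omega
set.seed(3)
n <- 50; y <- rnorm(n, mean = 2); eps <- 0.05
T <- 1e4; omega <- 0.5
theta <- numeric(T); theta[1] <- mean(y)   # crude start (Algorithm 2 in theory)
for (t in 2:T) {
  prop <- theta[t - 1] + rnorm(1, 0, omega)         # theta' ~ K_omega
  z <- rnorm(n, prop)                               # z' ~ f(.|theta')
  ok <- abs(mean(z) - mean(y)) <= eps               # indicator I_{A_eps,y}(z')
  ratio <- dnorm(prop, 0, sqrt(10)) / dnorm(theta[t - 1], 0, sqrt(10))
  theta[t] <- if (ok && runif(1) <= ratio) prop else theta[t - 1]
}
mean(theta)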
  • 106. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Why does it work? The acceptance probability does not involve the calculation of the likelihood, since $\dfrac{\pi_\epsilon(\theta', z'|y)}{\pi_\epsilon(\theta^{(t-1)}, z^{(t-1)}|y)} \times \dfrac{K_\omega(\theta^{(t-1)}|\theta')\, f(z^{(t-1)}|\theta^{(t-1)})}{K_\omega(\theta'|\theta^{(t-1)})\, f(z'|\theta')} = \dfrac{\pi(\theta')\, f(z'|\theta')\, \mathbb{I}_{A_{\epsilon,y}}(z')}{\pi(\theta^{(t-1)})\, f(z^{(t-1)}|\theta^{(t-1)})\, \mathbb{I}_{A_{\epsilon,y}}(z^{(t-1)})} \times \dfrac{K_\omega(\theta^{(t-1)}|\theta')\, f(z^{(t-1)}|\theta^{(t-1)})}{K_\omega(\theta'|\theta^{(t-1)})\, f(z'|\theta')} = \dfrac{\pi(\theta') K_\omega(\theta^{(t-1)}|\theta')}{\pi(\theta^{(t-1)}) K_\omega(\theta'|\theta^{(t-1)})}\, \mathbb{I}_{A_{\epsilon,y}}(z')$.
  • 107-108. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC$\mu$ [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]: use of a joint density $f(\theta, \epsilon | y) \propto \xi(\epsilon | y, \theta) \times \pi_\theta(\theta) \times \pi_\epsilon(\epsilon)$, where $y$ is the data, and $\xi(\epsilon | y, \theta)$ is the prior predictive density of $\rho(\eta(z), \eta(y))$ given $\theta$ and $x$ when $z \sim f(z|\theta)$. Warning! Replacement of $\xi(\epsilon | y, \theta)$ with a non-parametric kernel approximation.
  • 109-110. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC$\mu$ details: multidimensional distances $\rho_k$ ($k = 1, \ldots, K$) and errors $\epsilon_k = \rho_k(\eta_k(z), \eta_k(y))$, with $\epsilon_k \sim \xi_k(\epsilon | y, \theta) \approx \hat\xi_k(\epsilon | y, \theta) = \dfrac{1}{B h_k} \sum_b K[\{\epsilon_k - \rho_k(\eta_k(z_b), \eta_k(y))\}/h_k]$, then used in replacing $\xi(\epsilon | y, \theta)$ with $\min_k \hat\xi_k(\epsilon | y, \theta)$. ABC$\mu$ involves the acceptance probability $\dfrac{\pi(\theta', \epsilon')\, q(\theta', \theta)\, q(\epsilon', \epsilon)}{\pi(\theta, \epsilon)\, q(\theta, \theta')\, q(\epsilon, \epsilon')}\, \dfrac{\min_k \hat\xi_k(\epsilon' | y, \theta')}{\min_k \hat\xi_k(\epsilon | y, \theta)}$
  • 111. From MCMC to ABC Methods Approximate Bayesian computation Alphabet soup ABCµ multiple errors [ c Ratmann et al., PNAS, 2009]
  • 112. From MCMC to ABC Methods Approximate Bayesian computation Alphabet soup ABCµ for model choice [ c Ratmann et al., PNAS, 2009]
  • 113-114. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Questions about ABC$\mu$: for each model under comparison, the marginal posterior on $\epsilon$ is used to assess the fit of the model (HPD includes 0 or not). Is the data informative about $\epsilon$? [Identifiability] How is the prior $\pi(\epsilon)$ impacting the comparison? How is using both $\xi(\epsilon | x_0, \theta)$ and $\pi_\epsilon(\epsilon)$ compatible with a standard probability model? [remindful of Wilkinson] Where is the penalisation for complexity in the model comparison? [X, Mengersen & Chen, 2010, PNAS]
  • 115. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. A PMC version: generate a sample at iteration $t$ by $\hat\pi_t(\theta^{(t)}) \propto \sum_{j=1}^N \omega_j^{(t-1)} K_t(\theta^{(t)} | \theta_j^{(t-1)})$, modulo acceptance of the associated $x_t$, and use an importance weight associated with an accepted simulation $\theta_i^{(t)}$: $\omega_i^{(t)} \propto \pi(\theta_i^{(t)}) \big/ \hat\pi_t(\theta_i^{(t)})$. Still likelihood free. [Beaumont et al., 2009]
  • 116. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. The ABC-PMC algorithm. Given a decreasing sequence of approximation levels $\epsilon_1 \ge \ldots \ge \epsilon_T$: 1. At iteration $t = 1$, for $i = 1, \ldots, N$: simulate $\theta_i^{(1)} \sim \pi(\theta)$ and $x \sim f(x|\theta_i^{(1)})$ until $\rho(x, y) < \epsilon_1$; set $\omega_i^{(1)} = 1/N$; take $\tau^2$ as twice the empirical variance of the $\theta_i^{(1)}$'s. 2. At iteration $2 \le t \le T$, for $i = 1, \ldots, N$, repeat: pick $\theta_i^\star$ from the $\theta_j^{(t-1)}$'s with probabilities $\omega_j^{(t-1)}$; generate $\theta_i^{(t)} | \theta_i^\star \sim N(\theta_i^\star, \sigma_t^2)$ and $x \sim f(x|\theta_i^{(t)})$; until $\rho(x, y) < \epsilon_t$. Set $\omega_i^{(t)} \propto \pi(\theta_i^{(t)}) \Big/ \sum_{j=1}^N \omega_j^{(t-1)} \varphi\big(\sigma_t^{-1}\{\theta_i^{(t)} - \theta_j^{(t-1)}\}\big)$, and take $\tau_{t+1}^2$ as twice the weighted empirical variance of the $\theta_i^{(t)}$'s.
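A compact ABC-PMC sketch in R for the same toy normal-mean problem (the tolerances, particle number and kernel scale are illustrative assumptions):

# Toy ABC-PMC sketch, decreasing tolerances
set.seed(4)
n <- 50; y <- rnorm(n, 2); N <- 500
eps <- c(0.5, 0.2, 0.1)
abc1 <- function() {                       # iteration 1: rejection from prior
  repeat { th <- rnorm(1, 0, sqrt(10))
           if (abs(mean(rnorm(n, th)) - mean(y)) < eps[1]) return(th) } }
th <- sapply(1:N, function(i) abc1())
w <- rep(1 / N, N); tau2 <- 2 * var(th)
for (t in 2:length(eps)) {
  th.new <- w.new <- numeric(N)
  for (i in 1:N) {
    repeat {                               # move a picked particle until accepted
      prop <- rnorm(1, th[sample(N, 1, prob = w)], sqrt(tau2))
      if (abs(mean(rnorm(n, prop)) - mean(y)) < eps[t]) break
    }
    th.new[i] <- prop
    w.new[i] <- dnorm(prop, 0, sqrt(10)) / sum(w * dnorm(prop, th, sqrt(tau2)))
  }
  th <- th.new; w <- w.new / sum(w.new)
  tau2 <- 2 * sum(w * (th - sum(w * th))^2)   # twice the weighted variance
}
sum(w * th)                                # ABC-PMC posterior mean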
  • 117. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Sequential Monte Carlo: SMC is a simulation technique to approximate a sequence of related probability distributions $\pi_n$, with $\pi_0$ "easy" and $\pi_T$ as target. Iterated IS: particles moved from time $n-1$ to time $n$ via kernel $K_n$, and use of a sequence of extended targets $\tilde\pi_n(z_{0:n}) = \pi_n(z_n) \prod_{j=0}^{n-1} L_j(z_{j+1}, z_j)$, where the $L_j$'s are backward Markov kernels [check that $\pi_n(z_n)$ is a marginal]. [Del Moral, Doucet & Jasra, Series B, 2006]
  • 118. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Sequential Monte Carlo (2). Algorithm 4 (SMC sampler): sample $z_i^{(0)} \sim \gamma_0(x)$ ($i = 1, \ldots, N$); compute weights $w_i^{(0)} = \pi_0(z_i^{(0)}) / \gamma_0(z_i^{(0)})$; for $t = 1$ to $N$: if $ESS(w^{(t-1)}) < N_T$, resample the $N$ particles $z^{(t-1)}$ and set weights to 1; generate $z_i^{(t)} \sim K_t(z_i^{(t-1)}, \cdot)$ and set weights to $w_i^{(t)} = w_i^{(t-1)} \dfrac{\pi_t(z_i^{(t)})\, L_{t-1}(z_i^{(t)}, z_i^{(t-1)})}{\pi_{t-1}(z_i^{(t-1)})\, K_t(z_i^{(t-1)}, z_i^{(t)})}$.
  • 119. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC-SMC [Del Moral, Doucet & Jasra, 2009]: true derivation of an SMC-ABC algorithm. Use of a kernel $K_n$ associated with target $\pi_{\epsilon_n}$ and derivation of the backward kernel $L_{n-1}(z, z') = \dfrac{\pi_{\epsilon_n}(z')\, K_n(z', z)}{\pi_{\epsilon_n}(z)}$. Update of the weights: $w_{in} \propto w_{i(n-1)} \dfrac{\sum_{m=1}^M \mathbb{I}_{A_{\epsilon_n}}(x_{in}^m)}{\sum_{m=1}^M \mathbb{I}_{A_{\epsilon_{n-1}}}(x_{i(n-1)}^m)}$ when $x_{in}^m \sim K(x_{i(n-1)}, \cdot)$.
  • 120. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. ABC-SMCM. Modification: makes $M$ repeated simulations of the pseudo-data $z$ given the parameter, rather than using a single [$M = 1$] simulation, leading to a weight proportional to the number of accepted $z_i$'s: $\omega(\theta) = \dfrac{1}{M} \sum_{i=1}^M \mathbb{I}_{\rho(\eta(y), \eta(z_i)) < \epsilon}$ [limit in $M$ means exact simulation from the (tempered) target]
  • 121-122. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Properties of ABC-SMC: the ABC-SMC method properly uses a backward kernel $L(z, z')$ to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. Update of importance weights is reduced to the ratio of the proportions of surviving particles. Adaptivity in the ABC-SMC algorithm is only found in the on-line construction of the thresholds $\epsilon_t$, slowly enough to keep a large number of accepted transitions.
  • 123. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Semi-automatic ABC: Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic, in close proximity to Wilkinson's proposal. ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes. Use of a randomised (or "noisy") version of the summary statistics, $\tilde\eta(y) = \eta(y) + \tau\epsilon$. Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic.
  • 124-125. From MCMC to ABC Methods. Approximate Bayesian computation. Alphabet soup. Summary statistics: optimality of the posterior expectations of the parameters of interest as summary statistics! Use of the standard quadratic loss function $(\theta - \theta_0)^T A (\theta - \theta_0)$.
  • 126-128. From MCMC to ABC Methods. Approximate Bayesian computation. Calibration of ABC. Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistic [except when done by the experimenters in the field]. Starting from a large collection of available summary statistics, Joyce and Marjoram (2008) consider their sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test. This does not take into account the sequential nature of the tests, depends on the parameterisation, and the order of inclusion matters.
  • 129. From MCMC to ABC Methods ABC for model choice ABC for model choice Computational issues in Bayesian statistics The Metropolis-Hastings Algorithm The Gibbs Sampler Approximate Bayesian computation ABC for model choice Model choice Gibbs random fields Model choice via ABC Illustrations Generic ABC model choice
  • 130. From MCMC to ABC Methods. ABC for model choice. Model choice. Bayesian model choice: several models $M_1, M_2, \ldots$ are considered simultaneously for a dataset $y$, and the model index $M$ is part of the inference. Use of a prior distribution $\pi(M = m)$, plus a prior distribution on the parameter conditional on the value $m$ of the model index, $\pi_m(\theta_m)$. The goal is to derive the posterior distribution of $M$, a challenging computational target when models are complex.
  • 131. From MCMC to ABC Methods. ABC for model choice. Model choice. Generic ABC for model choice. Algorithm 5 (Likelihood-free model choice sampler, ABC-MC): for $t = 1$ to $T$: repeat: generate $m$ from the prior $\pi(M = m)$; generate $\theta_m$ from the prior $\pi_m(\theta_m)$; generate $z$ from the model $f_m(z|\theta_m)$; until $\rho\{\eta(z), \eta(y)\} < \epsilon$; set $m^{(t)} = m$ and $\theta^{(t)} = \theta_m$.
  • 132-133. From MCMC to ABC Methods. ABC for model choice. Model choice. ABC estimates: the posterior probability $\pi(M = m | y)$ is approximated by the frequency of acceptances from model $m$, $\frac{1}{T} \sum_{t=1}^T \mathbb{I}_{m^{(t)} = m}$. Issues with implementation: should tolerances $\epsilon$ be the same for all models? should summary statistics vary across models (incl. their dimension)? should the distance measure $\rho$ vary as well? Extension to a weighted polychotomous logistic regression estimate of $\pi(M = m | y)$, with non-parametric kernel weights. [Cornuet et al., DIYABC, 2009]
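A toy R rendering of Algorithm 5 with the frequency estimate (the two models, priors and summary statistic are illustrative assumptions):

# Toy ABC model choice: M=1 Poisson P(lambda) vs M=2 geometric G(p),
# eta = sample mean, priors lambda ~ Exp(1) and p ~ U(0,1)
set.seed(5)
n <- 30; y <- rpois(n, 1)          # pseudo-observed data from model 1
N <- 1e4; eps <- 0.1
m <- integer(0)
for (i in 1:N) {
  mod <- sample(1:2, 1)            # m ~ pi(M = m), uniform
  z <- if (mod == 1) rpois(n, rexp(1)) else rgeom(n, runif(1))
  if (abs(mean(z) - mean(y)) <= eps) m <- c(m, mod)
}
table(m) / length(m)               # frequency estimate of pi(M = m | y)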
  • 134-135. From MCMC to ABC Methods. ABC for model choice. Model choice. The Great ABC controversy: on-going controversy in phylogeographic genetics about the validity of using ABC for testing. Against: Templeton (2008, 2009, 2010a, 2010b, 2010c) argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!). Replies: Fagundes et al. (2008), Beaumont et al. (2010), Berger et al. (2010), Csilléry et al. (2010) point out that the criticisms are addressed at [Bayesian] model-based inference and have nothing to do with ABC...
  • 136-137. From MCMC to ABC Methods. ABC for model choice. Gibbs random fields. Gibbs distribution: the rv $y = (y_1, \ldots, y_n)$ is a Gibbs random field associated with the graph $G$ if $f(y) = \frac{1}{Z} \exp\big\{-\sum_{c \in C} V_c(y_c)\big\}$, where $Z$ is the normalising constant, $C$ is the set of cliques of $G$, and $V_c$ is any function, also called potential; $U(y) = \sum_{c \in C} V_c(y_c)$ is the energy function. $Z$ is usually unavailable in closed form.
  • 138-139. From MCMC to ABC Methods. ABC for model choice. Gibbs random fields. Potts model: $V_c(y)$ is of the form $V_c(y) = \theta S(y) = \theta \sum_{l \sim i} \delta_{y_l = y_i}$, where $l \sim i$ denotes a neighbourhood structure. In most realistic settings, the summation $Z_\theta = \sum_{x \in \mathcal{X}} \exp\{\theta^T S(x)\}$ involves too many terms to be manageable, and numerical approximations cannot always be trusted. [Cucala, Marin, CPR & Titterington, 2009]
  • 140-141. From MCMC to ABC Methods. ABC for model choice. Model choice via ABC. Bayesian Model Choice: comparing a model with potential $S_0$ taking values in $\mathbb{R}^{p_0}$ versus a model with potential $S_1$ taking values in $\mathbb{R}^{p_1}$ can be done through the Bayes factor corresponding to the priors $\pi_0$ and $\pi_1$ on each parameter space, $B_{m_0/m_1}(x) = \dfrac{\int \exp\{\theta_0^T S_0(x)\}\big/Z_{\theta_0,0}\; \pi_0(d\theta_0)}{\int \exp\{\theta_1^T S_1(x)\}\big/Z_{\theta_1,1}\; \pi_1(d\theta_1)}$. Use of Jeffreys' scale to select the most appropriate model.
  • 142. From MCMC to ABC Methods. ABC for model choice. Model choice via ABC. Neighbourhood relations: choice to be made between $M$ neighbourhood relations $i \sim_m i'$ ($0 \le m \le M - 1$), with $S_m(x) = \sum_{i \sim_m i'} \mathbb{I}_{x_i = x_{i'}}$, driven by the posterior probabilities of the models.
  • 143-144. From MCMC to ABC Methods. ABC for model choice. Model choice via ABC. Model index: formalisation via a model index $M$ that appears as a new parameter, with prior distribution $\pi(M = m)$ and $\pi(\theta | M = m) = \pi_m(\theta_m)$. Computational target: $P(M = m | x) \propto \int_{\Theta_m} f_m(x|\theta_m)\, \pi_m(\theta_m)\, d\theta_m\; \pi(M = m)$.
  • 145-147. From MCMC to ABC Methods. ABC for model choice. Model choice via ABC. Sufficient statistics: by definition, if $S(x)$ is a sufficient statistic for the joint parameters $(M, \theta_0, \ldots, \theta_{M-1})$, then $P(M = m | x) = P(M = m | S(x))$. For each model $m$, its own sufficient statistic $S_m(\cdot)$, and $S(\cdot) = (S_0(\cdot), \ldots, S_{M-1}(\cdot))$ is also sufficient. For Gibbs random fields, $x | M = m \sim f_m(x|\theta_m) = f_m^1(x|S(x))\, f_m^2(S(x)|\theta_m) = \dfrac{1}{n(S(x))}\, f_m^2(S(x)|\theta_m)$, where $n(S(x)) = \sharp\{\tilde x \in \mathcal{X} : S(\tilde x) = S(x)\}$. $S(x)$ is therefore also sufficient for the joint parameters. [Specific to Gibbs random fields!]
  • 148. From MCMC to ABC Methods. ABC for model choice. Model choice via ABC. ABC model choice. Algorithm (ABC-MC): generate $m^*$ from the prior $\pi(M = m)$; generate $\theta_{m^*}^*$ from the prior $\pi_{m^*}(\cdot)$; generate $x^*$ from the model $f_{m^*}(\cdot|\theta_{m^*}^*)$; compute the distance $\rho(S(x^0), S(x^*))$; accept $(\theta_{m^*}^*, m^*)$ if $\rho(S(x^0), S(x^*)) < \epsilon$. Note: when $\epsilon = 0$ the algorithm is exact.
  • 149-150. From MCMC to ABC Methods. ABC for model choice. Model choice via ABC. ABC approximation to the Bayes factor. Frequency ratio: $\widehat{BF}_{m_0/m_1}(x^0) = \dfrac{\hat P(M = m_0 | x^0)}{\hat P(M = m_1 | x^0)} \times \dfrac{\pi(M = m_1)}{\pi(M = m_0)} = \dfrac{\sharp\{m^{i*} = m_0\}}{\sharp\{m^{i*} = m_1\}} \times \dfrac{\pi(M = m_1)}{\pi(M = m_0)}$, replaced with $\widehat{BF}_{m_0/m_1}(x^0) = \dfrac{1 + \sharp\{m^{i*} = m_0\}}{1 + \sharp\{m^{i*} = m_1\}} \times \dfrac{\pi(M = m_1)}{\pi(M = m_0)}$ to avoid indeterminacy (also a Bayes estimate).
  • 151. From MCMC to ABC Methods. ABC for model choice. Illustrations. Toy example: iid Bernoulli model versus two-state first-order Markov chain, i.e. $f_0(x|\theta_0) = \exp\big(\theta_0 \sum_{i=1}^n \mathbb{I}_{x_i = 1}\big) \big/ \{1 + \exp(\theta_0)\}^n$, versus $f_1(x|\theta_1) = \frac{1}{2} \exp\big(\theta_1 \sum_{i=2}^n \mathbb{I}_{x_i = x_{i-1}}\big) \big/ \{1 + \exp(\theta_1)\}^{n-1}$, with priors $\theta_0 \sim U(-5, 5)$ and $\theta_1 \sim U(0, 6)$ (inspired by "phase transition" boundaries).
  • 152. From MCMC to ABC Methods. ABC for model choice. Illustrations. Toy example (2): (left) comparison of the true $BF_{m_0/m_1}(x^0)$ with $\widehat{BF}_{m_0/m_1}(x^0)$ (in logs) over 2,000 simulations and $4 \cdot 10^6$ proposals from the prior; (right) same when using a tolerance $\epsilon$ corresponding to the 1% quantile on the distances.
  • 153-154. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Back to sufficiency: if $\eta_1(x)$ is a sufficient statistic for model $m = 1$ and parameter $\theta_1$, and $\eta_2(x)$ is a sufficient statistic for model $m = 2$ and parameter $\theta_2$, then $(\eta_1(x), \eta_2(x))$ is not always sufficient for $(m, \theta_m)$. Potential loss of information at the testing level. [X, Cornuet, Marin, and Pillai, 2011]
  • 155-156. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Limiting behaviour of $B_{12}$ ($T \to \infty$): ABC approximation $\widehat B_{12}(y) = \dfrac{\sum_{t=1}^T \mathbb{I}_{m^t = 1}\, \mathbb{I}_{\rho\{\eta(z^t), \eta(y)\} \le \epsilon}}{\sum_{t=1}^T \mathbb{I}_{m^t = 2}\, \mathbb{I}_{\rho\{\eta(z^t), \eta(y)\} \le \epsilon}}$, where the $(m^t, z^t)$'s are simulated from the (joint) prior. As $T$ goes to infinity, the limit is $B_{12}^\epsilon(y) = \dfrac{\int \mathbb{I}_{\rho\{\eta(z), \eta(y)\} \le \epsilon}\; \pi_1(\theta_1) f_1(z|\theta_1)\, dz\, d\theta_1}{\int \mathbb{I}_{\rho\{\eta(z), \eta(y)\} \le \epsilon}\; \pi_2(\theta_2) f_2(z|\theta_2)\, dz\, d\theta_2} = \dfrac{\int \mathbb{I}_{\rho\{\eta, \eta(y)\} \le \epsilon}\; \pi_1(\theta_1) f_1^\eta(\eta|\theta_1)\, d\eta\, d\theta_1}{\int \mathbb{I}_{\rho\{\eta, \eta(y)\} \le \epsilon}\; \pi_2(\theta_2) f_2^\eta(\eta|\theta_2)\, d\eta\, d\theta_2}$, where $f_1^\eta(\eta|\theta_1)$ and $f_2^\eta(\eta|\theta_2)$ are the distributions of $\eta(z)$.
  • 157-158. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Limiting behaviour of $B_{12}$ ($\epsilon \to 0$): when $\epsilon$ goes to zero, $B_{12}^\eta(y) = \dfrac{\int \pi_1(\theta_1) f_1^\eta(\eta(y)|\theta_1)\, d\theta_1}{\int \pi_2(\theta_2) f_2^\eta(\eta(y)|\theta_2)\, d\theta_2}$, i.e. the Bayes factor based on the sole observation of $\eta(y)$.
  • 159-160. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Limiting behaviour of $B_{12}$ (under sufficiency): if $\eta(y)$ is a sufficient statistic for both models, $f_i(y|\theta_i) = g_i(y) f_i^\eta(\eta(y)|\theta_i)$. Thus $B_{12}(y) = \dfrac{\int_{\Theta_1} \pi(\theta_1) g_1(y) f_1^\eta(\eta(y)|\theta_1)\, d\theta_1}{\int_{\Theta_2} \pi(\theta_2) g_2(y) f_2^\eta(\eta(y)|\theta_2)\, d\theta_2} = \dfrac{g_1(y)}{g_2(y)} \dfrac{\int \pi_1(\theta_1) f_1^\eta(\eta(y)|\theta_1)\, d\theta_1}{\int \pi_2(\theta_2) f_2^\eta(\eta(y)|\theta_2)\, d\theta_2} = \dfrac{g_1(y)}{g_2(y)} B_{12}^\eta(y)$. [Didelot, Everitt, Johansen & Lawson, 2011] No discrepancy only when there is cross-model sufficiency.
  • 161. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Poisson/geometric example: sample $x = (x_1, \ldots, x_n)$ from either a Poisson $P(\lambda)$ or from a geometric $G(p)$. Then $S = \sum_{i=1}^n x_i = \eta(x)$ is a sufficient statistic for either model, but not simultaneously. Discrepancy ratio: $\dfrac{g_1(x)}{g_2(x)} = \dfrac{S!\, n^{-S} \big/ \prod_i x_i!}{1 \big/ \binom{n+S-1}{S}}$.
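The discrepancy ratio can be checked numerically; a small R sketch (an added illustration) showing that $g_1(x)/g_2(x)$ varies across samples sharing the same value of $S$:

# Discrepancy ratio g1(x)/g2(x), computed on the log scale for stability
g.ratio <- function(x) {
  n <- length(x); S <- sum(x)
  exp(lfactorial(S) - S * log(n) - sum(lfactorial(x)) + lchoose(n + S - 1, S))
}
x1 <- c(3, 3, 3, 3); x2 <- c(0, 0, 6, 6)   # same S = 12, same eta(x)
c(g.ratio(x1), g.ratio(x2))                # yet different ratios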
  • 162. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Poisson/geometric discrepancy: range of $B_{12}(x)$ versus $B_{12}^\eta(x)$: the values produced have nothing in common.
  • 163-165. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Formal recovery: creating an encompassing exponential family $f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^T \eta_1(x) + \theta_2^T \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}$ leads to a sufficient statistic $(\eta_1(x), \eta_2(x), t_1(x), t_2(x))$ [Didelot, Everitt, Johansen & Lawson, 2011]. In the Poisson/geometric case, if $\prod_i x_i!$ is added to $S$, there is no discrepancy. But this only applies in genuine sufficiency settings... Inability to evaluate the loss brought by summary statistics.
  • 166. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. Meaning of the ABC-Bayes factor: in the Poisson/geometric case, if $E[y_i] = \theta_0 > 0$, $\lim_{n\to\infty} B_{12}^\eta(y) = \dfrac{(\theta_0 + 1)^2}{\theta_0}\, e^{-\theta_0}$.
  • 167. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. MA(q) divergence: evolution [against $\epsilon$] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when $\epsilon$ is equal to the 10, 1, 0.1, 0.01% quantiles on insufficient autocovariance distances. Sample of 50 points from a MA(2) with $\theta_1 = 0.6$, $\theta_2 = 0.2$. True Bayes factor equal to 17.71.
  • 168. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. MA(q) divergence: same evolution [against $\epsilon$] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), at the same quantiles of the insufficient autocovariance distances, for a sample of 50 points from a MA(1) model with $\theta_1 = 0.6$. True Bayes factor $B_{21}$ equal to .004.
  • 169-170. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. A population genetics evaluation: population genetics example with 3 populations, 2 scenarios, 15 individuals, 5 loci, and a single mutation parameter; 24 summary statistics, 2 million ABC proposals, and an importance [tree] sampling alternative.
  • 171. From MCMC to ABC Methods ABC for model choice Generic ABC model choice Stability of importance sampling
  • 172. From MCMC to ABC Methods ABC for model choice Generic ABC model choice Comparison with ABC Use of 24 summary statistics and DIY-ABC logistic correction
  • 173. From MCMC to ABC Methods ABC for model choice Generic ABC model choice Comparison with ABC Use of 15 summary statistics and DIY-ABC logistic correction
  • 174. From MCMC to ABC Methods ABC for model choice Generic ABC model choice Comparison with ABC Use of 24 summary statistics and DIY-ABC logistic correction
  • 175-176. From MCMC to ABC Methods. ABC for model choice. Generic ABC model choice. The only safe cases: besides specific models like Gibbs random fields, using distances over the data itself escapes the discrepancy... [Toni & Stumpf, 2010; Sousa et al., 2009] ...and so does the use of more informal model fitting measures. [Ratmann et al., 2009, 2011]