MCMC and likelihood-free methods




                       MCMC and likelihood-free methods

                                           Christian P. Robert

                                   Université Paris-Dauphine, IUF, & CREST


                        Université de Besançon, November 22, 2012
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Computational issues in Bayesian cosmology



   Computational issues in Bayesian
   cosmology

   The Metropolis-Hastings Algorithm

   The Gibbs Sampler

   Approximate Bayesian computation
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Statistical problems in cosmology

              Potentially high dimensional parameter space [Not considered
              here]
              Immensely slow computation of likelihoods, e.g. WMAP and CMB,
              because of numerically costly spectral transforms [Data is a
              Fortran program]
              Nonlinear dependence and degeneracies between parameters
              introduced by physical constraints or theoretical assumptions
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Cosmological data




      Posterior distribution of cosmological parameters for recent
      observational data of CMB anisotropies (differences in temperature
      across directions) [WMAP], SNIa, and cosmic shear.
      Combination of three likelihoods, some of which are available as
      public (Fortran) code, and of a uniform prior on a hypercube.
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Cosmology parameters

      Parameters for the cosmology likelihood
      (C=CMB, S=SNIa, L=lensing)

                   Symbol      Description                     Minimum    Maximum    Experiment
                   Ωb          Baryon density                  0.01       0.1        C        L
                   Ωm          Total matter density            0.01       1.2        C    S   L
                   w           Dark-energy eq. of state        -3.0       0.5        C    S   L
                   ns          Primordial spectral index       0.7        1.4        C        L
                   ∆2R         Normalization (large scales)                          C
                   σ8          Normalization (small scales)                          C        L
                   h           Hubble constant                                       C        L
                   τ           Optical depth                                         C
                   M           Absolute SNIa magnitude                                    S
                   α           Colour response                                            S
                   β           Stretch response                                           S
                   a           Galaxy z-distribution fit                                      L
                   b           Galaxy z-distribution fit                                      L
                   c           Galaxy z-distribution fit                                      L


      For WMAP5, σ8 is a deduced quantity that depends on the other parameters
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Adaptation of importance function




                                               [Benabed et al., MNRAS, 2010]
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Estimates
                                    Parameter   PMC                      MCMC

                                    Ωb          0.0432 +0.0027/−0.0024   0.0432 +0.0026/−0.0023
                                    Ωm          0.254 +0.018/−0.017      0.253 +0.018/−0.016
                                    τ           0.088 +0.018/−0.016      0.088 +0.019/−0.015
                                    w           −1.011 ± 0.060           −1.010 +0.059/−0.060
                                    ns          0.963 +0.015/−0.014      0.963 +0.015/−0.014
                                    10^9 ∆2R    2.413 +0.098/−0.093      2.414 +0.098/−0.092
                                    h           0.720 +0.022/−0.021      0.720 +0.023/−0.021
                                    a           0.648 +0.040/−0.041      0.649 +0.043/−0.042
                                    b           9.3 +1.4/−0.9            9.3 +1.7/−0.9
                                    c           0.639 +0.084/−0.070      0.639 +0.082/−0.070
                                    −M          19.331 ± 0.030           19.332 +0.029/−0.031
                                    α           1.61 +0.15/−0.14         1.62 +0.16/−0.14
                                    −β          −1.82 +0.17/−0.16        −1.82 ± 0.16
                                    σ8          0.795 +0.028/−0.030      0.795 +0.030/−0.027



      Means and 68% credible intervals using lensing, SNIa and CMB
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Evidence/Marginal likelihood/Integrated Likelihood ...

      Central quantity of interest in (Bayesian) model choice

                                    E = ∫ π(x) dx = ∫ [π(x)/q(x)] q(x) dx,

      expressed as an expectation under any density q with large enough
      support.
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Evidence/Marginal likelihood/Integrated Likelihood ...

      Central quantity of interest in (Bayesian) model choice

                                    E = ∫ π(x) dx = ∫ [π(x)/q(x)] q(x) dx,

      expressed as an expectation under any density q with large enough
      support.
       Importance sampling provides a sample x1 , . . . , xN ∼ q and an
       approximation of the above integral,

                                    E ≈ N−1 Σ_{n=1}^{N} wn ,

       where the wn = π(xn )/q(xn ) are the (unnormalised) importance weights.
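
       A rough illustration in R (hypothetical helpers: post_unnorm() evaluates the
       unnormalised π, rq()/dq() simulate from and evaluate the importance density q):

                evidence_is = function(N, post_unnorm, rq, dq) {
                   x = rq(N)                      # x_1, ..., x_N ~ q
                   w = post_unnorm(x) / dq(x)     # unnormalised importance weights
                   mean(w)                        # importance-sampling estimate of E
                }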
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Back to cosmology questions


      Standard cosmology successful in explaining recent observations,
      such as CMB, SNIa, galaxy clustering, cosmic shear, galaxy cluster
      counts, and Lyα forest clustering.
      Flat ΛCDM model with only six free parameters
                         (Ωm , Ωb , h, ns , τ, σ8 )
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Back to cosmology questions


      Standard cosmology successful in explaining recent observations,
      such as CMB, SNIa, galaxy clustering, cosmic shear, galaxy cluster
      counts, and Lyα forest clustering.
      Flat ΛCDM model with only six free parameters
                         (Ωm , Ωb , h, ns , τ, σ8 )
      Extensions to ΛCDM may be based on independent evidence
      (massive neutrinos from oscillation experiments), predicted by
      compelling hypotheses (primordial gravitational waves from
      inflation) or reflect ignorance about fundamental physics
      (dynamical dark energy).
      Testing for dark energy, curvature, and inflationary models
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Extended models


      Focus on the dark energy equation-of-state parameter, modeled as

                         w = −1                      ΛCDM
                         w = w0                      wCDM
                         w = w0 + w1 (1 − a)      w(z)CDM

       In addition, the curvature parameter ΩK for each of the above is either
       ΩK = 0 (‘flat’) or ΩK ≠ 0 (‘curved’).
      Choice of models represents simplest models beyond a
      “cosmological constant” model able to explain the observed,
      recent accelerated expansion of the Universe.
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Cosmology priors


      Prior ranges for dark energy and curvature models. In case of
      w(a) models, the prior on w1 depends on w0
        Parameter   Description                 Min.        Max.
        Ωm          Total matter density        0.15        0.45
        Ωb          Baryon density              0.01        0.08
        h           Hubble parameter            0.5         0.9
        ΩK          Curvature                   −1          1
        w0          Constant dark-energy par.   −1          −1/3
        w1          Linear dark-energy par.     −1 − w0     (−1/3 − w0 )/(1 − aacc )
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Results


       In most cases evidence in favour of the standard model, especially
       when more datasets/experiments are combined.
      Largest evidence is ln B12 = 1.8, for the w(z)CDM model and
      CMB alone. Case where a large part of the prior range is still
      allowed by the data, and a region of comparable size is excluded.
      Hence weak evidence that both w0 and w1 are required, but
      excluded when adding SNIa and BAO datasets.
      Results on the curvature are compatible with current findings:
      non-flat Universe(s) strongly disfavoured for the three dark-energy
      cases.
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Evidence
MCMC and likelihood-free methods
  Computational issues in Bayesian cosmology




Posterior outcome
      Posterior on dark-energy parameters w0 and w1 as 68%- and 95% credible regions for
      WMAP (solid blue lines), WMAP+SNIa (dashed green) and WMAP+SNIa+BAO
      (dotted red curves). Allowed prior range as red straight lines.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm




The Metropolis-Hastings Algorithm



   Computational issues in Bayesian
   cosmology

   The Metropolis-Hastings Algorithm

   The Gibbs Sampler

   Approximate Bayesian computation
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo basics


General purpose



      A major computational issue in Bayesian statistics:

      Given a density π known up to a normalizing constant, and an
      integrable function h, compute

                           Π(h) = ∫ h(x) π(x) µ(dx) = ∫ h(x) π̃(x) µ(dx) / ∫ π̃(x) µ(dx)

       when ∫ h(x) π̃(x) µ(dx) is intractable.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo basics


Monte Carlo 101

       Generate an iid sample x1 , . . . , xN from π and estimate Π(h) by

                                       Π̂N^MC (h) = N−1 Σ_{i=1}^{N} h(xi ).

       LLN: Π̂N^MC (h) −→ Π(h) almost surely

       If Π(h2 ) = ∫ h2 (x)π(x)µ(dx) < ∞,

       CLT: √N ( Π̂N^MC (h) − Π(h) ) ⇝ N( 0, Π{[h − Π(h)]2 } ).
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo basics


Monte Carlo 101

       Generate an iid sample x1 , . . . , xN from π and estimate Π(h) by

                                       Π̂N^MC (h) = N−1 Σ_{i=1}^{N} h(xi ).

       LLN: Π̂N^MC (h) −→ Π(h) almost surely

       If Π(h2 ) = ∫ h2 (x)π(x)µ(dx) < ∞,

       CLT: √N ( Π̂N^MC (h) − Π(h) ) ⇝ N( 0, Π{[h − Π(h)]2 } ).


       Caveat leading to MCMC


      Often impossible or inefficient to simulate directly from Π
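
       A minimal plain Monte Carlo sketch in R (toy assumption: π = N(0, 1) and
       h(x) = x², so that Π(h) = 1):

                N = 1e5
                x = rnorm(N)              # x_1, ..., x_N iid from π
                pi_hat = mean(x^2)        # LLN: converges to Π(h) = 1
                se = sd(x^2)/sqrt(N)      # CLT-based Monte Carlo standard error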
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (MCMC)




      It is not necessary to use a sample from the distribution f to
      approximate the integral

                                           I = ∫ h(x) f(x) dx ,
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (MCMC)



      It is not necessary to use a sample from the distribution f to
      approximate the integral

                                           I = ∫ h(x) f(x) dx ,


                                                      [notation warning: π turned to f!]
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (MCMC)

      It is not necessary to use a sample from the distribution f to
      approximate the integral

                                           I = ∫ h(x) f(x) dx ,



   We can obtain X1 , . . . , Xn ∼ f (approx)
   without directly simulating from f,
   using an ergodic Markov chain with
   stationary distribution f
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (MCMC)

      It is not necessary to use a sample from the distribution f to
      approximate the integral

                                           I = ∫ h(x) f(x) dx ,




   We can obtain X1 , . . . , Xn ∼ f (approx)
   without directly simulating from f,
   using an ergodic Markov chain with
   stationary distribution f


                                                                  Andreï Markov
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (2)




      Idea
      For an arbitrary starting value x(0) , an ergodic chain (X(t) ) is
      generated using a transition kernel with stationary distribution f
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (2)

      Idea
      For an arbitrary starting value x(0) , an ergodic chain (X(t) ) is
      generated using a transition kernel with stationary distribution f

               an irreducible Markov chain with stationary distribution f is
               ergodic with limiting distribution f under weak conditions,
               hence convergence in distribution of (X(t) ) to a random
               variable from f
               for T0 “large enough”, X(T0 ) is distributed from f
               the Markov sequence is a dependent sample X(T0 ) , X(T0 +1) , . . .
               generated from f
               Birkhoff’s ergodic theorem extends the LLN, sufficient for most
               approximation purposes
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (2)




      Idea
      For an arbitrary starting value x(0) , an ergodic chain (X(t) ) is
      generated using a transition kernel with stationary distribution f


      Problem: How can one build a Markov chain with a given
      stationary distribution?
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


The Metropolis–Hastings algorithm


   Arguments: The algorithm uses the
   objective (target) density

                                   f

   and a conditional density

                              q(y|x)

   called the instrumental (or proposal)   Nicholas Metropolis
   distribution
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


The MH algorithm

      Algorithm (Metropolis–Hastings)
      Given x(t) ,
         1. Generate Yt ∼ q(y|x(t) ).
         2. Take

                                            Yt     with prob. ρ(x(t) , Yt ),
                            X(t+1) =
                                            x(t)   with prob. 1 − ρ(x(t) , Yt ),

               where

                                       ρ(x, y) = min{ f(y) q(x|y) / [f(x) q(y|x)] , 1 }.
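
       A minimal sketch of one such transition in R (hypothetical helpers: f() evaluates
       the possibly unnormalised target, rq(x) simulates from q(·|x), dq(y, x) evaluates
       q(y|x)):

                mh_step = function(x, f, rq, dq) {
                   y = rq(x)                                        # Y_t ~ q(.|x)
                   rho = min(1, f(y)*dq(x, y) / (f(x)*dq(y, x)))    # acceptance probability
                   if (runif(1) < rho) y else x                     # accept Y_t with prob. rho
                }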
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


Features



               Independent of normalizing constants for both f and q(·|x)
               (i.e., those constants independent of x)
              Never move to values with f(y) = 0
              The chain (x(t) )t may take the same value several times in a
              row, even though f is a density wrt Lebesgue measure
              The sequence (yt )t is usually not a Markov chain
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


Convergence properties


         1. The M-H Markov chain is reversible, with
            invariant/stationary density f since it satisfies the detailed
            balance condition
                            f(y) K(y, x) = f(x) K(x, y)
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


Convergence properties


         1. The M-H Markov chain is reversible, with
            invariant/stationary density f since it satisfies the detailed
            balance condition
                            f(y) K(y, x) = f(x) K(x, y)
         2. As f is a probability measure, the chain is positive recurrent
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


Convergence properties


         1. The M-H Markov chain is reversible, with
            invariant/stationary density f since it satisfies the detailed
            balance condition
                            f(y) K(y, x) = f(x) K(x, y)
         2. As f is a probability measure, the chain is positive recurrent
          3. If

                      Pr[ f(Yt ) q(X(t) |Yt ) / { f(X(t) ) q(Yt |X(t) ) } ≥ 1 ] < 1 ,      (1)

               that is, if the event {X(t+1) = X(t) } is possible, then the chain is
               aperiodic
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Random-walk Metropolis-Hastings algorithms


Random walk Metropolis–Hastings



      Use of a local perturbation as proposal

                                                  Yt = X(t) + εt ,

       where εt ∼ g, independent of X(t) .
      The instrumental density is of the form g(y − x) and the Markov
      chain is a random walk if we take g to be symmetric g(x) = g(−x)
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Random-walk Metropolis-Hastings algorithms


Random walk Metropolis–Hastings [code]


      Algorithm (Random walk Metropolis)
      Given x(t)
         1. Generate Yt ∼ g(y − x(t) )
          2. Take

                          X(t+1) = Yt     with prob. min{ 1, f(Yt )/f(x(t) ) },
                                   x(t)   otherwise.
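
       A minimal R sketch of this random walk sampler with Gaussian increments
       (assuming f() evaluates the, possibly unnormalised, target density):

                rw_metropolis = function(f, x0, sigma, T) {
                   x = numeric(T); x[1] = x0
                   for (t in 2:T) {
                      y = x[t-1] + sigma*rnorm(1)              # Y_t = x^(t) + eps_t, eps_t ~ N(0, sigma^2)
                      if (runif(1) < min(1, f(y)/f(x[t-1])))   # symmetric proposal: ratio f(Y_t)/f(x^(t))
                         x[t] = y
                      else
                         x[t] = x[t-1]
                   }
                   x
                }
                # e.g. chain = rw_metropolis(function(x) dnorm(x, 0, 1), x0 = 0, sigma = 2, T = 1e4)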
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Langevin Algorithms


       Proposal based on the Langevin diffusion Lt , defined by the
       stochastic differential equation

                                       dLt = dBt + (1/2) ∇ log f(Lt ) dt,

       where Bt is the standard Brownian motion
      Theorem
      The Langevin diffusion is the only non-explosive diffusion which is
      reversible with respect to f.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Discretization


      Instead, consider the sequence

               x(t+1) = x(t) + (σ2/2) ∇ log f(x(t) ) + σεt ,   εt ∼ Np (0, Ip )
       where σ2 corresponds to the discretization step
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Discretization


      Instead, consider the sequence

               x(t+1) = x(t) + (σ2/2) ∇ log f(x(t) ) + σεt ,   εt ∼ Np (0, Ip )
      where σ2 corresponds to the discretization step
       Unfortunately, the discretized chain may be transient, for instance
       when

                           lim_{x→±∞} σ2 |∇ log f(x)| |x|−1 > 1
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


MH correction


       Accept the new value Yt with probability

            [ f(Yt )/f(x(t) ) ] · exp{ −‖Yt − x(t) − (σ2/2)∇ log f(x(t) )‖2 / 2σ2 }
                                / exp{ −‖x(t) − Yt − (σ2/2)∇ log f(Yt )‖2 / 2σ2 }   ∧ 1 .

      Choice of the scaling factor σ
      Should lead to an acceptance rate of 0.574 to achieve optimal
      convergence rates (when the components of x are uncorrelated)
              [Roberts & Rosenthal, 1998; Girolami & Calderhead, 2011]
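
       A minimal R sketch of one such Langevin (MALA) transition, assuming hypothetical
       helpers log_f() and grad_log_f() for the log-target and its gradient:

                mala_step = function(x, log_f, grad_log_f, sigma) {
                   mu_x = x + sigma^2/2 * grad_log_f(x)          # drift towards higher density
                   y = mu_x + sigma*rnorm(length(x))             # Langevin proposal
                   mu_y = y + sigma^2/2 * grad_log_f(y)
                   log_alpha = log_f(y) - log_f(x) +
                      sum(dnorm(x, mu_y, sigma, log = TRUE)) -   # log q(x | y)
                      sum(dnorm(y, mu_x, sigma, log = TRUE))     # log q(y | x)
                   if (log(runif(1)) < log_alpha) y else x
                }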
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Optimizing the Acceptance Rate


       Problem of choosing the transition kernel q from a practical point
       of view
      Most common solutions:
       (a) a fully automated algorithm like ARMS;
                                                            [Gilks & Wild, 1992]
       (b) an instrumental density g which approximates f, such that
           f/g is bounded for uniform ergodicity to apply;
       (c) a random walk
       In both cases (b) and (c), the choice of g is critical.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Case of the random walk


      Different approach to acceptance rates
       A high acceptance rate does not necessarily indicate that the
       algorithm is moving correctly: it may instead indicate that the
       random walk is moving too slowly on the surface of f.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Case of the random walk


      Different approach to acceptance rates
       A high acceptance rate does not necessarily indicate that the
       algorithm is moving correctly: it may instead indicate that the
       random walk is moving too slowly on the surface of f.
       If x(t) and yt are close, i.e. f(x(t) ) ≃ f(yt ), then yt is accepted with
       probability

                              min{ f(yt )/f(x(t) ) , 1 } ≃ 1.
      For multimodal densities with well separated modes, the negative
      effect of limited moves on the surface of f clearly shows.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Case of the random walk (2)




      If the average acceptance rate is low, the successive values of f(yt )
      tend to be small compared with f(x(t) ), which means that the
      random walk moves quickly on the surface of f since it often
      reaches the “borders” of the support of f
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Rule of thumb



      In small dimensions, aim at an average acceptance rate of
      50%. In large dimensions, at an average acceptance rate of
      25%.
                                    [Gelman,Gilks and Roberts, 1995]
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Rule of thumb



      In small dimensions, aim at an average acceptance rate of
      50%. In large dimensions, at an average acceptance rate of
      25%.
                                    [Gelman,Gilks and Roberts, 1995]


       warning: rule to be taken with a pinch of salt!
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Role of scale


      Example (Noisy AR(1))
      Hidden Markov chain from a regular AR(1) model,

                               xt+1 = ϕ xt + εt+1 ,    εt ∼ N(0, τ2 )

      and observables
                                       yt |xt ∼ N(xt2 , σ2 )
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Role of scale


      Example (Noisy AR(1))
      Hidden Markov chain from a regular AR(1) model,

                               xt+1 = ϕ xt + εt+1 ,    εt ∼ N(0, τ2 )

      and observables
                                       yt |xt ∼ N(xt2 , σ2 )

      The distribution of xt given xt−1 , xt+1 and yt is

            exp{ −(1/2τ2 ) [ (xt − ϕxt−1 )2 + (xt+1 − ϕxt )2 + (τ2/σ2 )(yt − xt2 )2 ] } .
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Role of scale




       Example (Noisy AR(1) continued)
       For a Gaussian random walk with scale ω small enough, the
       random walk never jumps to the other mode. But if the scale ω is
       sufficiently large, the Markov chain explores both modes and gives a
       satisfactory approximation of the target distribution.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Role of scale




      Markov chain based on a random walk with scale ω = .1.
MCMC and likelihood-free methods
  The Metropolis-Hastings Algorithm
     Extensions


Role of scale




      Markov chain based on a random walk with scale ω = .5.
MCMC and likelihood-free methods
  The Gibbs Sampler




The Gibbs Sampler



   Computational issues in Bayesian
   cosmology

   The Metropolis-Hastings Algorithm

   The Gibbs Sampler

   Approximate Bayesian computation
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


General Principles


      A very specific simulation algorithm based on the target
      distribution f:
         1. Uses the conditional densities f1 , . . . , fp from f
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


General Principles


      A very specific simulation algorithm based on the target
      distribution f:
         1. Uses the conditional densities f1 , . . . , fp from f
         2. Start with the random variable X = (X1 , . . . , Xp )
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


General Principles


      A very specific simulation algorithm based on the target
      distribution f:
         1. Uses the conditional densities f1 , . . . , fp from f
         2. Start with the random variable X = (X1 , . . . , Xp )
         3. Simulate from the conditional densities,

                              Xi |x1 , x2 , . . . , xi−1 , xi+1 , . . . , xp
                                    ∼ fi (xi |x1 , x2 , . . . , xi−1 , xi+1 , . . . , xp )

              for i = 1, 2, . . . , p.
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Gibbs code


       Algorithm (Gibbs sampler)
       Given x(t) = (x1(t) , . . . , xp(t) ), generate
       1. X1(t+1) ∼ f1 (x1 |x2(t) , . . . , xp(t) );
       2. X2(t+1) ∼ f2 (x2 |x1(t+1) , x3(t) , . . . , xp(t) );
             ...
       p. Xp(t+1) ∼ fp (xp |x1(t+1) , . . . , xp−1(t+1) )


                                                 X(t+1) → X ∼ f
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Properties




       The full conditional densities f1 , . . . , fp are the only densities used
       for simulation. Thus, even in a high dimensional problem, all of
       the simulations may be univariate
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


toy example: iid N(µ, σ2 ) variates

                                   iid
       When Y1 , . . . , Yn ∼ N(y|µ, σ2 ) with both µ and σ unknown, the
       posterior in (µ, σ2 ) is conjugate outside a standard family
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


toy example: iid N(µ, σ2 ) variates

                                   iid
       When Y1 , . . . , Yn ∼ N(y|µ, σ2 ) with both µ and σ unknown, the
       posterior in (µ, σ2 ) is conjugate outside a standard family

      But...
            µ|Y1:n , σ2 ∼ N( (1/n) Σ_{i=1}^{n} Yi , σ2/n )

            σ2 |Y1:n , µ ∼ IG( n/2 − 1 , (1/2) Σ_{i=1}^{n} (Yi − µ)2 )
      assuming constant (improper) priors on both µ and σ2

              Hence we may use the Gibbs sampler for simulating from the
              posterior of (µ, σ2 )
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


toy example: R code


      Gibbs Sampler for Gaussian posterior
                n = length(Y)                          # sample size
                S = sum(Y)
                mu = S/n                               # initialise mu at the sample mean
                for (i in 1:500) {
                   S2 = sum((Y-mu)^2)
                   sigma2 = 1/rgamma(1,n/2-1,S2/2)     # sigma^2 | mu, Y ~ IG(n/2-1, S2/2)
                   mu = S/n + sqrt(sigma2/n)*rnorm(1)  # mu | sigma^2, Y ~ N(mean(Y), sigma^2/n)
                }
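
       A possible toy run (simulated data, matching the n = 10 draws from N(0, 1) used
       in the following slides):

                set.seed(1)
                Y = rnorm(10)                          # n = 10 observations from N(0, 1)
                draws = matrix(NA, 500, 2)             # store (mu, sigma2) along iterations
                # ... then run the Gibbs loop above, adding  draws[i,] = c(mu, sigma2)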
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4, 5
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4, 5, 10
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4, 5, 10, 25
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4, 5, 10, 25, 50
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4, 5, 10, 25, 50, 100
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with n = 10 observations from the
N(0, 1) distribution




      Number of Iterations 1, 2, 3, 4, 5, 10, 25, 50, 100, 500
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Limitations of the Gibbs sampler


      Formally, a special case of a sequence of 1-D M-H kernels, all with
      acceptance rate uniformly equal to 1.
      The Gibbs sampler
         1. limits the choice of instrumental distributions
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Limitations of the Gibbs sampler


      Formally, a special case of a sequence of 1-D M-H kernels, all with
      acceptance rate uniformly equal to 1.
      The Gibbs sampler
         1. limits the choice of instrumental distributions
         2. requires some knowledge of f
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Limitations of the Gibbs sampler


      Formally, a special case of a sequence of 1-D M-H kernels, all with
      acceptance rate uniformly equal to 1.
      The Gibbs sampler
         1. limits the choice of instrumental distributions
         2. requires some knowledge of f
         3. is, by construction, multidimensional
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Limitations of the Gibbs sampler


      Formally, a special case of a sequence of 1-D M-H kernels, all with
      acceptance rate uniformly equal to 1.
      The Gibbs sampler
         1. limits the choice of instrumental distributions
         2. requires some knowledge of f
         3. is, by construction, multidimensional
         4. does not apply to problems where the number of parameters
            varies as the resulting chain is not irreducible.
MCMC and likelihood-free methods
  The Gibbs Sampler
        General Principles


A wee problem
       [Figure: Gibbs sample in the (µ1 , µ2 ) plane]

              Gibbs started at random
MCMC and likelihood-free methods
  The Gibbs Sampler
        General Principles


A wee problem


       [Figure, two panels in the (µ1 , µ2 ) plane: Gibbs started at random (left);
        Gibbs stuck at the wrong mode (right)]
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Slice sampler as generic Gibbs


      If f(θ) can be written as a product
                                    Π_{i=1}^{k} fi (θ),
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Slice sampler as generic Gibbs


      If f(θ) can be written as a product
                                    Π_{i=1}^{k} fi (θ),

        it can be completed as

                                    Π_{i=1}^{k} I_{0 ≤ ωi ≤ fi (θ)} ,

      leading to the following Gibbs algorithm:
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Slice sampler (code)


      Algorithm (Slice sampler)
      Simulate
          1. ω1(t+1) ∼ U[0, f1 (θ(t) )] ;
                 ...
          k. ωk(t+1) ∼ U[0, fk (θ(t) )] ;
     k+1. θ(t+1) ∼ U_{A(t+1)} , with

                  A(t+1) = {y; fi (y) ≥ ωi(t+1) , i = 1, . . . , k}.
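
       A minimal R sketch for a single factor (k = 1), assuming the support is contained
       in an interval [lo, hi] so the slice A = {y : f(y) ≥ ω} can be sampled uniformly
       by simple rejection:

                slice_step = function(theta, f, lo, hi) {
                   omega = runif(1, 0, f(theta))      # auxiliary uniform on [0, f(theta^(t))]
                   repeat {                           # uniform draw on the slice A^(t+1)
                      y = runif(1, lo, hi)
                      if (f(y) >= omega) return(y)
                   }
                }
                # a hypothetical instance: a N(-3, 1) density truncated to [0, 1]
                # f = function(x) dnorm(x, -3, 1) * (x >= 0 & x <= 1); lo = 0; hi = 1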
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution



       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution



       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2, 3
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution



       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2, 3, 4
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution



       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2, 3, 4, 5
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution




       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2, 3, 4, 5, 10
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution




       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2, 3, 4, 5, 10, 50
MCMC and likelihood-free methods
  The Gibbs Sampler
     General Principles


Example of results with a truncated N(−3, 1) distribution




       [Figure: slice sampler output for the truncated N(−3, 1) target (x vs. density y)]




      Number of Iterations 2, 3, 4, 5, 10, 50, 100
MCMC and likelihood-free methods
  Approximate Bayesian computation




Approximate Bayesian computation



   Computational issues in Bayesian
   cosmology

   The Metropolis-Hastings Algorithm

   The Gibbs Sampler

   Approximate Bayesian computation
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Regular Bayesian computation issues


      Recap’: When faced with a non-standard posterior distribution

                                     π(θ|y) ∝ π(θ)L(θ|y)

      the standard solution is to use simulation (Monte Carlo) to
      produce a sample
                                   θ1 , . . . , θT
      from π(θ|y) (or approximately by Markov chain Monte Carlo
      methods)
                                              [Robert & Casella, 2004]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Intractable likelihoods



      Cases when the likelihood function f(y|θ) is unavailable (in
      analytic and numerical senses) and when the completion step

                                      f(y|θ) = ∫Z f(y, z|θ) dz

       is impossible or too costly because of the dimension of z:
       MCMC cannot be implemented!
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Illustration


   Phylogenetic tree: in population
   genetics, reconstitution of a common
   ancestor from a sample of genes via
   a phylogenetic tree that is close to
   impossible to integrate out
   [100 processor days with 4
   parameters]
                                    [Cornuet et al., 2009, Bioinformatics]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Illustration

           Different possible scenarios, with scenario choice via ABC
      demo-genetic inference

      Genetic model of evolution from a
      common ancestor (MRCA)
      characterized by a set of parameters
      that cover historical, demographic, and
      genetic factors
      Dataset of polymorphism (DNA sample)
      observed at the present time

                   Scenario 1a is strongly supported relative to the other
                   scenarios, arguing for a common origin of the West
                   African pygmy populations            [Verdu et al., 2009]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Illustration

       Pygmy population demo-genetics
       Pygmy populations: do they
       have a common origin? When
       and how did they split from
       non-pygmy populations? Were
       there more recent interactions
       between pygmy and
       non-pygmy populations?
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


The ABC method

      Bayesian setting: target is π(θ)f(x|θ)
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


The ABC method

      Bayesian setting: target is π(θ)f(x|θ)
      When likelihood f(x|θ) not in closed form, likelihood-free rejection
      technique:
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


The ABC method

      Bayesian setting: target is π(θ)f(x|θ)
      When likelihood f(x|θ) not in closed form, likelihood-free rejection
      technique:
      ABC algorithm
      For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly
      simulating
                             θ' ∼ π(θ) ,    z ∼ f(z|θ' ) ,
      until the auxiliary variable z is equal to the observed value, z = y.

                                                       [Tavaré et al., 1997]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Why does it work?!



      The proof is trivial:

                                      f(θi ) ∝ Σ_{z∈D} π(θi )f(z|θi )Iy (z)
                                             ∝ π(θi )f(y|θi )
                                             = π(θi |y) .

                                                                        [Accept–Reject 101]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


A as approximative


      When y is a continuous random variable, equality z = y is
      replaced with a tolerance condition,

                                      ρ(y, z) ≤ ε

      where ρ is a distance
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


A as approximative


      When y is a continuous random variable, equality z = y is
      replaced with a tolerance condition,

                                         ρ(y, z) ≤ ε

      where ρ is a distance
      Output distributed from

                            π(θ) Pθ {ρ(y, z) < ε} ∝ π(θ|ρ(y, z) < ε)
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC algorithm


       Algorithm 1 Likelihood-free rejection sampler
         for i = 1 to N do
           repeat
              generate θ' from the prior distribution π(·)
              generate z from the likelihood f(·|θ')
           until ρ{η(z), η(y)} ≤ ε
           set θi = θ'
         end for

       where η(y) defines a (not necessarily sufficient) statistic
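
       A minimal R sketch of this rejection sampler, assuming hypothetical helpers
       rprior() for simulating from π, rmodel(theta) for simulating pseudo-data z,
       eta() for the summary statistic and rho() for the distance:

                abc_reject = function(N, y, rprior, rmodel, eta, rho, eps) {
                   theta = numeric(N)
                   for (i in 1:N) {
                      repeat {
                         th = rprior()
                         z = rmodel(th)
                         if (rho(eta(z), eta(y)) <= eps) break   # keep theta' only if summaries are close
                      }
                      theta[i] = th
                   }
                   theta
                }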
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Output

      The likelihood-free algorithm samples from the marginal in z of:

                             πε (θ, z|y) = π(θ)f(z|θ) IAε,y (z) / ∫Aε,y ×Θ π(θ)f(z|θ) dz dθ ,

       where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}.
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Output

      The likelihood-free algorithm samples from the marginal in z of:

                             πε (θ, z|y) = π(θ)f(z|θ) IAε,y (z) / ∫Aε,y ×Θ π(θ)f(z|θ) dz dθ ,

       where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}.
      The idea behind ABC is that the summary statistics coupled with a
      small tolerance should provide a good approximation of the
      posterior distribution:

                            πε (θ|y) = ∫ πε (θ, z|y) dz ≈ π(θ|η(y)) .
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Pima Indian benchmark




      Figure: Comparison between density estimates of the marginals on β1
      (left), β2 (center) and β3 (right) from ABC rejection samples (red) and
      MCMC samples (black)

MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC advances

      Simulating from the prior is often poor in efficiency
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC advances

      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x’s within the vicinity of y...
           [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC advances

      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x’s within the vicinity of y...
           [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]

       ...or by viewing the problem as one of conditional density estimation
       and by developing techniques to allow for a larger ε
                                                   [Beaumont et al., 2002]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC advances

      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x’s within the vicinity of y...
           [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]

       ...or by viewing the problem as one of conditional density estimation
       and by developing techniques to allow for a larger ε
                                                   [Beaumont et al., 2002]

       ...or even by including ε in the inferential framework [ABCµ ]
                                                          [Ratmann et al., 2009]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC-MCMC


       Markov chain (θ(t) ) created via the transition function

       θ(t+1) =  θ'      if θ' ∼ Kω (θ'|θ(t) ), x ∼ f(x|θ') is such that x = y,
                         and u ∼ U(0, 1) ≤ π(θ') Kω (θ(t) |θ') / [π(θ(t) ) Kω (θ'|θ(t) )] ,
                 θ(t)    otherwise,
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC-MCMC


       Markov chain (θ(t) ) created via the transition function

       θ(t+1) =  θ'      if θ' ∼ Kω (θ'|θ(t) ), x ∼ f(x|θ') is such that x = y,
                         and u ∼ U(0, 1) ≤ π(θ') Kω (θ(t) |θ') / [π(θ(t) ) Kω (θ'|θ(t) )] ,
                 θ(t)    otherwise,

      has the posterior π(θ|y) as stationary distribution
                                                    [Marjoram et al, 2003]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC-MCMC (2)

       Algorithm 2 Likelihood-free MCMC sampler
           Use Algorithm 1 to get (θ(0) , z(0) )
           for t = 1 to N do
             Generate θ' from Kω (·|θ(t−1) ),
             Generate z' from the likelihood f(·|θ'),
             Generate u from U[0,1] ,
             if u ≤ [π(θ') Kω (θ(t−1) |θ') / π(θ(t−1) ) Kω (θ'|θ(t−1) )] IAε,y (z') then
                set (θ(t) , z(t) ) = (θ', z')
             else
                (θ(t) , z(t) ) = (θ(t−1) , z(t−1) ),
             end if
           end for
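
       A minimal R sketch of one such ABC-MCMC transition, reusing the hypothetical
       helpers of the rejection sketch plus dprior(), rkern() and dkern() for the prior
       density and the proposal kernel Kω:

                abc_mcmc_step = function(theta, y, dprior, rkern, dkern, rmodel, eta, rho, eps) {
                   prop = rkern(theta)                          # theta' ~ K_omega(.|theta^(t-1))
                   z = rmodel(prop)                             # z' ~ f(.|theta')
                   ratio = dprior(prop)*dkern(theta, prop) /
                           (dprior(theta)*dkern(prop, theta))   # prior x proposal ratio
                   if (runif(1) <= ratio && rho(eta(z), eta(y)) <= eps) prop else theta
                }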
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Sequential Monte Carlo

       SMC is a simulation technique to approximate a sequence of
       related probability distributions πn with π0 “easy” and πT as
       target.
       Iterated IS as PMC: particles moved from time n − 1 to time n via
       kernel Kn and use of a sequence of extended targets π̃n

                      π̃n (z0:n ) = πn (zn ) Π_{j=0}^{n−1} Lj (zj+1 , zj )

       where the Lj ’s are backward Markov kernels [check that πn (zn ) is
       a marginal]
                              [Del Moral, Doucet & Jasra, Series B, 2006]
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


Sequential Monte Carlo (2)

      Algorithm 3 SMC sampler [Del Moral, Doucet & Jasra, Series B, 2006]
          sample z_i^(0) ∼ γ0 (x) (i = 1, . . . , N)
          compute weights w_i^(0) = π0 (z_i^(0) ) / γ0 (z_i^(0) )
          for t = 1 to N do
            if ESS(w^(t−1) ) < NT then
              resample N particles z^(t−1) and set weights to 1
            end if
            generate z_i^(t) ∼ Kt (z_i^(t−1) , ·) and set weights to
            $$
            w_i^{(t)} = w_i^{(t-1)}\,
              \frac{\pi_t(z_i^{(t)})\,L_{t-1}(z_i^{(t)}, z_i^{(t-1)})}
                   {\pi_{t-1}(z_i^{(t-1)})\,K_t(z_i^{(t-1)}, z_i^{(t)})}
            $$
          end for
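
      For concreteness (an illustrative addition, not from the slides), the ESS test and
      resampling step of Algorithm 3 might look as follows in R; the multinomial resampling
      scheme and the threshold NT are assumptions, since the slide does not specify them.

      # Effective sample size of a vector of (possibly unnormalised) weights
      ess <- function(w) {
        w <- w / sum(w)
        1 / sum(w^2)
      }

      # Degeneracy check of Algorithm 3: when ESS drops below the threshold NT,
      # resample particle indices proportionally to the weights (multinomial
      # resampling, one possible choice) and reset all weights to 1
      maybe_resample <- function(particles, w, NT) {
        if (ess(w) < NT) {
          idx       <- sample(seq_along(w), length(w), replace = TRUE, prob = w)
          particles <- particles[idx]
          w         <- rep(1, length(w))
        }
        list(particles = particles, weights = w)
      }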
MCMC and likelihood-free methods
  Approximate Bayesian computation
     ABC basics


ABC-SMC
                                                      [Del Moral, Doucet & Jasra, 2009]

      True derivation of an SMC-ABC algorithm
      Use of a kernel Kn associated with target π_{εn} and derivation of the
      backward kernel
      $$
      L_{n-1}(z, z') = \frac{\pi_{\epsilon_n}(z')\,K_n(z', z)}{\pi_{\epsilon_n}(z)}
      $$

      Update of the weights
      $$
      w_{in} \propto w_{i(n-1)}\,
        \frac{\sum_{m=1}^{M} \mathbb{I}_{A_{\epsilon_n}}(x_{in}^m)}
             {\sum_{m=1}^{M} \mathbb{I}_{A_{\epsilon_{n-1}}}(x_{i(n-1)}^m)}
      $$

      when x_{in}^m ∼ K(x_{i(n−1)} , ·)
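
      As a hedged illustration (not part of the slides), the weight update above for
      scalar data and distance ρ(x, y) = |x − y| might be coded as below; the matrix
      layout (one row per particle, M pseudo-datasets per row) and the function name
      are assumptions of this sketch.

      # ABC-SMC weight update (one step), scalar data, distance |x - y|
      # x_new, x_prev: N x M matrices of pseudo-data, one row per particle
      abc_smc_reweight <- function(w_prev, x_new, x_prev, y, eps_n, eps_prev) {
        hits_new  <- rowSums(abs(x_new  - y) <= eps_n)     # sum_m I_{A_{eps_n}}(x_in^m)
        hits_prev <- rowSums(abs(x_prev - y) <= eps_prev)  # sum_m I_{A_{eps_{n-1}}}(x_i(n-1)^m)
        w <- ifelse(hits_prev > 0, w_prev * hits_new / hits_prev, 0)
        w / sum(w)                                         # return normalised weights
      }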
