On some computational methods for Bayesian model choice




             On some computational methods for Bayesian
                          model choice

                                            Christian P. Robert

                                Université Paris Dauphine & CREST-INSEE
                                http://www.ceremade.dauphine.fr/~xian


                  4th Warwick–Oxford Statistics Seminar
                        Warwick, November 3, 2009
          Joint works with J.-M. Marin, M. Beaumont, N. Chopin,
                       J.-M. Cornuet, & D. Wraith
On some computational methods for Bayesian model choice




Outline


      1    Introduction

      2    Importance sampling solutions compared

      3    Nested sampling

      4    ABC model choice
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Model choice as model comparison



      Choice between models
      Several models available for the same observation

                                    Mi : x ∼ fi (x|θi ),   i∈I

      where I can be finite or infinite
      Replace hypotheses with models
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Bayesian model choice

      Probabilise the entire model/parameter space
                 allocate probabilities pi to all models Mi
                 define priors πi (θi ) for each parameter space Θi
                 compute

                 \pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x\mid\theta_i)\,\pi_i(\theta_i)\,d\theta_i}
                                        {\sum_j p_j \int_{\Theta_j} f_j(x\mid\theta_j)\,\pi_j(\theta_j)\,d\theta_j}



                  take the largest π(Mi |x) to determine the “best” model
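
      A quick numerical illustration (not in the original slides): once the evidences
      ∫ fi (x|θi )πi (θi )dθi have been approximated by any of the methods discussed
      below, the posterior model probabilities follow by a single normalisation. The
      prior weights and evidence values here are made-up numbers.

      # posterior model probabilities from prior weights p_i and evidences Z_i (toy numbers)
      prior_weights <- c(M1 = 0.5, M2 = 0.5)
      evidences     <- c(M1 = 2.3e-4, M2 = 7.1e-5)   # Z_i = integral of f_i(x|theta_i) pi_i(theta_i)
      unnorm        <- prior_weights * evidences
      unnorm / sum(unnorm)                            # pi(M_i | x)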
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Bayes factor

      Definition (Bayes factors)
      For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0 , under the prior

                                       \pi(\Theta_0)\,\pi_0(\theta) + \pi(\Theta_0^c)\,\pi_1(\theta)\,,

      the central quantity is

                       B_{01} = \frac{\pi(\Theta_0 \mid x)}{\pi(\Theta_0^c \mid x)} \Big/ \frac{\pi(\Theta_0)}{\pi(\Theta_0^c)}
                              = \frac{\int_{\Theta_0} f(x\mid\theta)\,\pi_0(\theta)\,d\theta}
                                     {\int_{\Theta_0^c} f(x\mid\theta)\,\pi_1(\theta)\,d\theta}


                                                                            [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Introduction
     Evidence



Evidence



      Problems using a similar quantity, the evidence

                                      Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,d\theta_k ,

      aka the marginal likelihood.
                                                                              [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared




A comparison of importance sampling solutions

      1    Introduction

      2    Importance sampling solutions compared
             Regular importance
             Bridge sampling
             Harmonic means
             Mixtures to bridge
             Chib’s solution
             The Savage–Dickey ratio

      3    Nested sampling

      4    ABC model choice
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Regular importance



Bayes factor approximation

      When approximating the Bayes factor

                                     B_{01} = \frac{\int_{\Theta_0} f_0(x\mid\theta_0)\,\pi_0(\theta_0)\,d\theta_0}
                                                   {\int_{\Theta_1} f_1(x\mid\theta_1)\,\pi_1(\theta_1)\,d\theta_1}

      use of importance functions \varpi_0 and \varpi_1 and

                           \widehat{B}_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x\mid\theta_0^i)\,\pi_0(\theta_0^i)\big/\varpi_0(\theta_0^i)}
                                                   {n_1^{-1}\sum_{i=1}^{n_1} f_1(x\mid\theta_1^i)\,\pi_1(\theta_1^i)\big/\varpi_1(\theta_1^i)}
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Regular importance



Diabetes in Pima Indian women

      Example (R benchmark)
      “A population of women who were at least 21 years old, of Pima
      Indian heritage and living near Phoenix (AZ), was tested for
      diabetes according to WHO criteria. The data were collected by
      the US National Institute of Diabetes and Digestive and Kidney
      Diseases.”
      200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Regular importance



Probit modelling on Pima Indian women


      Probability of diabetes as a function of the above variables

                            P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
      Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
      g-prior modelling:

                                           \beta \sim \mathcal{N}_3\big(0,\, n\,(X^{\mathsf T}X)^{-1}\big)
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Regular importance



MCMC 101 for probit models

      Use of either a random walk proposal

                                                   \tilde\beta = \beta + \varepsilon

      in a Metropolis-Hastings algorithm (since the likelihood is
      available)
      or of a Gibbs sampler that takes advantage of the missing/latent
      variable
                      z \mid y, x, \beta \sim \mathcal{N}(x^{\mathsf T}\beta,\,1)\; \mathbb{I}^{\,y}_{z\ge 0}\, \mathbb{I}^{\,1-y}_{z\le 0}

      (since β|y, X, z is distributed as a multivariate normal)
                                                  [Gibbs three times faster]
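
      To make the latent-variable route concrete, here is a minimal sketch (not the
      talk's own code) of such a Gibbs sampler under the g-prior β ∼ N3 (0, n(XᵀX)⁻¹)
      introduced above; the gibbsprobit() routine called in later slides is assumed to
      behave along these lines.

      # minimal Gibbs sampler for the probit model under the g-prior (sketch)
      library(mvtnorm)
      gibbsprobit_sketch <- function(Niter, y, X) {
        n <- length(y); p <- ncol(X)
        Sigma <- solve(t(X) %*% X) * n / (n + 1)     # posterior covariance of beta given z
        beta  <- rep(0, p)
        store <- matrix(NA, Niter, p)
        for (it in 1:Niter) {
          m <- drop(X %*% beta)
          u <- runif(n)
          # inverse-cdf draws from N(m, 1) truncated to z >= 0 (y = 1) or z <= 0 (y = 0)
          z <- ifelse(y == 1,
                      m + qnorm(pnorm(-m) + u * (1 - pnorm(-m))),
                      m + qnorm(u * pnorm(-m)))
          beta <- drop(rmvnorm(1, drop(Sigma %*% t(X) %*% z), Sigma))
          store[it, ] <- beta
        }
        store
      }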
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Regular importance



Importance sampling for the Pima Indian dataset


      Use of an importance function inspired by the asymptotic distribution of
      the MLE,
                                           \beta \sim \mathcal{N}(\hat\beta,\, \hat\Sigma)

      R Importance sampling code
      model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
      is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
      is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
      bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
           sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)-
           dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
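
      The code assumes rmvnorm from the mvtnorm package, plus two helpers that are not
      shown in the talk: probitlpost(), returning the log posterior of the probit
      coefficients, and dmvlnorm(), returning the log multivariate normal density. A
      possible sketch of the former, under the g-prior above (a hypothetical
      reconstruction, not the original helper):

      # log posterior of the probit coefficients under the g-prior beta ~ N(0, n (X'X)^{-1})
      probitlpost <- function(beta, y, X) {
        if (is.vector(beta)) beta <- matrix(beta, nrow = 1)
        n <- length(y)
        apply(beta, 1, function(b) {
          p <- pnorm(X %*% b)
          sum(y * log(p) + (1 - y) * log(1 - p)) +
            mvtnorm::dmvnorm(b, sigma = n * solve(t(X) %*% X), log = TRUE)
        })
      }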
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Regular importance



Diabetes in Pima Indian women
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas of 20,000 simulations from the prior and from
      the above MLE importance sampler
      [Boxplots of the Bayes factor approximations (values roughly between 2 and 5):
      Monte Carlo (prior sampling) vs. importance sampling]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Bridge sampling


      Special case:
      If
                                           \pi_1(\theta_1\mid x) \propto \tilde\pi_1(\theta_1\mid x)
                                           \pi_2(\theta_2\mid x) \propto \tilde\pi_2(\theta_2\mid x)
      live on the same space (Θ1 = Θ2 ), then

                           B_{12} \approx \frac{1}{n}\sum_{i=1}^{n} \frac{\tilde\pi_1(\theta_i\mid x)}{\tilde\pi_2(\theta_i\mid x)}
                           \qquad \theta_i \sim \pi_2(\theta\mid x)

                           [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
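
      A toy numerical check of this special case (not in the original slides), with one
      observation x ∼ N(θ, 1) and two competing priors on θ, so that both posteriors
      live on the same space and B12 = m1 (x)/m2 (x) is known in closed form:

      # special-case estimator: average of unnormalised posterior ratios under draws from pi2
      x <- 1.2
      tilde_pi1 <- function(th) dnorm(x, th) * dnorm(th, 0, 1)    # prior 1: N(0, 1)
      tilde_pi2 <- function(th) dnorm(x, th) * dnorm(th, 0, 10)   # prior 2: N(0, 100)
      theta2 <- rnorm(1e5, x * 100 / 101, sqrt(100 / 101))        # exact draws from pi2(theta | x)
      mean(tilde_pi1(theta2) / tilde_pi2(theta2))                 # estimate of B12
      dnorm(x, 0, sqrt(2)) / dnorm(x, 0, sqrt(101))               # exact B12 for comparison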
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Bridge sampling variance



      The bridge sampling estimator does poorly if
                             \frac{\mathrm{var}(B_{12})}{B_{12}^2} \approx \frac{1}{n}\,
                             \mathbb{E}\left[\left(\frac{\pi_1(\theta) - \pi_2(\theta)}{\pi_2(\theta)}\right)^{2}\right]

      is large, i.e. if π1 and π2 have little overlap...
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



(Further) bridge sampling

      General identity:

                   B_{12} = \frac{\int \tilde\pi_1(\theta\mid x)\,\alpha(\theta)\,\pi_2(\theta\mid x)\,d\theta}
                                 {\int \tilde\pi_2(\theta\mid x)\,\alpha(\theta)\,\pi_1(\theta\mid x)\,d\theta}
                   \qquad \forall\, \alpha(\cdot)

                          \approx \frac{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}\mid x)\,\alpha(\theta_{2i})}
                                       {\dfrac{1}{n_1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}\mid x)\,\alpha(\theta_{1i})}
                   \qquad \theta_{ji} \sim \pi_j(\theta\mid x)
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Optimal bridge sampling

      The optimal choice of auxiliary function is
                                    \alpha^{\star} = \frac{n_1 + n_2}{n_1\,\pi_1(\theta\mid x) + n_2\,\pi_2(\theta\mid x)}

      leading to

                           B_{12} \approx \frac{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}\mid x)}{n_1\pi_1(\theta_{2i}\mid x) + n_2\pi_2(\theta_{2i}\mid x)}}
                                               {\dfrac{1}{n_1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}\mid x)}{n_1\pi_1(\theta_{1i}\mid x) + n_2\pi_2(\theta_{1i}\mid x)}}

                                                                                    Back later!
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Optimal bridge sampling (2)


      Reason:

        \frac{\mathrm{Var}(B_{12})}{B_{12}^2} \approx \frac{1}{n_1 n_2}
        \left[\frac{\int \pi_1(\theta)\pi_2(\theta)\,[n_1\pi_1(\theta) + n_2\pi_2(\theta)]\,\alpha(\theta)^2\,d\theta}
                   {\left(\int \pi_1(\theta)\pi_2(\theta)\,\alpha(\theta)\,d\theta\right)^{2}} - 1\right]

      (by the δ method)
      Drawback: Dependence on the unknown normalising constants
      solved iteratively
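
      That iteration is the usual fixed-point scheme of Meng & Wong (1996); the sketch
      below (not the talk's code) assumes tilde_pi1 and tilde_pi2 evaluate the two
      unnormalised posteriors and th1, th2 hold samples from π1 (·|x) and π2 (·|x).

      # fixed-point iteration for the optimal bridge estimate of B12 (sketch)
      bridge_opt <- function(tilde_pi1, tilde_pi2, th1, th2, niter = 50) {
        n1 <- length(th1); n2 <- length(th2)
        s1 <- n1 / (n1 + n2); s2 <- n2 / (n1 + n2)
        l1 <- tilde_pi1(th1) / tilde_pi2(th1)   # ratios at draws th1 ~ pi1(.|x)
        l2 <- tilde_pi1(th2) / tilde_pi2(th2)   # ratios at draws th2 ~ pi2(.|x)
        r  <- 1                                 # initial guess for B12
        for (it in 1:niter)                     # plug the current estimate into alpha*
          r <- mean(l2 / (s1 * l2 + s2 * r)) / mean(1 / (s1 * l1 + s2 * r))
        r
      }
      # e.g. with the toy posteriors of the earlier sketch:
      # th1 <- rnorm(1e4, 1.2 / 2, sqrt(1 / 2)); th2 <- rnorm(1e4, 1.2 * 100 / 101, sqrt(100 / 101))
      # bridge_opt(tilde_pi1, tilde_pi2, th1, th2)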
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Extension to varying dimensions

      When dim(Θ1 ) ≠ dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduction of a
      pseudo-posterior density, ω(ψ|θ1 , x), augmenting π1 (θ1 |x) into the
      joint distribution
                              π1 (θ1 |x) × ω(ψ|θ1 , x)
      on Θ2 so that

               B_{12} = \frac{\int \tilde\pi_1(\theta_1\mid x)\,\alpha(\theta_1,\psi)\,\pi_2(\theta_1,\psi\mid x)\,d\theta_1\,\omega(\psi\mid\theta_1,x)\,d\psi}
                             {\int \tilde\pi_2(\theta_1,\psi\mid x)\,\alpha(\theta_1,\psi)\,\pi_1(\theta_1\mid x)\,d\theta_1\,\omega(\psi\mid\theta_1,x)\,d\psi}

                      = \mathbb{E}^{\pi_2}\!\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi\mid\theta_1)}{\tilde\pi_2(\theta_1,\psi)}\right]
                      = \frac{\mathbb{E}^{\varphi}\big[\tilde\pi_1(\theta_1)\,\omega(\psi\mid\theta_1)\big/\varphi(\theta_1,\psi)\big]}
                             {\mathbb{E}^{\varphi}\big[\tilde\pi_2(\theta_1,\psi)\big/\varphi(\theta_1,\psi)\big]}

        for any conditional density ω(ψ|θ1 ) and any joint density ϕ.
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Illustration for the Pima Indian dataset

      Use of the MLE-induced conditional of β3 given (β1 , β2 ) as a
      pseudo-posterior, and of a mixture of both MLE approximations on β3 ,
      in the bridge sampling estimate
      R bridge sampling code
      cova=model2$cov.unscaled
      expecta=model2$coeff[,1]
      covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]

      probit1=hmprobit(Niter,y,X1)
      probit2=hmprobit(Niter,y,X2)
      pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
      probit1p=cbind(probit1,pseudo)

      bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
           sqrt(covw),log=T))/ (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],
           cova[3,3])))/ mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+
           dnorm(pseudo,expecta[3],cova[3,3])))
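
      Besides hmprobit() (a Metropolis–Hastings sampler for the probit posterior, not
      shown here), the code relies on a helper meanw() returning the MLE-induced
      conditional mean of β3 given (β1 , β2 ). A hypothetical sketch, reusing the cova
      and expecta objects defined above:

      # conditional mean of beta3 given (beta1, beta2) under the approximation N(expecta, cova)
      meanw <- function(beta12) {
        beta12 <- matrix(beta12, ncol = 2)
        gain <- MASS::ginv(cova[1:2, 1:2]) %*% cova[1:2, 3]          # regression coefficients
        drop(expecta[3] + sweep(beta12, 2, expecta[1:2]) %*% gain)   # one value per simulation
      }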
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Bridge sampling



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 × 20,000 simulations from the prior (MC), the above
      bridge sampler and the above importance sampler
      [Boxplots of the Bayes factor approximations (values roughly between 2 and 5):
      MC (prior sampling), bridge sampling, and importance sampling]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



The original harmonic mean estimator



      When θk^(t) ∼ πk (θ|x),

                                                \frac{1}{T}\sum_{t=1}^{T} \frac{1}{L_k(\theta_k^{(t)}\mid x)}

      is an unbiased estimator of 1/mk (x)
                                                                      [Newton & Raftery, 1994]

      Highly dangerous: Most often leads to an infinite variance!!!
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



“The Worst Monte Carlo Method Ever”

      “The good news is that the Law of Large Numbers guarantees that
      this estimator is consistent ie, it will very likely be very close to the
      correct answer if you use a sufficiently large number of points from
      the posterior distribution.
      The bad news is that the number of points required for this
      estimator to get close to the right answer will often be greater
      than the number of atoms in the observable universe. The even
      worse news is that it’s easy for people to not realize this, and to
      naïvely accept estimates that are nowhere close to the correct
      value of the marginal likelihood.”
                                       [Radford Neal’s blog, Aug. 23, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



Approximating Zk from a posterior sample



      Use of the [harmonic mean] identity

             \mathbb{E}^{\pi_k}\!\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\right|\, x\right]
             = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,d\theta_k
             = \frac{1}{Z_k}

      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: the constraint is the opposite of the usual importance
      sampling requirement: ϕ(θ) must have lighter (rather than fatter) tails
      than πk (θk )Lk (θk ) for the approximation

                                 \widehat{Z}_{1k} = 1 \Big/ \frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}

      to have a finite variance.
      E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



Comparison with regular importance sampling (cont’d)



      Compare Z1k with a standard importance sampling approximation
                                     \widehat{Z}_{2k} = \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\varphi(\theta_k^{(t)})}

      where the θk^(t) 's are generated from the density ϕ(·) (with fatter
      tails, like a t distribution's)
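
      A toy check (not from the talk) on a conjugate normal model, where the exact
      evidence is available, contrasting the raw harmonic mean, the Gelfand and Dey
      version with a lighter-tailed ϕ, and the standard importance sampling estimate
      Z2k with a fatter-tailed proposal:

      # x_1..x_n ~ N(theta, 1), theta ~ N(0, 1): the evidence is a known multivariate normal density
      set.seed(42)
      n <- 20; x <- rnorm(n, mean = 1)
      lik  <- function(th) sapply(th, function(t) prod(dnorm(x, t)))
      Zex  <- mvtnorm::dmvnorm(x, sigma = diag(n) + 1)                  # exact evidence
      m    <- n * mean(x) / (n + 1); s <- sqrt(1 / (n + 1))             # posterior mean and sd
      Tn   <- 1e5
      post <- rnorm(Tn, m, s)                                           # exact posterior draws
      Z1   <- 1 / mean(1 / lik(post))                                   # harmonic mean (unstable)
      phi  <- function(th) dnorm(th, m, s / 2)                          # lighter-tailed phi
      Zgd  <- 1 / mean(phi(post) / (lik(post) * dnorm(post)))           # Gelfand & Dey
      prop <- rnorm(Tn, m, 2 * s)                                       # fatter-tailed proposal
      Z2   <- mean(lik(prop) * dnorm(prop) / dnorm(prop, m, 2 * s))     # regular importance sampling
      c(exact = Zex, harmonic = Z1, gelfand_dey = Zgd, importance = Z2)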
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



HPD indicator as ϕ
      Use the convex hull of MCMC simulations corresponding to the
      10% HPD region (easily derived!) and ϕ as indicator:
                                 \varphi(\theta) = \frac{10}{T}\sum_{t\in\mathrm{HPD}} \mathbb{I}_{\,d(\theta,\theta^{(t)})\le\epsilon}
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Harmonic means



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas of 20,000 simulations from the above
      harmonic mean sampler and importance samplers
      [Boxplots of the Bayes factor approximations (values roughly between 3.102
      and 3.116): harmonic mean vs. importance sampling]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Mixtures to bridge



Approximating Zk using a mixture representation



                                                                           Bridge sampling redux

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                               ϕk (θk ) ∝ ω1 πk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Mixtures to bridge



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
           1   Take δ^(t) = 1 with probability

               \omega_1\pi_k(\theta_k^{(t-1)})L_k(\theta_k^{(t-1)}) \Big/
               \left\{\omega_1\pi_k(\theta_k^{(t-1)})L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)})\right\}

               and δ^(t) = 2 otherwise;
           2   If δ^(t) = 1, generate θk^(t) ∼ MCMC(θk^(t−1) , θk^(t) ), where
               MCMC(θk , θk' ) denotes an arbitrary MCMC kernel associated
               with the posterior πk (θk |x) ∝ πk (θk )Lk (θk );
           3   If δ^(t) = 2, generate θk^(t) ∼ ϕ(θk ) independently
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Mixtures to bridge



Evidence approximation by mixtures
       Rao-Blackwellised estimate

            \hat\xi = \frac{1}{T}\sum_{t=1}^{T} \omega_1\pi_k(\theta_k^{(t)})L_k(\theta_k^{(t)}) \Big/
            \left\{\omega_1\pi_k(\theta_k^{(t)})L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\} ,

       converges to ω1 Zk /{ω1 Zk + 1}
       Deduce Ẑ3k from ω1 Ẑ3k /{ω1 Ẑ3k + 1} = ξ̂, i.e.

            \widehat{Z}_{3k} = \frac{\sum_{t=1}^{T} \omega_1\pi_k(\theta_k^{(t)})L_k(\theta_k^{(t)}) \big/
                                     \left\{\omega_1\pi_k(\theta_k^{(t)})L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}}
                                    {\sum_{t=1}^{T} \varphi(\theta_k^{(t)}) \big/
                                     \left\{\omega_1\pi_k(\theta_k^{(t)})L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}}

                                                                                    [Bridge sampler]
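
      A compact sketch (not the talk's code) of the whole scheme on a toy target, one
      observation x ∼ N(θ, 1) with θ ∼ N(0, 1), for which the evidence dnorm(x, 0, √2)
      is exact; the final line solves ω1 Z/(ω1 Z + 1) = ξ̂ for Z as above.

      # mixture MCMC sampler plus Rao-Blackwellised evidence estimate (toy sketch)
      set.seed(1)
      x      <- 1
      unpost <- function(th) dnorm(x, th) * dnorm(th)      # pi(theta) L(theta), unnormalised
      phi    <- function(th) dnorm(th, x / 2, 1)           # arbitrary normalised density
      w1     <- 1
      Tn <- 1e5; th <- 0; draws <- numeric(Tn)
      for (it in 1:Tn) {
        p1 <- w1 * unpost(th) / (w1 * unpost(th) + phi(th))
        if (runif(1) < p1) {                               # delta = 1: MCMC step targeting pi(.|x)
          prop <- th + rnorm(1)
          if (runif(1) < unpost(prop) / unpost(th)) th <- prop
        } else th <- rnorm(1, x / 2, 1)                    # delta = 2: independent draw from phi
        draws[it] <- th
      }
      num <- w1 * unpost(draws) / (w1 * unpost(draws) + phi(draws))
      den <- phi(draws)        / (w1 * unpost(draws) + phi(draws))
      Z3  <- sum(num) / sum(den) / w1                      # solve w1*Z/(w1*Z + 1) = xi-hat for Z
      c(exact = dnorm(x, 0, sqrt(2)), estimate = Z3)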
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Chib’s representation


       Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
       θk ∼ πk (θk ),

                           Z_k = m_k(x) = \frac{f_k(x\mid\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k\mid x)}

       Use of an approximation to the posterior

                           Z_k = m_k(x) = \frac{f_k(x\mid\theta_k^{*})\,\pi_k(\theta_k^{*})}{\hat\pi_k(\theta_k^{*}\mid x)} .
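
      A one-line sanity check of the identity on the toy model used in the earlier
      sketches (one observation x ∼ N(θ, 1), θ ∼ N(0, 1)), where the posterior is
      available exactly, so the representation holds at any θ*:

      # Chib's identity at theta* = posterior mean; exact evidence is dnorm(x, 0, sqrt(2))
      x <- 1; theta_star <- x / 2
      Z_chib <- dnorm(x, theta_star) * dnorm(theta_star) / dnorm(theta_star, x / 2, sqrt(1 / 2))
      c(exact = dnorm(x, 0, sqrt(2)), chib = Z_chib)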
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Case of latent variables



      For missing variable z as in mixture models, natural Rao-Blackwell
      estimate
                          \hat\pi_k(\theta_k^{*}\mid x) = \frac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^{*}\mid x,\, z_k^{(t)}) ,

       where the zk^(t) 's are Gibbs sampled latent variables
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Label switching


      A mixture model [special case of missing variable model] is
      invariant under permutations of the indices of the components.
      E.g., mixtures
                          0.3N (0, 1) + 0.7N (2.3, 1)
      and
                                       0.7N (2.3, 1) + 0.3N (0, 1)
      are exactly the same!
       c The component parameters θi are not identifiable
      marginally since they are exchangeable
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Connected difficulties


          1   Number of modes of the likelihood of order O(k!):
               c Maximization and even [MCMC] exploration of the
              posterior surface harder
          2   Under exchangeable priors on (θ, p) [prior invariant under
              permutation of the indices], all posterior marginals are
              identical:
               c Posterior expectation of θ1 equal to posterior expectation
              of θ2
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



License


      Since Gibbs output does not produce exchangeability, the Gibbs
      sampler has not explored the whole parameter space: it lacks the
      energy to switch enough component allocations at once

      [Gibbs output for the mixture: traces of µi , pi and σi against iteration
      n (0–500), together with pairwise scatter plots of (µi , pi ), (pi , σi ) and
      (σi , µi ), showing no label switching]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Label switching paradox




      We should observe the exchangeability of the components [label
      switching] to conclude about convergence of the Gibbs sampler.
      If we observe it, then we do not know how to estimate the
      parameters.
      If we do not, then we are uncertain about the convergence!!!
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Compensation for label switching
       For mixture models, zk^(t) usually fails to visit all configurations in a
       balanced way, despite the symmetry predicted by the theory

                        \pi_k(\theta_k\mid x) = \pi_k(\sigma(\theta_k)\mid x) = \frac{1}{k!}\sum_{\sigma\in\mathfrak{S}_k} \pi_k(\sigma(\theta_k)\mid x)

       for all σ’s in Sk , set of all permutations of {1, . . . , k}.
       Consequences on numerical approximation, biased by an order k!
       Recover the theoretical symmetry by using

                          \hat\pi_k(\theta_k^{*}\mid x) = \frac{1}{T\,k!}\sum_{\sigma\in\mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^{*})\mid x,\, z_k^{(t)}) .

                                                    [Berkhof, Mechelen, & Gelman, 2003]
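
      A sketch of that symmetrisation (with hypothetical helpers, not the talk's code):
      theta_star is a matrix whose columns are the k component parameter vectors,
      z_samples is a list of Gibbs-sampled allocations, and post_dens(theta, z) is
      assumed to return πk (θ|x, z).

      # average the conditional posterior density over all k! relabellings
      perms <- function(k) {                       # all permutations of 1:k, one per row
        if (k == 1) return(matrix(1, 1, 1))
        p <- perms(k - 1)
        do.call(rbind, lapply(1:k, function(i)
          cbind(i, matrix((1:k)[-i][p], nrow(p), k - 1))))
      }
      chib_symmetrised <- function(theta_star, z_samples, post_dens) {
        sig <- perms(ncol(theta_star))
        mean(sapply(seq_len(nrow(sig)), function(s)
          mean(sapply(z_samples, function(z) post_dens(theta_star[, sig[s, ]], z)))))
      }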
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Galaxy dataset

      n = 82 galaxies as a mixture of k normal distributions with both
      mean and variance unknown.
                                                          [Roeder, 1992]
      [Histogram of the galaxy data (relative frequency) with the average
      estimated mixture density overlaid]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Galaxy dataset (k)
       Using only the original estimate, with θk* taken as the MAP estimator,

                                        log(m̂k (x)) = −105.1396

       for k = 3 (based on 10³ simulations), while introducing the
       permutations leads to

                                        log(m̂k (x)) = −103.3479

       Note that
                                  −105.1396 + log(3!) = −103.3479

         k               2          3          4          5          6          7          8
         log mk (x)   −115.68    −103.35    −102.66    −101.93    −102.88    −105.48    −108.44

       Estimates of the marginal likelihoods by the symmetrised Chib’s
       approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100
       permutations selected at random in Sk ).
                                [Lee, Marin, Mengersen & Robert, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Case of the probit model

      For the completion by z,
                                      \hat\pi(\theta\mid x) = \frac{1}{T}\sum_{t} \pi(\theta\mid x,\, z^{(t)})

      is a simple average of normal densities
       R Chib’s approximation code
      gibbs1=gibbsprobit(Niter,y,X1)
      gibbs2=gibbsprobit(Niter,y,X2)
      bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
              sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
            mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
              sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     Chib’s solution



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas of 20,000 simulations from the above Chib’s
      and importance samplers

      [Boxplots of the Bayes factor approximations (values roughly between 0.0240
      and 0.0255): Chib’s method vs. importance sampling]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



The Savage–Dickey ratio
      Special representation of the Bayes factor used for simulation
      Given a test H0 : θ = θ0 in a model f (x|θ, ψ) with a nuisance
      parameter ψ, under priors π0 (ψ) and π1 (θ, ψ) such that

                                              π1 (ψ|θ0 ) = π0 (ψ)

      then
                                               B_{01} = \frac{\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)} ,

       with the obvious notations

                  \pi_1(\theta) = \int \pi_1(\theta,\psi)\,d\psi , \qquad \pi_1(\theta\mid x) = \int \pi_1(\theta,\psi\mid x)\,d\psi ,

                                           [Dickey, 1971; Verdinelli & Wasserman, 1995]
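
      A toy numerical check (not in the original slides): x1 ∼ N(θ, 1) and x2 ∼ N(ψ, 1)
      with independent N(0, 1) priors, testing H0 : θ = 0, so that π1 (ψ|θ0 ) = π0 (ψ)
      holds automatically and the exact Bayes factor is available for comparison.

      # Savage-Dickey: posterior over prior density of theta at theta0 = 0
      x1 <- 0.8
      B01_exact <- dnorm(x1, 0, 1) / dnorm(x1, 0, sqrt(2))          # marginal of x2 cancels
      theta_post <- rnorm(1e5, x1 / 2, sqrt(1 / 2))                 # posterior draws of theta under M1
      dens <- density(theta_post)
      B01_sd <- approx(dens$x, dens$y, xout = 0)$y / dnorm(0)       # estimate of pi1(0|x) / pi1(0)
      c(exact = B01_exact, savage_dickey = B01_sd)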
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



Measure-theoretic difficulty

      The representation depends on the choice of versions of conditional
      densities:
      B01 = ∫ π0(ψ) f(x|θ0, ψ) dψ / ∫ π1(θ, ψ) f(x|θ, ψ) dψ dθ
                                                        [by definition]
          = [ ∫ π1(ψ|θ0) f(x|θ0, ψ) dψ  π1(θ0) ] / [ ∫ π1(θ, ψ) f(x|θ, ψ) dψ dθ  π1(θ0) ]
                                                        [specific version of π1(ψ|θ0)]
          = ∫ π1(θ0, ψ) f(x|θ0, ψ) dψ / [ m1(x) π1(θ0) ]
                                                        [specific version of π1(θ0, ψ)]
          = π1(θ0|x) / π1(θ0)

                        Thus Dickey's (1971) condition is not a condition
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



Similar measure-theoretic difficulty


      Verdinelli-Wasserman extension:
                     B01 = [ π1(θ0|x) / π1(θ0) ] E^{π1(ψ|x,θ0)}[ π0(ψ) / π1(ψ|θ0) ]

      depends on similar choices of versions
      Monte Carlo implementation relies on continuous versions of all
      densities without making mention of it
                                          [Chen, Shao & Ibrahim, 2000]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



Computational implementation
      Starting from the (new) prior

                               π̃1(θ, ψ) = π1(θ) π0(ψ)

      define the associated posterior

                     π̃1(θ, ψ|x) = π0(ψ) π1(θ) f(x|θ, ψ) / m̃1(x)

      and impose

                     π̃1(θ0|x) / π1(θ0) = ∫ π0(ψ) f(x|θ0, ψ) dψ / m̃1(x)

      to hold.
      Then
                     B01 = [ π̃1(θ0|x) / π1(θ0) ] × [ m̃1(x) / m1(x) ]
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



First ratio

      If (θ^(1), ψ^(1)), . . . , (θ^(T), ψ^(T)) ∼ π̃1(θ, ψ|x), then

                     (1/T) Σ_t π̃1(θ0|x, ψ^(t))

      converges to π̃1(θ0|x) (if the right version is used in θ0).
      When π̃1(θ0|x, ψ) is unavailable, replace it with

                     (1/T) Σ_{t=1}^T π̃1(θ0|x, z^(t), ψ^(t))
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



Bridge revival (1)

      Since m̃1(x)/m1(x) is unknown, apparent failure!
      Use of the identity

         E^{π̃1(θ,ψ|x)}[ π1(θ, ψ) f(x|θ, ψ) / {π0(ψ) π1(θ) f(x|θ, ψ)} ]
             = E^{π̃1(θ,ψ|x)}[ π1(ψ|θ) / π0(ψ) ] = m1(x) / m̃1(x)

      to (biasedly) estimate m̃1(x)/m1(x) by

                     T / Σ_{t=1}^T [ π1(ψ^(t)|θ^(t)) / π0(ψ^(t)) ]

      based on the same sample from π̃1.
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



Bridge revival (2)
      Alternative identity

         E^{π1(θ,ψ|x)}[ π0(ψ) π1(θ) f(x|θ, ψ) / {π1(θ, ψ) f(x|θ, ψ)} ]
             = E^{π1(θ,ψ|x)}[ π0(ψ) / π1(ψ|θ) ] = m̃1(x) / m1(x)

      suggests using a second sample (θ̄^(1), ψ̄^(1), z^(1)), . . . ,
      (θ̄^(T), ψ̄^(T), z^(T)) ∼ π1(θ, ψ|x) and

                     (1/T) Σ_{t=1}^T π0(ψ̄^(t)) / π1(ψ̄^(t)|θ̄^(t))

      Resulting estimate:

         B01 = [ (1/T) Σ_t π̃1(θ0|x, z^(t), ψ^(t)) / π1(θ0) ] × [ (1/T) Σ_{t=1}^T π0(ψ̄^(t)) / π1(ψ̄^(t)|θ̄^(t)) ]
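      Schematically, in R (hypothetical input vectors holding the required density evaluations along the two MCMC samples):

      bridge.sd <- function(post.tilde,    # pi~1(theta0 | x, z^(t), psi^(t)) over the pi~1 sample
                            prior.theta0,  # the scalar pi1(theta0)
                            prior0.psi,    # pi0(psi.bar^(t)) over the pi1 sample
                            cond1.psi) {   # pi1(psi.bar^(t) | theta.bar^(t)) over the same sample
        mean(post.tilde) / prior.theta0 * mean(prior0.psi / cond1.psi)
      }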
On some computational methods for Bayesian model choice
  Importance sampling solutions compared
     The Savage–Dickey ratio



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicas of 20,000 simulations each, for the importance,
      Chib's, Savage–Dickey and bridge samplers above

      [Boxplots of the Bayes factor approximations for the IS, Chib, Savage–Dickey and bridge samplers; values range roughly from 2.8 to 3.4.]
On some computational methods for Bayesian model choice
  Nested sampling
     Purpose



Nested sampling: Goal


      Skilling’s (2007) technique using the one-dimensional
      representation:
                     Z = E^π[L(θ)] = ∫_0^1 ϕ(x) dx

      with
                     ϕ^{-1}(l) = P^π(L(θ) > l).

      Note: ϕ(·) is intractable in most cases.
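      A quick Monte Carlo check of the identity Z = E^π[L(θ)] on a toy Gaussian model (illustration only):

      set.seed(1)
      theta <- rnorm(1e5)                         # prior N(0, 1)
      Z.mc  <- mean(dnorm(2, mean = theta))       # E_pi[L(theta)] with L(theta) = N(2; theta, 1)
      c(Z.mc, dnorm(2, mean = 0, sd = sqrt(2)))   # exact evidence for the observation y = 2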
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: First approximation

      Approximate Z by a Riemann sum:

                     Z = Σ_{i=1}^j (x_{i−1} − x_i) ϕ(x_i)

      where the x_i 's are either:
              deterministic:   x_i = e^{−i/N}
              or random:

                     x_0 = 1,    x_{i+1} = t_i x_i ,    t_i ∼ Be(N, 1)

              so that E[log x_i] = −i/N .
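      A small R illustration of the two sequences (hypothetical values of N and j):

      N <- 100; j <- 500
      x.det  <- exp(-(1:j) / N)                # deterministic sequence
      x.rand <- cumprod(rbeta(j, N, 1))        # random sequence: x_i = t_1 ... t_i
      summary(log(x.rand) + (1:j) / N)         # E[log x_i] = -i/N, so these values are centred at 0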
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Extraneous white noise
      Take

         Z = ∫ e^{−θ} dθ = ∫ (1/δ) e^{−(1−δ)θ} · δ e^{−δθ} dθ = E_δ[ (1/δ) e^{−(1−δ)θ} ]

         Ẑ = (1/N) Σ_{i=1}^N δ^{−1} e^{−(1−δ)θ_i} (x_{i−1} − x_i) ,    θ_i ∼ E(δ) I(θ_i ≤ θ_{i−1})

      Comparison of variances and MSEs:

        N     deterministic   random
        50      4.64           10.5
                4.65           10.5
        100     2.47            4.9
                2.48            5.02
        500     .549            1.01
                .550            1.14
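      A quick Monte Carlo check of the identity behind this example (illustration only):

      delta <- 0.1
      theta <- rexp(1e5, rate = delta)                 # theta ~ E(delta)
      mean(exp(-(1 - delta) * theta)) / delta          # estimates Z = 1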
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Second approximation

      Replace (intractable) ϕ(xi ) by ϕi , obtained by

      Nested sampling
      Start with N values θ1 , . . . , θN sampled from π
      At iteration i,
          1   Take ϕi = L(θk ), where θk is the point with smallest
              likelihood in the pool of θi ’s
          2   Replace θk with a sample from the prior constrained to
              L(θ) > ϕi : the current N points are sampled from prior
              constrained to L(θ) > ϕi .
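      A minimal R sketch of the scheme on a toy model, prior N(0, 1) and likelihood N(y; θ, 1), with the constrained-prior step handled by naive rejection (illustration only, not Skilling's implementation):

      y <- 2; N <- 100; J <- 500                 # observation, pool size, iterations
      theta <- rnorm(N)                          # N points from the prior
      L <- dnorm(y, mean = theta)                # their likelihood values
      Z.hat <- 0; x.old <- 1
      for (i in 1:J) {
        k <- which.min(L)                        # point with the smallest likelihood
        phi.i <- L[k]
        x.new <- exp(-i / N)                     # deterministic x_i
        Z.hat <- Z.hat + (x.old - x.new) * phi.i
        x.old <- x.new
        repeat {                                 # naive constrained-prior simulation by rejection
          prop <- rnorm(1)
          if (dnorm(y, mean = prop) > phi.i) break
        }
        theta[k] <- prop; L[k] <- dnorm(y, mean = prop)
      }
      Z.hat                                      # compare with the exact dnorm(y, 0, sqrt(2))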
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Third approximation


      Iterate the above steps until a given stopping iteration j is
      reached: e.g.,
              observe very small changes in the approximation Z;
              reach the maximal value of L(θ) when the likelihood is
              bounded and its maximum is known;
              truncate the integral Z at level ε, i.e. replace

                     ∫_0^1 ϕ(x) dx     with     ∫_ε^1 ϕ(x) dx
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



Approximation error


      Error = Ẑ − Z
            = Σ_{i=1}^j (x_{i−1} − x_i) ϕ_i − ∫_0^1 ϕ(x) dx
            = − ∫_0^ε ϕ(x) dx                                               (Truncation Error)
            + [ Σ_{i=1}^j (x_{i−1} − x_i) ϕ(x_i) − ∫_ε^1 ϕ(x) dx ]           (Quadrature Error)
            + Σ_{i=1}^j (x_{i−1} − x_i) {ϕ_i − ϕ(x_i)}                       (Stochastic Error)


                                                  [Dominated by Monte Carlo!]
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



A CLT for the Stochastic Error

      The (dominating) stochastic error is O_P(N^{−1/2}):

                     N^{1/2} {Stochastic Error} →_D N(0, V)

      with
                     V = − ∫∫_{s,t ∈ [ε,1]} s ϕ′(s) t ϕ′(t) log(s ∨ t) ds dt.

                                                  [Proof based on Donsker's theorem]


      The number of simulated points equals the number of iterations j,
      and is a multiple of N: if one stops at the first iteration j such that
      e^{−j/N} < ε, then j = N ⌈− log ε⌉.
On some computational methods for Bayesian model choice
  Nested sampling
     Impact of dimension



Curse of dimension


      For a simple Gaussian-Gaussian model of dimension dim(θ) = d,
      the following 3 quantities are O(d):
          1   asymptotic variance of the NS estimator;
          2   number of iterations (necessary to reach a given truncation
              error);
          3   cost of one simulated sample.
      Therefore, CPU time necessary for achieving error level e is

                                                     O(d^3 / e^2)
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Sampling from constr’d priors

      Exact simulation from the constrained prior is intractable in most
      cases!

      Skilling (2007) proposes to use MCMC, but:
              this introduces a bias (stopping rule).
              if the MCMC stationary distribution is the unconstrained prior, it becomes
              more and more difficult to sample points such that L(θ) > l as l
              increases.

      If exact simulation from the constrained prior is implementable, then a
      slice sampler can be devised at the same cost!
                                                       [Thanks, Gareth!]
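      For concreteness, a sketch of one random-walk Metropolis move leaving the prior truncated to {L(θ) > l} invariant (lik and lprior are hypothetical user-supplied likelihood and log-prior functions):

      constrained.move <- function(theta, l, lik, lprior, scale = 0.1) {
        prop <- theta + scale * rnorm(length(theta))    # symmetric random-walk proposal
        if (lik(prop) > l &&
            log(runif(1)) < lprior(prop) - lprior(theta)) prop else theta
      }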
On some computational methods for Bayesian model choice
  Nested sampling
     Importance variant



An IS variant of nested sampling

      Consider an instrumental prior π̃ and likelihood L̃, and the weight function

                     w(θ) = π(θ) L(θ) / { π̃(θ) L̃(θ) }

      and weighted NS estimator

                     Z = Σ_{i=1}^j (x_{i−1} − x_i) ϕ_i w(θ_i).

      Then choose (π̃, L̃) so that sampling from π̃ constrained to
      L̃(θ) > l is easy; e.g. N(c, I_d) constrained to ‖c − θ‖ < r.
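      Why the ball constraint is convenient: under N(c, I_d) the squared radius ‖θ − c‖² is χ²_d and the direction is uniform on the sphere, so exact constrained simulation is immediate (sketch, hypothetical function name):

      rconstr <- function(centre, r) {
        d   <- length(centre)
        u   <- runif(1, 0, pchisq(r^2, df = d))          # uniform on (0, P(chi^2_d < r^2))
        rad <- sqrt(qchisq(u, df = d))                   # radius from the truncated chi distribution
        dir <- rnorm(d); dir <- dir / sqrt(sum(dir^2))   # uniform direction on the sphere
        centre + rad * dir
      }
      # e.g. rconstr(rep(0, 10), r = 1) returns a N(0, I_10) draw inside the unit ball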
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Benchmark: Target distribution




      Posterior distribution on (µ, σ) associated with the mixture

                                     pN (0, 1) + (1 − p)N (µ, σ) ,

      when p is known
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Experiment


           n observations with
           µ = 2 and σ = 3/2,
           Use of a uniform prior
           both on (−2, 6) for µ
           and on (.001, 16) for
           log σ 2 .
           occurrences of posterior
           bursts for µ = xi
           computation of the
           various estimates of Z
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Experiment (cont’d)




   MCMC sample for n = 16 observations from the mixture.
   Nested sampling sequence with M = 1000 starting points.
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Experiment (cont’d)




   MCMC sample for n = 50 observations from the mixture.
   Nested sampling sequence with M = 1000 starting points.
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Comparison


      Monte Carlo and MCMC (= Gibbs) outputs based on T = 10^4
      simulations and numerical integration based on a 850 × 950 grid in
      the (µ, σ) parameter space.
      Nested sampling approximation based on a starting sample of
      M = 1000 points followed by at least 10^3 further simulations from
      the constr'd prior and a stopping rule at 95% of the observed
      maximum likelihood.
      Constr'd prior simulation based on 50 values simulated by random
      walk, accepting only steps leading to a lik'hood higher than the
      bound.
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Comparison (cont’d)
      [Boxplots of the four approximations V1–V4 of Z; values range roughly from 0.85 to 1.15.]

      Graph based on a sample of 10 observations for µ = 2 and
      σ = 3/2 (150 replicas).
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Comparison (cont’d)



      [Boxplots of the four approximations V1–V4 of Z; values range roughly from 0.90 to 1.10.]




      Graph based on a sample of 50 observations for µ = 2 and
      σ = 3/2 (150 replicas).
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Comparison (cont’d)



      [Boxplots of the four approximations V1–V4 of Z; values range roughly from 0.85 to 1.15.]




      Graph based on a sample of 100 observations for µ = 2 and
      σ = 3/2 (150 replicas).
On some computational methods for Bayesian model choice
  Nested sampling
     A mixture comparison



Comparison (cont’d)



      Nested sampling gets less reliable as the sample size increases
      The most reliable approach is the mixture Z3, although the harmonic
      solution Z1 is close to Chib's solution [taken as gold standard]
      The Monte Carlo method Z2 also produces poor approximations to Z
      (The kernel φ used in Z2 is a t non-parametric kernel estimate with
      standard bandwidth estimation.)
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method



Approximate Bayesian Computation

      Bayesian setting: target is π(θ)f (x|θ)
      When likelihood f (x|θ) not in closed form, likelihood-free rejection
      technique:
      ABC algorithm
      For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly
      simulating
                           θ′ ∼ π(θ) ,   x ∼ f(x|θ′) ,
      until the auxiliary variable x is equal to the observed value, x = y.

                                                          [Pritchard et al., 1999]
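      A toy illustration (not from the talk): with y ∼ Bin(10, p) and p ∼ U(0, 1), exact-match ABC recovers the Beta posterior:

      y <- 7
      p <- runif(1e5)                          # draws from the prior
      x <- rbinom(1e5, size = 10, prob = p)    # pseudo-data
      post <- p[x == y]                        # kept p's are draws from p | y
      c(mean(post), 8 / 12)                    # compare with the exact Beta(8, 4) posterior mean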
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method



Population genetics example

      Tree of ancestors in a sample of genes
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method



A as approximative


      When y is a continuous random variable, equality x = y is replaced
      with a tolerance condition,

                                 ρ(x, y) ≤ ε

      where ρ is a distance between summary statistics
      Output distributed from

                     π(θ) Pθ{ρ(x, y) < ε} ∝ π(θ | ρ(x, y) < ε)
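      A minimal R sketch on a toy normal model, with the sample mean as summary statistic (names and values illustrative only):

      abc.rej <- function(y, M = 1e5, eps = 0.1) {
        n  <- length(y)
        th <- rnorm(M, 0, sqrt(10))                    # theta' from the prior N(0, 10) (variance 10)
        xb <- rnorm(M, mean = th, sd = 1 / sqrt(n))    # simulated summary: mean of n draws from N(theta', 1)
        th[abs(xb - mean(y)) <= eps]                   # keep theta' whose summary is within eps of the data's
      }
      # e.g. hist(abc.rej(rnorm(30, mean = 1)))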
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method



ABC improvements


      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x's within the vicinity of y...
           [Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007]

      ...or view the problem as one of conditional density estimation
      and develop techniques that allow for a larger ε
                                                  [Beaumont et al., 2002]
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 

Kürzlich hochgeladen (20)

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 

4th joint Warwick Oxford Statistics Seminar

  • 9. Bayes factor. Definition (Bayes factors): for testing the hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0, under the prior π(Θ0)π0(θ) + π(Θ0^c)π1(θ), the central quantity is
        B01 = [π(Θ0|x)/π(Θ0^c|x)] / [π(Θ0)/π(Θ0^c)] = ∫_{Θ0} f(x|θ)π0(θ)dθ / ∫_{Θ0^c} f(x|θ)π1(θ)dθ.   [Jeffreys, 1939]
  • 10. Evidence. Problems using a similar quantity, the evidence
        Zk = ∫_{Θk} πk(θk) Lk(θk) dθk,
        aka the marginal likelihood.   [Jeffreys, 1939]
  • 11. On some computational methods for Bayesian model choice Importance sampling solutions compared A comparison of importance sampling solutions 1 Introduction 2 Importance sampling solutions compared Regular importance Bridge sampling Harmonic means Mixtures to bridge Chib’s solution The Savage–Dickey ratio 3 Nested sampling 4 ABC model choice
  • 12. Regular importance: Bayes factor approximation. When approximating the Bayes factor
        B01 = ∫_{Θ0} f0(x|θ0)π0(θ0)dθ0 / ∫_{Θ1} f1(x|θ1)π1(θ1)dθ1,
        use of importance functions ϖ0 and ϖ1 and of
        B̂01 = [n0^{-1} Σ_{i=1}^{n0} f0(x|θ0^i)π0(θ0^i)/ϖ0(θ0^i)] / [n1^{-1} Σ_{i=1}^{n1} f1(x|θ1^i)π1(θ1^i)/ϖ1(θ1^i)],   θk^i ∼ ϖk.
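As a concrete illustration of this estimator, here is a minimal R sketch (a toy example of mine, not code from the talk) that approximates B01 by importance sampling for M0: x_i ∼ N(0,1) against M1: x_i ∼ N(θ,1) with θ ∼ N(0,1), a case where the exact Bayes factor is available for checking; the proposal is a normal centred at the MLE with inflated variance, in the spirit of the probit example that follows.

        ## toy importance-sampling approximation of B01 (assumed names, not from the slides)
        set.seed(0)
        n <- 10; x <- rnorm(n, 0.5); S <- sum(x)
        loglik <- function(th) -n/2*log(2*pi) - 0.5*(sum(x^2) - 2*th*S + n*th^2)
        prop_mean <- S/n; prop_sd <- sqrt(2/n)          # proposal ~ N(MLE, 2 * var(MLE))
        th <- rnorm(1e5, prop_mean, prop_sd)
        logw <- loglik(th) + dnorm(th, 0, 1, log = TRUE) - dnorm(th, prop_mean, prop_sd, log = TRUE)
        m1_hat <- mean(exp(logw))                        # importance estimate of m1(x)
        B01_hat <- exp(sum(dnorm(x, 0, 1, log = TRUE))) / m1_hat
        ## exact answer for this conjugate toy model
        logm1 <- -n/2*log(2*pi) - 0.5*log(n + 1) - 0.5*(sum(x^2) - S^2/(n + 1))
        c(estimate = B01_hat, exact = exp(sum(dnorm(x, 0, 1, log = TRUE)) - logm1))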
  • 13. On some computational methods for Bayesian model choice Importance sampling solutions compared Regular importance Diabetes in Pima Indian women Example (R benchmark) “A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix (AZ), was tested for diabetes according to WHO criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases.” 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes
  • 14–15. Probit modelling on Pima Indian women. Probability of diabetes as a function of the above variables,
        P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3),
        test of H0 : β3 = 0 for the 200 observations of Pima.tr, based on a g-prior modelling: β ∼ N3(0, n (X^T X)^{-1}).
  • 16–18. MCMC 101 for probit models. Use of either a random walk proposal β' = β + ε in a Metropolis–Hastings algorithm (since the likelihood is available), or of a Gibbs sampler that takes advantage of the missing/latent variable
        z|y, x, β ∼ N(x^T β, 1) I_{z≥0}^y I_{z≤0}^{1−y}
        (since β|y, X, z is distributed as a standard normal).   [Gibbs three times faster]
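Below is a minimal, self-contained R sketch of this latent-variable Gibbs sampler. It is not the authors' gibbsprobit/hmprobit code: it assumes a design matrix X and binary response y as in the slides' code, uses the g-prior β ∼ N(0, n(X'X)^{-1}) introduced above, and samples z by inversion of the truncated normal cdf.

        gibbs_probit <- function(y, X, niter = 1e4) {
          n <- nrow(X); p <- ncol(X)
          Sigma <- solve((1 + 1/n) * crossprod(X))   # posterior covariance of beta given z
          R <- chol(Sigma)
          beta <- rep(0, p)
          out <- matrix(NA_real_, niter, p)
          for (t in 1:niter) {
            mu <- drop(X %*% beta)
            u <- runif(n)
            p0 <- pnorm(0, mu, 1)                    # P(z < 0 | mu)
            z <- ifelse(y == 1,
                        qnorm(p0 + u * (1 - p0), mu, 1),   # N(mu,1) truncated to z > 0
                        qnorm(u * p0, mu, 1))              # N(mu,1) truncated to z < 0
            bhat <- Sigma %*% crossprod(X, z)              # posterior mean of beta given z
            beta <- drop(bhat + t(R) %*% rnorm(p))
            out[t, ] <- beta
          }
          out
        }
        ## usage sketch, with y and X1 as in the slides' code: beta_draws <- gibbs_probit(y, X1, 2e4)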
  • 19–20. Importance sampling for the Pima Indian dataset. Use of the importance function inspired from the MLE estimate distribution, β ∼ N(β̂, Σ̂) (variance inflated by a factor 2 in the code).
        R importance sampling code:
        model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
        model2=summary(glm(y~-1+X2,family=binomial(link="probit")))
        is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
        is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
        bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)))/
             mean(exp(probitlpost(is2,y,X2)-dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
  • 21. Diabetes in Pima Indian women. Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations from the prior and from the above MLE importance sampler. [Boxplots of the Monte Carlo and importance sampling approximations; values roughly between 2 and 5.]
  • 22. Bridge sampling. Special case: if π1(θ1|x) ∝ π̃1(θ1|x) and π2(θ2|x) ∝ π̃2(θ2|x) live on the same space (Θ1 = Θ2), then
        B12 ≈ (1/n) Σ_{i=1}^n π̃1(θi|x)/π̃2(θi|x),   θi ∼ π2(θ|x).
        [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
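A quick numerical check of this identity (my own toy example, not from the talk): two unnormalised densities on the same space whose normalising constants are known, so the bridge estimate can be compared with the exact ratio.

        set.seed(1)
        q1 <- function(th) exp(-th^2 / 2)                        # Z1 = sqrt(2*pi)
        q2 <- function(th) exp(-(th - 1)^2 / (2 * 1.5^2))        # Z2 = 1.5 * sqrt(2*pi)
        theta <- rnorm(1e5, 1, 1.5)                              # draws from pi2
        c(bridge = mean(q1(theta) / q2(theta)), truth = 1 / 1.5) # B12 = Z1/Z2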
  • 23–24. Bridge sampling variance. The bridge sampling estimator does poorly if
        var(B̂12)/B12² ≈ (1/n) E[ {(π1(θ) − π2(θ))/π2(θ)}² ]
        is large, i.e. if π1 and π2 have little overlap...
  • 25. (Further) bridge sampling. General identity:
        B12 = ∫ π̃1(θ|x) α(θ) π2(θ|x) dθ / ∫ π̃2(θ|x) α(θ) π1(θ|x) dθ   ∀ α(·)
            ≈ [n2^{-1} Σ_{i=1}^{n2} π̃1(θ2i|x) α(θ2i)] / [n1^{-1} Σ_{i=1}^{n1} π̃2(θ1i|x) α(θ1i)],   θji ∼ πj(θ|x).
  • 26. Optimal bridge sampling. The optimal choice of auxiliary function is
        α ∝ (n1 + n2) / {n1 π1(θ|x) + n2 π2(θ|x)},
        leading to
        B̂12 ≈ [n2^{-1} Σ_{i=1}^{n2} π̃1(θ2i|x)/{n1 π1(θ2i|x) + n2 π2(θ2i|x)}] / [n1^{-1} Σ_{i=1}^{n1} π̃2(θ1i|x)/{n1 π1(θ1i|x) + n2 π2(θ1i|x)}].
        Back later!
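Since the optimal α involves the normalised posteriors, the estimate is obtained by iteration, as in Meng & Wong's scheme; here is a small self-contained R sketch on the same Gaussian toy problem as above (names and tuning choices are mine, not the talk's).

        bridge_optimal <- function(q1, q2, theta1, theta2, niter = 50) {
          n1 <- length(theta1); n2 <- length(theta2)
          l1 <- q1(theta1) / q2(theta1)          # ratios at draws from pi1
          l2 <- q1(theta2) / q2(theta2)          # ratios at draws from pi2
          r <- 1                                 # starting value for B12
          for (it in 1:niter)
            r <- mean(l2 / (n1 * l2 + n2 * r)) / mean(1 / (n1 * l1 + n2 * r))
          r
        }
        q1 <- function(th) exp(-th^2 / 2)                        # Z1 = sqrt(2*pi)
        q2 <- function(th) exp(-(th - 1)^2 / (2 * 1.5^2))        # Z2 = 1.5 * sqrt(2*pi)
        set.seed(2)
        c(estimate = bridge_optimal(q1, q2, rnorm(1e4, 0, 1), rnorm(1e4, 1, 1.5)),
          truth = 1 / 1.5)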
  • 27–28. Optimal bridge sampling (2). Reason:
        Var(B̂12)/B12² ≈ (1/(n1 n2)) [ ∫ π1(θ)π2(θ){n1 π1(θ) + n2 π2(θ)} α(θ)² dθ / {∫ π1(θ)π2(θ)α(θ) dθ}² − 1 ]
        (by the δ method). Drawback: dependence on the unknown normalising constants, solved iteratively.
  • 29–30. Extension to varying dimensions. When dim(Θ1) ≠ dim(Θ2), e.g. θ2 = (θ1, ψ), introduction of a pseudo-posterior density ω(ψ|θ1, x), augmenting π1(θ1|x) into the joint distribution π1(θ1|x) × ω(ψ|θ1, x) on Θ2, so that
        B12 = ∫ π̃1(θ1|x) ω(ψ|θ1, x) α(θ1, ψ) π2(θ1, ψ|x) dθ1 dψ / ∫ π̃2(θ1, ψ|x) α(θ1, ψ) π1(θ1|x) ω(ψ|θ1, x) dθ1 dψ
            = E^{π2}[ π̃1(θ1) ω(ψ|θ1) / π̃2(θ1, ψ) ] = Eϕ[ π̃1(θ1) ω(ψ|θ1) / ϕ(θ1, ψ) ] / Eϕ[ π̃2(θ1, ψ) / ϕ(θ1, ψ) ]
        for any conditional density ω(ψ|θ1) and any joint density ϕ.
  • 31–32. Illustration for the Pima Indian dataset. Use of the MLE induced conditional of β3 given (β1, β2) as a pseudo-posterior, and a mixture of both MLE approximations on β3 in the bridge sampling estimate.
        R bridge sampling code:
        cova=model2$cov.unscaled
        expecta=model2$coeff[,1]
        covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]
        probit1=hmprobit(Niter,y,X1)
        probit2=hmprobit(Niter,y,X2)
        pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
        probit1p=cbind(probit1,pseudo)
        bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),sqrt(covw),log=T))/
                  (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],cova[3,3])))/
             mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+dnorm(pseudo,expecta[3],cova[3,3])))
  • 33. Diabetes in Pima Indian women (cont'd). Comparison of the variation of the Bayes factor approximations based on 100 × 20,000 simulations from the prior (MC), the above bridge sampler and the above importance sampler. [Boxplots for MC, Bridge and IS; values roughly between 2 and 5.]
  • 34–35. The original harmonic mean estimator. When θkt ∼ πk(θ|x),
        (1/T) Σ_{t=1}^T 1/L(θkt|x)
        is an unbiased estimator of 1/mk(x).   [Newton & Raftery, 1994]
        Highly dangerous: most often leads to an infinite variance!!!
  • 36–37. “The Worst Monte Carlo Method Ever”: “The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.”   [Radford Neal’s blog, Aug. 23, 2008]
  • 38–39. Approximating Zk from a posterior sample. Use of the [harmonic mean] identity
        E^{πk}[ ϕ(θk) / {πk(θk) Lk(θk)} | x ] = ∫ [ϕ(θk)/{πk(θk)Lk(θk)}] [πk(θk)Lk(θk)/Zk] dθk = 1/Zk,
        no matter what the proposal ϕ(·) is.   [Gelfand & Dey, 1994; Bartolucci et al., 2006]
        Direct exploitation of the MCMC output.
  • 40–41. Comparison with regular importance sampling. Harmonic mean: constraint opposed to the usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the approximation
        Ẑ1k = 1 / [ (1/T) Σ_{t=1}^T ϕ(θk^{(t)}) / {πk(θk^{(t)}) Lk(θk^{(t)})} ]
        to have a finite variance. E.g., use finite-support kernels (like Epanechnikov's kernel) for ϕ.
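A self-contained R illustration of this finite-variance requirement on a conjugate toy model (x_i ∼ N(θ,1), θ ∼ N(0,1), my own example), where the evidence is known in closed form; the proposal ϕ is a normal with half the posterior standard deviation, hence lighter tails than the posterior.

        set.seed(2)
        n <- 20; x <- rnorm(n, mean = 1); S <- sum(x)
        post_var <- 1 / (n + 1); post_mean <- S * post_var
        theta <- rnorm(1e5, post_mean, sqrt(post_var))        # (ideal) posterior sample
        logq <- function(th)                                  # log{ prior x likelihood }
          dnorm(th, 0, 1, log = TRUE) - n/2*log(2*pi) - 0.5*(sum(x^2) - 2*th*S + n*th^2)
        logphi <- dnorm(theta, post_mean, sqrt(post_var)/2, log = TRUE)  # light-tailed phi
        logZ_hat <- -log(mean(exp(logphi - logq(theta))))
        logZ_exact <- -n/2*log(2*pi) - 0.5*log(n + 1) - 0.5*(sum(x^2) - S^2/(n + 1))
        c(estimate = logZ_hat, exact = logZ_exact)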
  • 42. Comparison with regular importance sampling (cont'd). Compare Ẑ1k with a standard importance sampling approximation
        Ẑ2k = (1/T) Σ_{t=1}^T πk(θk^{(t)}) Lk(θk^{(t)}) / ϕ(θk^{(t)}),
        where the θk^{(t)}'s are generated from the density ϕ(·) (with fatter tails, like t's).
  • 43. HPD indicator as ϕ. Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and ϕ as its indicator:
        ϕ(θ) ∝ (10/T) Σ_{t∈HPD} I{ d(θ, θ^{(t)}) ≤ ε }.
  • 44. Diabetes in Pima Indian women (cont'd). Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for the above harmonic mean sampler and importance samplers. [Boxplots; approximations concentrated around 3.10 to 3.12.]
  • 45–46. Approximating Zk using a mixture representation. Bridge sampling redux: design a specific mixture for simulation [importance sampling] purposes, with density
        ϕk(θk) ∝ ω1 πk(θk) Lk(θk) + ϕ(θk),
        where ϕ(·) is arbitrary (but normalised). Note: ω1 is not a probability weight.
  • 47–49. Approximating Z using a mixture representation (cont'd). Corresponding MCMC (= Gibbs) sampler. At iteration t:
        1. take δ^{(t)} = 1 with probability ω1 πk(θk^{(t−1)}) Lk(θk^{(t−1)}) / {ω1 πk(θk^{(t−1)}) Lk(θk^{(t−1)}) + ϕ(θk^{(t−1)})}, and δ^{(t)} = 2 otherwise;
        2. if δ^{(t)} = 1, generate θk^{(t)} ∼ MCMC(θk^{(t−1)}, θk), where MCMC(θk, θk') denotes an arbitrary MCMC kernel associated with the posterior πk(θk|x) ∝ πk(θk)Lk(θk);
        3. if δ^{(t)} = 2, generate θk^{(t)} ∼ ϕ(θk) independently.
  • 50–51. Evidence approximation by mixtures. Rao-Blackwellised estimate
        ξ̂ = (1/T) Σ_{t=1}^T ω1 πk(θk^{(t)}) Lk(θk^{(t)}) / {ω1 πk(θk^{(t)}) Lk(θk^{(t)}) + ϕ(θk^{(t)})},
        which converges to ω1 Zk / {ω1 Zk + 1}. Deduce Ẑ3k from ω1 Ẑ3k / {ω1 Ẑ3k + 1} = ξ̂, i.e.
        ω1 Ẑ3k = Σ_{t=1}^T [ω1 πk(θk^{(t)}) Lk(θk^{(t)}) / {ω1 πk(θk^{(t)}) Lk(θk^{(t)}) + ϕ(θk^{(t)})}] / Σ_{t=1}^T [ϕ(θk^{(t)}) / {ω1 πk(θk^{(t)}) Lk(θk^{(t)}) + ϕ(θk^{(t)})}].
        [Bridge sampler]
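To make the construction concrete, here is a self-contained R sketch on a toy model with a single observation x ∼ N(θ,1) and θ ∼ N(0,1), so that Z is known exactly. The "MCMC step" below uses an exact posterior draw, which is one valid choice of kernel, and ω1 and ϕ are arbitrary choices of mine.

        set.seed(3)
        x <- 1
        q     <- function(th) dnorm(th, 0, 1) * dnorm(x, th, 1)   # prior x likelihood
        phi   <- function(th) dnorm(th, x/2, 2)                   # arbitrary normalised density
        rphi  <- function() rnorm(1, x/2, 2)
        rpost <- function() rnorm(1, x/2, sqrt(1/2))              # exact posterior draw (MCMC kernel)
        w1 <- 5                                                   # omega1: not a probability weight
        niter <- 1e5; theta <- numeric(niter); th <- rpost()
        for (t in 1:niter) {
          p1 <- w1 * q(th) / (w1 * q(th) + phi(th))               # probability that delta = 1
          th <- if (runif(1) < p1) rpost() else rphi()
          theta[t] <- th
        }
        xi <- mean(w1 * q(theta) / (w1 * q(theta) + phi(theta)))  # Rao-Blackwellised xi-hat
        Z3 <- xi / (w1 * (1 - xi))                                # solves w1*Z/(w1*Z+1) = xi
        c(estimate = Z3, exact = dnorm(x, 0, sqrt(2)))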
  • 52–53. Chib's representation. Direct application of Bayes' theorem: given x ∼ fk(x|θk) and θk ∼ πk(θk),
        Zk = mk(x) = fk(x|θk) πk(θk) / πk(θk|x).
        Use of an approximation to the posterior:
        Ẑk = m̂k(x) = fk(x|θk*) πk(θk*) / π̂k(θk*|x).
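A small R check of this representation on the same conjugate toy (single x ∼ N(θ,1), θ ∼ N(0,1), my own example): the identity is exact when the true posterior density is used, and the last lines show the practical plug-in where π(θ*|x) is replaced by a density estimate from posterior draws (in latent-variable models this would be the Rao-Blackwell average of the next slide).

        x <- 1
        q    <- function(th) dnorm(th, 0, 1) * dnorm(x, th, 1)   # prior x likelihood
        post <- function(th) dnorm(th, x/2, sqrt(1/2))           # exact posterior density
        theta_star <- x/2                                        # high posterior density point
        c(identity = q(theta_star) / post(theta_star), exact = dnorm(x, 0, sqrt(2)))
        ## plug-in version: estimate the posterior density at theta_star from draws
        set.seed(4)
        draws <- rnorm(1e5, x/2, sqrt(1/2))
        d <- density(draws)
        q(theta_star) / approx(d$x, d$y, xout = theta_star)$y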
  • 54. Case of latent variables. For a missing variable z, as in mixture models, natural Rao-Blackwell estimate
        π̂k(θk*|x) = (1/T) Σ_{t=1}^T πk(θk*|x, zk^{(t)}),
        where the zk^{(t)}'s are Gibbs-sampled latent variables.
  • 55–56. Label switching. A mixture model [special case of missing variable model] is invariant under permutations of the indices of its components. E.g., the mixtures 0.3 N(0,1) + 0.7 N(2.3,1) and 0.7 N(2.3,1) + 0.3 N(0,1) are exactly the same! Hence the component parameters θi are not identifiable marginally, since they are exchangeable.
  • 57–58. Connected difficulties. 1. The likelihood has a number of modes of order O(k!): maximization, and even [MCMC] exploration of the posterior surface, becomes harder. 2. Under exchangeable priors on (θ, p) [priors invariant under permutation of the indices], all posterior marginals are identical: the posterior expectation of θ1 equals the posterior expectation of θ2.
  • 59. License. Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks the energy to switch simultaneously enough component allocations at once. [Trace and scatter plots of (µi, σi, pi) over 500 Gibbs iterations, showing no label switching.]
  • 60–62. Label switching paradox. We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler. If we observe it, then we do not know how to estimate the parameters. If we do not, then we are uncertain about the convergence!!!
  • 63–64. Compensation for label switching. For mixture models, zk^{(t)} usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory,
        πk(θk|x) = πk(σ(θk)|x) = (1/k!) Σ_{σ∈Sk} πk(σ(θk)|x)
        for all σ's in Sk, the set of all permutations of {1, ..., k}. Consequences on the numerical approximation, biased by an order k!. Recover the theoretical symmetry by using
        π̃k(θk*|x) = (1/(T k!)) Σ_{σ∈Sk} Σ_{t=1}^T πk(σ(θk*)|x, zk^{(t)}).
        [Berkhof, Mechelen, & Gelman, 2003]
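A hedged R sketch of this symmetrisation: cond_dens(theta, z) and relabel(theta, s) are placeholders of mine (not functions from the talk), standing for the conditional posterior density πk(θ|x, z) given one Gibbs-sampled allocation vector z and for the application of a permutation s to the component labels of θ; zs is a list of such allocation vectors.

        perms <- function(k) {                       # all k! permutations of 1:k
          if (k == 1) return(matrix(1, 1, 1))
          p <- perms(k - 1)
          do.call(rbind, lapply(1:k, function(i) cbind(i, ifelse(p >= i, p + 1, p))))
        }
        sym_chib_dens <- function(theta_star, zs, cond_dens, relabel, k) {
          S <- perms(k)
          vals <- sapply(zs, function(z)
            mean(apply(S, 1, function(s) cond_dens(relabel(theta_star, s), z))))
          mean(vals)                                 # (1/(T k!)) double sum over sigma and t
        }
        ## the symmetrised log evidence then follows Chib's formula:
        ## log Z = log L(theta_star) + log prior(theta_star) - log sym_chib_dens(...)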
  • 65. Galaxy dataset. n = 82 galaxies, modelled as a mixture of k normal distributions with both mean and variance unknown.   [Roeder, 1992]   [Histogram of the data with the average density estimate overlaid.]
  • 66–68. Galaxy dataset (k). Using only the original estimate, with θk* as the MAP estimator, log(m̂k(x)) = −105.1396 for k = 3 (based on 10³ simulations), while introducing the permutations leads to log(m̂k(x)) = −103.3479. Note that −105.1396 + log(3!) = −103.3479.
        k        2        3        4        5        6        7        8
        mk(x)  −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44
        Estimates of the marginal likelihoods by the symmetrised Chib's approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100 permutations selected at random in Sk).   [Lee, Marin, Mengersen & Robert, 2008]
  • 69–70. Case of the probit model. For the completion by z,
        π̂(θ|x) = (1/T) Σ_t π(θ|x, z^{(t)})
        is a simple average of normal densities.
        R code (Chib's solution):
        gibbs1=gibbsprobit(Niter,y,X1)
        gibbs2=gibbsprobit(Niter,y,X2)
        bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
              mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
  • 71. Diabetes in Pima Indian women (cont'd). Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for the above Chib's and importance samplers. [Boxplots; values concentrated around 0.0240 to 0.0255.]
  • 72. The Savage–Dickey ratio. Special representation of the Bayes factor used for simulation. Given a test H0 : θ = θ0 in a model f(x|θ, ψ) with a nuisance parameter ψ, under priors π0(ψ) and π1(θ, ψ) such that π1(ψ|θ0) = π0(ψ), then
        B01 = π1(θ0|x) / π1(θ0),
        with the obvious notations π1(θ) = ∫ π1(θ, ψ) dψ and π1(θ|x) = ∫ π1(θ, ψ|x) dψ.   [Dickey, 1971; Verdinelli & Wasserman, 1995]
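In the special case without a nuisance parameter the ratio reduces to π(θ0|x)/π(θ0); the following self-contained R sketch (a toy of mine) checks this on x_i ∼ N(θ,1) with θ ∼ N(0,1), estimating the posterior density at θ0 from simulations, as one would do in practice.

        set.seed(4)
        n <- 10; x <- rnorm(n, 0.3); S <- sum(x)
        post_mean <- S/(n + 1); post_sd <- sqrt(1/(n + 1))
        ## exact Bayes factor: B01 = f(x|0) / m1(x)
        logf0 <- sum(dnorm(x, 0, 1, log = TRUE))
        logm1 <- -n/2*log(2*pi) - 0.5*log(n + 1) - 0.5*(sum(x^2) - S^2/(n + 1))
        B01_exact <- exp(logf0 - logm1)
        ## Savage-Dickey with a posterior sample and a kernel density estimate at theta0 = 0
        draws <- rnorm(1e5, post_mean, post_sd)
        d <- density(draws)
        B01_sd <- approx(d$x, d$y, xout = 0)$y / dnorm(0, 0, 1)
        c(exact = B01_exact, savage_dickey = B01_sd)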
  • 73. Measure-theoretic difficulty. The representation depends on the choice of versions of conditional densities:
        B01 = ∫ π0(ψ) f(x|θ0, ψ) dψ / ∫ π1(θ, ψ) f(x|θ, ψ) dψ dθ   [by definition]
            = ∫ π1(ψ|θ0) f(x|θ0, ψ) dψ / ∫ π1(θ, ψ) f(x|θ, ψ) dψ dθ   [specific version of π1(ψ|θ0)]
            = ∫ π1(θ0, ψ) f(x|θ0, ψ) dψ / {m1(x) π1(θ0)}   [specific version of π1(θ0, ψ)]
            = π1(θ0|x) / π1(θ0).
        Hence Dickey's (1971) condition is not a condition.
  • 74–75. Similar measure-theoretic difficulty. Verdinelli–Wasserman extension:
        B01 = {π1(θ0|x) / π1(θ0)} E^{π1(ψ|x,θ0)}[ π0(ψ) / π1(ψ|θ0) ]
        depends on similar choices of versions. The Monte Carlo implementation relies on continuous versions of all densities without making mention of it.   [Chen, Shao & Ibrahim, 2000]
  • 76–77. Computational implementation. Starting from the (new) prior
        π̃1(θ, ψ) = π1(θ) π0(ψ),
        define the associated posterior
        π̃1(θ, ψ|x) = π0(ψ) π1(θ) f(x|θ, ψ) / m̃1(x)
        and impose the version
        π̃1(θ0|x) = π1(θ0) ∫ π0(ψ) f(x|θ0, ψ) dψ / m̃1(x)
        to hold. Then
        B01 = {π̃1(θ0|x) / π1(θ0)} × {m̃1(x) / m1(x)}.
  • 78–79. First ratio. If (θ^{(1)}, ψ^{(1)}), ..., (θ^{(T)}, ψ^{(T)}) ∼ π̃(θ, ψ|x), then
        (1/T) Σ_t π̃1(θ0|x, ψ^{(t)})
        converges to π̃1(θ0|x) (if the right version is used in θ0). When π̃1(θ0|x, ψ) is unavailable, replace it with
        (1/T) Σ_{t=1}^T π̃1(θ0|x, z^{(t)}, ψ^{(t)}).
  • 80–81. Bridge revival (1). Since m1(x)/m̃1(x) is unknown, apparent failure! Use of the identity
        E^{π̃1(θ,ψ|x)}[ π1(θ, ψ) f(x|θ, ψ) / {π0(ψ) π1(θ) f(x|θ, ψ)} ] = E^{π̃1(θ,ψ|x)}[ π1(ψ|θ) / π0(ψ) ] = m1(x)/m̃1(x)
        to (biasedly) estimate m1(x)/m̃1(x) by
        (1/T) Σ_{t=1}^T π1(ψ^{(t)}|θ^{(t)}) / π0(ψ^{(t)}),
        based on the same sample from π̃1.
  • 82–83. Bridge revival (2). Alternative identity:
        E^{π1(θ,ψ|x)}[ π0(ψ) π1(θ) f(x|θ, ψ) / {π1(θ, ψ) f(x|θ, ψ)} ] = E^{π1(θ,ψ|x)}[ π0(ψ) / π1(ψ|θ) ] = m̃1(x)/m1(x),
        which suggests using a second sample (θ̄^{(1)}, ψ̄^{(1)}, z̄^{(1)}), ..., (θ̄^{(T)}, ψ̄^{(T)}, z̄^{(T)}) ∼ π1(θ, ψ|x) and
        (1/T) Σ_{t=1}^T π0(ψ̄^{(t)}) / π1(ψ̄^{(t)}|θ̄^{(t)}).
        Resulting estimate:
        B̂01 = [ (1/T) Σ_t π̃1(θ0|x, z^{(t)}, ψ^{(t)}) / π1(θ0) ] × [ (1/T) Σ_t π0(ψ̄^{(t)}) / π1(ψ̄^{(t)}|θ̄^{(t)}) ].
  • 84. Diabetes in Pima Indian women (cont'd). Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for the above importance, Chib's, Savage–Dickey's and bridge samplers. [Boxplots for IS, Chib, Savage–Dickey and Bridge; values roughly between 2.8 and 3.4.]
  • 85. Nested sampling: Goal. Skilling's (2007) technique using the one-dimensional representation
        Z = E^π[L(θ)] = ∫_0^1 ϕ(x) dx
        with ϕ^{-1}(l) = P^π(L(θ) > l). Note: ϕ(·) is intractable in most cases.
  • 86. Nested sampling: First approximation. Approximate Z by a Riemann sum,
        Ẑ = Σ_{i=1}^j (x_{i−1} − x_i) ϕ(x_i),
        where the x_i's are either deterministic, x_i = e^{−i/N}, or random, x_0 = 1, x_{i+1} = t_i x_i, t_i ∼ Be(N, 1), so that E[log x_i] = −i/N.
  • 87–89. Extraneous white noise. Take
        Z = ∫ e^{−θ} dθ = ∫ δ^{-1} e^{−(1−δ)θ} · δ e^{−δθ} dθ = E_δ[ δ^{-1} e^{−(1−δ)θ} ],
        Ẑ = Σ_{i=1}^N δ^{-1} e^{−(1−δ)θ_i} (x_{i−1} − x_i),   θ_i ∼ E(δ) I(θ_i ≤ θ_{i−1}).
        Comparison of variances and MSEs:
        N      deterministic     random
        50     4.64, 10.5        4.65, 10.5
        100    2.47, 4.9         2.48, 5.02
        500    .549, 1.01        .550, 1.14
  • 90–92. Nested sampling: Second approximation. Replace the (intractable) ϕ(x_i) by ϕ_i, obtained by nested sampling: start with N values θ1, ..., θN sampled from π; at iteration i,
        1. take ϕ_i = L(θk), where θk is the point with smallest likelihood in the current pool of θ's;
        2. replace θk with a sample from the prior constrained to L(θ) > ϕ_i: the current N points are then sampled from the prior constrained to L(θ) > ϕ_i.
  • 93. Nested sampling: Third approximation. Iterate the above steps until a given stopping iteration j is reached: e.g., observe very small changes in the approximation Ẑ; reach the maximal value of L(θ) when the likelihood is bounded and its maximum is known; or truncate the integral Z at level ε, i.e. replace ∫_0^1 ϕ(x) dx with ∫_ε^1 ϕ(x) dx.
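A self-contained R sketch of the whole scheme on a toy problem where the constrained prior can be simulated exactly (θ ∼ U(−5,5) and L(θ) the N(0,1) density, so Z ≈ 0.1); the deterministic schedule x_i = e^{−i/N} and the truncation-based stopping follow the slides, while N and the stopping level are arbitrary choices of mine.

        set.seed(5)
        N <- 500                                   # number of live points
        L <- function(th) dnorm(th)
        theta <- runif(N, -5, 5)                   # live points drawn from the prior
        Lvals <- L(theta)
        Z <- 0; x_prev <- 1
        for (i in 1:(15 * N)) {                    # run until e^{-i/N} is negligible
          k  <- which.min(Lvals)                   # point with smallest likelihood
          xi <- exp(-i / N)                        # deterministic prior-volume schedule
          Z  <- Z + (x_prev - xi) * Lvals[k]
          x_prev <- xi
          ## exact draw from the prior constrained to L(theta) > Lvals[k]:
          ## here {L > l} = (-r, r) with r = sqrt(-2 * log(l * sqrt(2*pi))) <= 5
          r <- sqrt(-2 * log(Lvals[k] * sqrt(2 * pi)))
          theta[k] <- runif(1, -r, r)
          Lvals[k] <- L(theta[k])
        }
        c(estimate = Z, truth = (pnorm(5) - pnorm(-5)) / 10)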
  • 94. Approximation error.
        Error = Ẑ − Z = Σ_{i=1}^j (x_{i−1} − x_i) ϕ_i − ∫_0^1 ϕ(x) dx
              = − ∫_0^ε ϕ(x) dx   (truncation error)
              + [ Σ_{i=1}^j (x_{i−1} − x_i) ϕ(x_i) − ∫_ε^1 ϕ(x) dx ]   (quadrature error)
              + Σ_{i=1}^j (x_{i−1} − x_i) {ϕ_i − ϕ(x_i)}   (stochastic error).
        [Dominated by the Monte Carlo term!]
  • 95–96. A CLT for the stochastic error. The (dominating) stochastic error is OP(N^{−1/2}):
        N^{1/2} {stochastic error} →D N(0, V),   V = − ∫∫_{s,t∈[ε,1]} s ϕ'(s) t ϕ'(t) log(s ∨ t) ds dt.
        [Proof based on Donsker's theorem.]
        The number of simulated points equals the number of iterations j, and is a multiple of N: if one stops at the first iteration j such that e^{−j/N} < ε, then j = N⌈− log ε⌉.
  • 97–100. Curse of dimension. For a simple Gaussian-Gaussian model of dimension dim(θ) = d, the following three quantities are O(d): (1) the asymptotic variance of the NS estimator; (2) the number of iterations (necessary to reach a given truncation error); (3) the cost of one simulated sample. Therefore, the CPU time necessary for achieving error level e is O(d³/e²).
  • 101–103. Sampling from constrained priors. Exact simulation from the constrained prior is intractable in most cases! Skilling (2007) proposes to use MCMC, but: this introduces a bias (stopping rule); and if the MCMC stationary distribution is the unconstrained prior, it becomes more and more difficult to sample points such that L(θ) > l as l increases. If it is implementable, then a slice sampler can be devised at the same cost! [Thanks, Gareth!]
• 104. On some computational methods for Bayesian model choice Nested sampling Importance variant An IS variant of nested sampling Consider an instrumental prior π̃ and likelihood L̃, the weight function w(θ) = π(θ)L(θ) / π̃(θ)L̃(θ), and the weighted NS estimator Z = Σ_{i=1}^{j} (x_{i−1} − x_i) ϕ̃_i w(θ_i). Then choose (π̃, L̃) so that sampling from π̃ constrained to L̃(θ) > l is easy; e.g. N(c, I_d) constrained to ‖c − θ‖ < r.
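A sketch under the assumptions of the slide: with instrumental prior N(c, I_d) and an instrumental likelihood whose level set {L̃(θ) > l} is the ball ‖θ − c‖ < r, the constrained-prior step reduces to exact rejection inside a ball, and the NS sum replaces ϕ_i by ϕ̃_i w(θ_i). Function names are illustrative.

```python
import numpy as np

def importance_weight(theta, log_prior, log_lik, log_prior_tilde, log_lik_tilde):
    """w(theta) = pi(theta) L(theta) / (pi~(theta) L~(theta))."""
    return np.exp(log_prior(theta) + log_lik(theta)
                  - log_prior_tilde(theta) - log_lik_tilde(theta))

def sample_instrumental_in_ball(c, r, rng=None):
    """Exact draw from the instrumental prior N(c, I_d) restricted to ||theta - c|| < r.

    With this choice of (pi~, L~), the constrained simulation needed by
    nested sampling is a simple rejection step inside a ball, hence exact.
    """
    rng = rng or np.random.default_rng()
    c = np.asarray(c, dtype=float)
    while True:
        theta = c + rng.standard_normal(c.shape)
        if np.linalg.norm(theta - c) < r:
            return theta
```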
• 106. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Benchmark: Target distribution Posterior distribution on (µ, σ) associated with the mixture pN(0, 1) + (1 − p)N(µ, σ), when p is known
• 107. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Experiment n observations with µ = 2 and σ = 3/2; use of a uniform prior both on (−2, 6) for µ and on (.001, 16) for log σ²; occurrences of posterior bursts at µ = x_i; computation of the various estimates of Z
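A sketch of the benchmark target, assuming the prior is read as stated on the slide (µ uniform on (−2, 6), log σ² uniform on (.001, 16)); the mixture weight p = 0.5 and the random seed are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

p = 0.5                                   # mixture weight, assumed known (illustrative value)
rng = np.random.default_rng(0)

def simulate_mixture(n, mu=2.0, sigma=1.5):
    """Draw n observations from p N(0,1) + (1-p) N(mu, sigma)."""
    comp = rng.uniform(size=n) < p
    return np.where(comp, rng.standard_normal(n), mu + sigma * rng.standard_normal(n))

def log_posterior(mu, sigma, y):
    """Unnormalised log-posterior on (mu, sigma): mixture likelihood times the
    uniform priors, read here as mu in (-2, 6) and log sigma^2 in (.001, 16)."""
    if not (-2 < mu < 6 and 0.001 < np.log(sigma ** 2) < 16):
        return -np.inf
    lik = p * norm.pdf(y) + (1 - p) * norm.pdf(y, loc=mu, scale=sigma)
    return np.sum(np.log(lik))
```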
• 108. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Experiment (cont'd) MCMC sample for n = 16 observations from the mixture; nested sampling sequence with M = 1000 starting points.
• 109. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Experiment (cont'd) MCMC sample for n = 50 observations from the mixture; nested sampling sequence with M = 1000 starting points.
• 110. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Comparison Monte Carlo and MCMC (= Gibbs) outputs based on T = 10⁴ simulations and numerical integration based on a 850 × 950 grid in the (µ, σ) parameter space. Nested sampling approximation based on a starting sample of M = 1000 points followed by at least 10³ further simulations from the constr'd prior and a stopping rule at 95% of the observed maximum likelihood. Constr'd prior simulation based on 50 values simulated by random walk, accepting only steps leading to a lik'hood higher than the bound.
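For reference, a hedged sketch of the grid-based value of Z over an 850 × 950 grid in (µ, log σ²). This only illustrates the Riemann-sum idea behind the numerical-integration benchmark, not the exact computation behind the reported results, and it is slow as written (shrink the grid for a quick check).

```python
import numpy as np
from scipy.stats import norm

def grid_evidence(y, p=0.5, n_mu=850, n_ls=950):
    """Grid-based reference value for Z = int f(y|theta) pi(theta) dtheta.

    Uniform priors: mu ~ U(-2, 6), log sigma^2 ~ U(.001, 16), so the prior
    density on the grid is 1 / (8 * 15.999). A Riemann sum in (mu, log sigma^2);
    a sketch only, not the exact computation used in the talk.
    """
    mus = np.linspace(-2, 6, n_mu)
    lsig2 = np.linspace(0.001, 16, n_ls)
    sigmas = np.exp(lsig2 / 2)
    prior = 1.0 / (8.0 * (16 - 0.001))
    cell = (mus[1] - mus[0]) * (lsig2[1] - lsig2[0])
    loglik = np.array([[np.sum(np.log(p * norm.pdf(y)
                                      + (1 - p) * norm.pdf(y, m, s)))
                        for s in sigmas] for m in mus])
    mx = loglik.max()
    return np.exp(mx) * np.exp(loglik - mx).sum() * cell * prior
```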
• 111. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Comparison (cont'd) [Boxplots of the estimates V1–V4] Graph based on a sample of 10 observations for µ = 2 and σ = 3/2 (150 replicas).
• 112. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Comparison (cont'd) [Boxplots of the estimates V1–V4] Graph based on a sample of 50 observations for µ = 2 and σ = 3/2 (150 replicas).
• 113. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Comparison (cont'd) [Boxplots of the estimates V1–V4] Graph based on a sample of 100 observations for µ = 2 and σ = 3/2 (150 replicas).
• 114. On some computational methods for Bayesian model choice Nested sampling A mixture comparison Comparison (cont'd) Nested sampling gets less reliable as the sample size increases. The most reliable approach is the mixture Z3, although the harmonic solution Z1 is close to Chib's solution [taken as gold standard]. The Monte Carlo method Z2 also produces poor approximations to Z (the kernel φ used in Z2 is a t non-parametric kernel estimate with standard bandwidth estimation).
• 115. On some computational methods for Bayesian model choice ABC model choice ABC method Approximate Bayesian Computation Bayesian setting: target is π(θ)f(x|θ) When likelihood f(x|θ) not in closed form, likelihood-free rejection technique: ABC algorithm For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating θ′ ∼ π(θ), x ∼ f(x|θ′), until the auxiliary variable x is equal to the observed value, x = y. [Pritchard et al., 1999]
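A minimal sketch of the exact-match rejection sampler above, only workable when x is discrete with few possible values; prior_sample and simulate are user-supplied stand-ins for π and f(·|θ), not part of the original algorithm statement.

```python
import numpy as np

def abc_exact(y_obs, prior_sample, simulate, n_accept=1000, rng=None):
    """Likelihood-free rejection sampler (exact-match version).

    Keep jointly simulating theta' ~ pi and x ~ f(.|theta') until x equals
    the observed y; accepted theta' are draws from the posterior.
    """
    rng = rng or np.random.default_rng()
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)
        x = simulate(theta, rng)
        if np.array_equal(x, y_obs):
            accepted.append(theta)
    return np.array(accepted)
```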
  • 118. On some computational methods for Bayesian model choice ABC model choice ABC method Population genetics example Tree of ancestors in a sample of genes
• 119. On some computational methods for Bayesian model choice ABC model choice ABC method A as approximative When y is a continuous random variable, equality x = y is replaced with a tolerance condition, ρ(x, y) ≤ ε where ρ is a distance between summary statistics Output distributed from π(θ) Pθ{ρ(x, y) < ε} ∝ π(θ | ρ(x, y) < ε)
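A sketch of the tolerance version, using a Euclidean distance between summary statistics as one arbitrary choice of ρ; summary, prior_sample and simulate are illustrative stand-ins.

```python
import numpy as np

def abc_tolerance(y_obs, prior_sample, simulate, summary, eps, n_sim=100_000, rng=None):
    """Tolerance-based ABC: accept theta' when ||s(x) - s(y)|| <= eps.

    summary() maps a dataset to its summary statistics; accepted draws are
    approximately distributed from pi(theta | rho(x, y) <= eps).
    """
    rng = rng or np.random.default_rng()
    s_obs = np.asarray(summary(y_obs))
    kept = []
    for _ in range(n_sim):
        theta = prior_sample(rng)
        s_sim = np.asarray(summary(simulate(theta, rng)))
        if np.linalg.norm(s_sim - s_obs) <= eps:
            kept.append((theta, s_sim))
    return kept
```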
• 121. On some computational methods for Bayesian model choice ABC model choice ABC method ABC improvements Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y... [Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007] ...or view the problem as conditional density estimation and develop techniques to allow for larger ε [Beaumont et al., 2002]
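As one concrete instance of the conditional-density-estimation route, here is a hedged sketch of a local-linear regression adjustment in the spirit of Beaumont et al. (2002); the Epanechnikov-type weights and the particular least-squares formulation are illustrative choices, not necessarily those of the original paper.

```python
import numpy as np

def regression_adjust(thetas, summaries, s_obs, eps):
    """Local-linear regression adjustment in the spirit of Beaumont et al. (2002).

    Draws theta_i with simulated summaries s_i are shifted by a weighted
    linear fit of theta on (s - s_obs), so the adjusted draws behave as if
    simulated at s_obs:  theta_i* = theta_i - (s_i - s_obs) beta_hat.
    """
    thetas = np.asarray(thetas, dtype=float).reshape(len(thetas), -1)
    S = np.asarray(summaries, dtype=float) - np.asarray(s_obs, dtype=float)
    d = np.linalg.norm(S, axis=1)
    w = np.where(d < eps, 1.0 - (d / eps) ** 2, 0.0)     # Epanechnikov-type weights
    sw = np.sqrt(w)[:, None]
    X = np.column_stack([np.ones(len(S)), S])            # intercept + centred summaries
    beta = np.linalg.lstsq(X * sw, thetas * sw, rcond=None)[0]
    adjusted = thetas - S @ beta[1:]                     # remove the linear trend in s
    return adjusted, w
```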