On some computational methods for Bayesian model choice


             On some computational methods for Bayesian model choice

                                Christian P. Robert

                     CREST-INSEE and Université Paris Dauphine
                     http://www.ceremade.dauphine.fr/~xian


                 © Cours OFPR, CREST, Malakoff, March 2–12, 2009
On some computational methods for Bayesian model choice




Outline


      1    Introduction

      2    Importance sampling solutions

      3    Cross-model solutions

      4    Nested sampling

      5    ABC model choice
On some computational methods for Bayesian model choice
  Introduction
     Bayes tests



Construction of Bayes tests



      Definition (Test)
      Given a hypothesis H0 : θ ∈ Θ0 on the parameter θ ∈ Θ of a
      statistical model, a test is a statistical procedure that takes its
      values in {0, 1}.

      Example (Normal mean)
      For x ∼ N (θ, 1), decide whether or not θ ≤ 0.
On some computational methods for Bayesian model choice
  Introduction
     Bayes tests



The 0 − 1 loss

      Neyman–Pearson loss for testing hypotheses
      Test of H0 : θ ∈ Θ0 versus H1 : θ ∉ Θ0 .
      Then
                                 D = {0, 1}


      The 0 − 1 loss

                    L(θ, d) =  1 − d   if θ ∈ Θ0 ,
                               d       otherwise,
On some computational methods for Bayesian model choice
  Introduction
     Bayes tests



Type–one and type–two errors
      Associated with the risk

                 R(θ, δ) = Eθ [L(θ, δ(x))]
                          =  Pθ (δ(x) = 0)   if θ ∈ Θ0 ,
                             Pθ (δ(x) = 1)   otherwise,


      Theorem (Bayes test)
      The Bayes estimator associated with π and with the 0 − 1 loss is

                 δ π (x) =  1   if π(θ ∈ Θ0 |x) > π(θ ∉ Θ0 |x),
                            0   otherwise,
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Bayes factor

      Definition (Bayes factors)
      For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0 , under the prior

                         π(Θ0 )π0 (θ) + π(Θ0^c )π1 (θ) ,

      the central quantity is

          B01 = [π(Θ0 |x)/π(Θ0^c |x)] ÷ [π(Θ0 )/π(Θ0^c )]
              = ∫Θ0 f (x|θ)π0 (θ) dθ  ÷  ∫Θ0^c f (x|θ)π1 (θ) dθ


                                                                            [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Self-contained concept

      Outside decision-theoretic environment:
                  eliminates impact of π(Θ0 ) but depends on the choice of
                  (π0 , π1 )
                  Bayesian/marginal equivalent to the likelihood ratio
                  Jeffreys’ scale of evidence:
                      if log10 (B10^π ) between 0 and 0.5, evidence against H0 weak,
                      if log10 (B10^π ) between 0.5 and 1, evidence substantial,
                      if log10 (B10^π ) between 1 and 2, evidence strong, and
                      if log10 (B10^π ) above 2, evidence decisive
                  Requires the computation of the marginal/evidence under
                  both hypotheses/models
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Hot hand
      Example (Binomial homogeneity)
      Consider H0 : yi ∼ B(ni , p) (i = 1, . . . , G) vs. H1 : yi ∼ B(ni , pi ).
      Conjugate priors pi ∼ Be(α = ξ/ω, β = (1 − ξ)/ω), with a uniform
      prior on E[pi |ξ, ω] = ξ and on p (ω is fixed)

        B10 = ∫0^1 ∏_{i=1}^G [ ∫0^1 pi^{yi} (1 − pi )^{ni −yi} pi^{α−1} (1 − pi )^{β−1}
                               × Γ(1/ω)/{Γ(ξ/ω)Γ((1 − ξ)/ω)} dpi ] dξ
              ÷  ∫0^1 p^{Σi yi} (1 − p)^{Σi (ni −yi )} dp

      For instance, log10 (B10 ) = −0.79 for ω = 0.005 and G = 138
      slightly favours H0 .
                                                   [Kass & Raftery, 1995]
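
      A numerical sketch of how such a Bayes factor can be evaluated: the inner
      integrals over the pi have closed Beta-function form, and the remaining
      one-dimensional integral over ξ is handled by a midpoint rule. The data
      below are synthetic stand-ins (not the hot-hand counts of Kass & Raftery)
      and all names are ours.

      import numpy as np
      from scipy.special import betaln

      # synthetic stand-in data, NOT the hot-hand data of Kass & Raftery (1995)
      rng = np.random.default_rng(0)
      G = 138
      n = rng.integers(20, 60, size=G)      # shots per game
      y = rng.binomial(n, 0.46)             # hits per game
      omega = 0.005

      def log_h1_integrand(xi):
          # log prod_i Beta(y_i + a, n_i - y_i + b)/Beta(a, b), a = xi/omega, b = (1-xi)/omega
          a, b = xi / omega, (1.0 - xi) / omega
          return np.sum(betaln(y + a, n - y + b) - betaln(a, b))

      # numerator: integrate over xi in (0,1) by a midpoint rule, on the log scale
      xis = (np.arange(1, 1000) - 0.5) / 999.0
      logs = np.array([log_h1_integrand(x) for x in xis])
      m = logs.max()
      log_num = m + np.log(np.mean(np.exp(logs - m)))

      # denominator: uniform prior on p gives the closed form Beta(sum y + 1, sum(n - y) + 1)
      log_den = betaln(y.sum() + 1, (n - y).sum() + 1)

      print((log_num - log_den) / np.log(10))   # log10(B10)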
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Model choice and model comparison



      Choice between models
      Several models available for the same observation

                                    Mi : x ∼ fi (x|θi ),   i∈I

      where I can be finite or infinite
      Replace hypotheses with models but keep marginal likelihoods and
      Bayes factors
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Bayesian model choice
      Probabilise the entire model/parameter space
          allocate probabilities pi to all models Mi
          define priors πi (θi ) for each parameter space Θi
           compute

               π(Mi |x) = pi ∫Θi fi (x|θi )πi (θi ) dθi  ÷  Σj pj ∫Θj fj (x|θj )πj (θj ) dθj

                  take largest π(Mi |x) to determine “best” model,
                  or use averaged predictive

                        Σj π(Mj |x) ∫Θj fj (x′ |θj )πj (θj |x) dθj
On some computational methods for Bayesian model choice
  Introduction
     Evidence



Evidence



      All these problems end up with a similar quantity, the evidence

                        Zk = ∫Θk πk (θk ) Lk (θk ) dθk ,

      aka the marginal likelihood.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Importance sampling


      Paradox
      Simulation from f (the true density) is not necessarily optimal

      Alternative to direct sampling from f is importance sampling,
      based on the alternative representation

                  Ef [h(X)] = ∫X h(x) {f (x)/g(x)} g(x) dx ,

      which allows us to use distributions g other than f
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Importance sampling algorithm

           Evaluation of

                       Ef [h(X)] = ∫X h(x) f (x) dx

           by
              1    Generate a sample X1 , . . . , Xn from a distribution g
              2    Use the approximation

                       (1/n) Σ_{j=1}^n {f (Xj )/g(Xj )} h(Xj )
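
      A minimal sketch of these two steps, assuming a standard normal target f,
      h(x) = x², and a Student t3 proposal g; the names are ours, not notation
      from the slides.

      import numpy as np
      from scipy.stats import norm, t

      rng = np.random.default_rng(1)
      n = 10_000
      x = rng.standard_t(df=3, size=n)                 # step 1: X_1, ..., X_n ~ g (Student t_3)
      w = np.exp(norm.logpdf(x) - t.logpdf(x, df=3))   # importance ratios f(X_j)/g(X_j)
      print(np.mean(w * x**2))                         # step 2: estimate of E_f[h(X)] = 1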
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bayes factor approximation

      When approximating the Bayes factor

               B01 = ∫Θ0 f0 (x|θ0 )π0 (θ0 ) dθ0  ÷  ∫Θ1 f1 (x|θ1 )π1 (θ1 ) dθ1

      use of importance functions ϖ0 and ϖ1 and

               B01 = [ n0^{−1} Σ_{i=1}^{n0} f0 (x|θ0^i )π0 (θ0^i )/ϖ0 (θ0^i ) ]
                     ÷ [ n1^{−1} Σ_{i=1}^{n1} f1 (x|θ1^i )π1 (θ1^i )/ϖ1 (θ1^i ) ]
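
      The corresponding computation, sketched generically: one importance sampling
      evidence estimate per model, each with its own importance function.
      is_log_evidence and its arguments are illustrative names, not notation from
      the slides.

      import numpy as np

      def is_log_evidence(log_joint, sample_proposal, log_proposal, n):
          # log_joint(theta): log{ f_k(x|theta) pi_k(theta) }; the proposal plays the
          # role of the importance function for model k
          theta = sample_proposal(n)
          logw = log_joint(theta) - log_proposal(theta)
          m = logw.max()
          return m + np.log(np.mean(np.exp(logw - m)))

      # B01 is then exp(is_log_evidence(...model 0...) - is_log_evidence(...model 1...))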
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bridge sampling


      Special case:
      If
                       π1 (θ1 |x) ∝ π̃1 (θ1 |x)
                       π2 (θ2 |x) ∝ π̃2 (θ2 |x)
      live on the same space (Θ1 = Θ2 ), then

               B12 ≈ (1/n) Σ_{i=1}^n π̃1 (θi |x)/π̃2 (θi |x)        θi ∼ π2 (θ|x)

                       [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
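
      A sketch of this special-case estimator, assuming evaluators of the two log
      unnormalised posteriors and a sample from π2; all names are ours.

      import numpy as np

      def bridge_simple(log_tilde_pi1, log_tilde_pi2, theta2):
          # B12 ≈ (1/n) sum_i pi~1(theta_i|x)/pi~2(theta_i|x), theta_i drawn from pi_2(theta|x)
          logr = log_tilde_pi1(theta2) - log_tilde_pi2(theta2)
          m = logr.max()
          return np.exp(m) * np.mean(np.exp(logr - m))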
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bridge sampling variance



      The bridge sampling estimator does poorly if
               var(B12 )/B12^2 = (1/n) E[ {(π1 (θ) − π2 (θ))/π2 (θ)}^2 ]

      is large, i.e. if π1 and π2 have little overlap...
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



(Further) bridge sampling

      In addition,

            B12 = ∫ π̃1 (θ|x)α(θ)π2 (θ|x) dθ  ÷  ∫ π̃2 (θ|x)α(θ)π1 (θ|x) dθ        ∀ α(·)

                ≈ [ (1/n2 ) Σ_{i=1}^{n2} π̃1 (θ2i |x)α(θ2i ) ]
                  ÷ [ (1/n1 ) Σ_{i=1}^{n1} π̃2 (θ1i |x)α(θ1i ) ]          θji ∼ πj (θ|x)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



An infamous example
      When
                       α(θ) = 1 / {π̃1 (θ)π̃2 (θ)}

      harmonic mean approximation to B12

            B12 = [ (1/n2 ) Σ_{i=1}^{n2} 1/π̃2 (θ2i |x) ]
                  ÷ [ (1/n1 ) Σ_{i=1}^{n1} 1/π̃1 (θ1i |x) ]           θji ∼ πj (θ|x)

                                               [Newton & Raftery, 1994]
      Infamous: Most often leads to an infinite variance!!!
                                             [Radford Neal’s blog, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



“The Worst Monte Carlo Method Ever”

      “The good news is that the Law of Large Numbers guarantees that
      this estimator is consistent, i.e., it will very likely be very close to the
      correct answer if you use a sufficiently large number of points from
      the posterior distribution.
      The bad news is that the number of points required for this
      estimator to get close to the right answer will often be greater
      than the number of atoms in the observable universe. The even
      worse news is that it’s easy for people to not realize this, and to
      naively accept estimates that are nowhere close to the correct
      value of the marginal likelihood.”
                                       [Radford Neal’s blog, Aug. 23, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Optimal bridge sampling

      The optimal choice of auxiliary function is

               α = (n1 + n2 ) ÷ {n1 π1 (θ|x) + n2 π2 (θ|x)}

      leading to

               B12 ≈ [ (1/n2 ) Σ_{i=1}^{n2} π̃1 (θ2i |x) / {n1 π1 (θ2i |x) + n2 π2 (θ2i |x)} ]
                     ÷ [ (1/n1 ) Σ_{i=1}^{n1} π̃2 (θ1i |x) / {n1 π1 (θ1i |x) + n2 π2 (θ1i |x)} ]

                                                                                    Back later!
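
      A sketch of the resulting fixed-point scheme, in the spirit of Meng & Wong;
      it assumes the log ratios log{π̃1/π̃2} have been evaluated along the two
      posterior samples, and the function name is ours.

      import numpy as np

      def optimal_bridge(l1, l2, n_iter=200):
          # Iterative bridge sampling estimate of B12 = Z1/Z2.
          # l1: log{pi~1/pi~2} at draws from pi_1(theta|x)
          # l2: log{pi~1/pi~2} at draws from pi_2(theta|x)
          # The normalised densities in alpha involve the unknown B12, hence the fixed point.
          n1, n2 = len(l1), len(l2)
          s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
          r = 1.0
          for _ in range(n_iter):
              num = np.mean(np.exp(l2) / (s1 * np.exp(l2) + s2 * r))   # draws from pi_2
              den = np.mean(1.0 / (s1 * np.exp(l1) + s2 * r))          # draws from pi_1
              r = num / den
          return r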
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Optimal bridge sampling (2)



      Reason:

        Var(B12 )/B12^2 ≈ (1/{n1 n2 }) [ ∫ π1 (θ)π2 (θ){n1 π1 (θ) + n2 π2 (θ)}α(θ)^2 dθ
                                         ÷ { ∫ π1 (θ)π2 (θ)α(θ) dθ }^2  −  1 ]
      (by the δ method)
      Dependence on the unknown normalising constants solved
      iteratively
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Ratio importance sampling


      Another identity:

               B12 = Eϕ [π̃1 (θ)/ϕ(θ)]  ÷  Eϕ [π̃2 (θ)/ϕ(θ)]

      for any density ϕ with sufficiently large support
                                                    [Torrie & Valleau, 1977]
      Use of a single sample θ1 , . . . , θn from ϕ

               B12 = Σ_{i=1}^n π̃1 (θi )/ϕ(θi )  ÷  Σ_{i=1}^n π̃2 (θi )/ϕ(θi )
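
      A sketch of the single-sample estimator; the evaluators and names are ours.

      import numpy as np

      def ratio_is(log_tilde_pi1, log_tilde_pi2, log_phi, theta):
          # single sample theta_1,...,theta_n ~ phi reused for both unnormalised posteriors
          w1 = np.exp(log_tilde_pi1(theta) - log_phi(theta))
          w2 = np.exp(log_tilde_pi2(theta) - log_phi(theta))
          return w1.sum() / w2.sum()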
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Ratio importance sampling (2)


      Approximate variance:

               var(B12 )/B12^2 = (1/n) Eϕ [ {π1 (θ) − π2 (θ)}^2 / ϕ(θ)^2 ]

      Optimal choice:

               ϕ∗ (θ) = |π1 (θ) − π2 (θ)|  ÷  ∫ |π1 (η) − π2 (η)| dη
                                                             [Chen, Shao & Ibrahim, 2000]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Improving upon bridge sampler

      Theorem 5.5.3: The asymptotic variance of the optimal ratio
      importance sampling estimator is smaller than the asymptotic
      variance of the optimal bridge sampling estimator
                                          [Chen, Shao, & Ibrahim, 2000]
      Does not require the normalising constant

                       ∫ |π1 (η) − π2 (η)| dη

      but a simulation from

                       ϕ∗ (θ) ∝ |π1 (θ) − π2 (θ)| .
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Varying dimensions



Generalisation to point null situations

      When

               B12 = ∫Θ1 π̃1 (θ1 ) dθ1  ÷  ∫Θ2 π̃2 (θ2 ) dθ2

      and Θ2 = Θ1 × Ψ, we get θ2 = (θ1 , ψ) and

               B12 = Eπ2 [ π̃1 (θ1 )ω(ψ|θ1 ) / π̃2 (θ1 , ψ) ]

      holds for any conditional density ω(ψ|θ1 ).
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Varying dimensions



X-dimen’al bridge sampling

      Generalisation of the previous identity:
      For any α,

               B12 = Eπ2 [π̃1 (θ1 )ω(ψ|θ1 )α(θ1 , ψ)]  ÷  Eπ1 ×ω [π̃2 (θ1 , ψ)α(θ1 , ψ)]

      and, for any density ϕ,

               B12 = Eϕ [π̃1 (θ1 )ω(ψ|θ1 )/ϕ(θ1 , ψ)]  ÷  Eϕ [π̃2 (θ1 , ψ)/ϕ(θ1 , ψ)]

                                           [Chen, Shao, & Ibrahim, 2000]
      Optimal choice: ω(ψ|θ1 ) = π2 (ψ|θ1 )
                                                         [Theorem 5.8.2]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Zk from a posterior sample


      Use of the [harmonic mean] identity

        Eπk [ ϕ(θk )/{πk (θk )Lk (θk )} | x ]
              = ∫ [ ϕ(θk )/{πk (θk )Lk (θk )} ] {πk (θk )Lk (θk )/Zk } dθk = 1/Zk

      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: Constraint opposed to usual importance sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails than
      πk (θk )Lk (θk ) for the approximation
               Z1k = 1 ÷ [ (1/T ) Σ_{t=1}^T ϕ(θk^(t) ) / {πk (θk^(t) )Lk (θk^(t) )} ]

      to have a finite variance.
      E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: Constraint opposed to usual importance sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails than
      πk (θk )Lk (θk ) for the approximation
                                                          T               (t)
                                                                    ϕ(θk )
                                                  1
                                  Z1k = 1                           (t)         (t)
                                                  T             πk (θk )Lk (θk )
                                                          t=1

      to have a finite variance.
      E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling (cont’d)



      Compare Z1k with a standard importance sampling approximation
               Z2k = (1/T ) Σ_{t=1}^T πk (θk^(t) )Lk (θk^(t) ) / ϕ(θk^(t) )

      where the θk^(t) ’s are generated from the density ϕ(·) (with fatter
      tails like t’s)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Zk using a mixture representation



                                                                              Bridge sampling redux

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                  ϕk (θk ) ∝ ω1 πk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
          1   Take δ (t) = 1 with probability

                  ω1 πk (θk^(t−1) )Lk (θk^(t−1) )  ÷  { ω1 πk (θk^(t−1) )Lk (θk^(t−1) ) + ϕ(θk^(t−1) ) }

              and δ (t) = 2 otherwise;
          2   If δ (t) = 1, generate θk^(t) ∼ MCMC(θk^(t−1) , θk^(t) ) where
              MCMC(θk , θk′ ) denotes an arbitrary MCMC kernel associated
              with the posterior πk (θk |x) ∝ πk (θk )Lk (θk );
          3   If δ (t) = 2, generate θk^(t) ∼ ϕ(θk ) independently
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Evidence approximation by mixtures
      Rao-Blackwellised estimate

        ξ̂ = (1/T ) Σ_{t=1}^T ω1 πk (θk^(t) )Lk (θk^(t) ) ÷ { ω1 πk (θk^(t) )Lk (θk^(t) ) + ϕ(θk^(t) ) } ,

      converges to ω1 Zk /{ω1 Zk + 1}
      Deduce Ẑ3k from ω1 Ẑ3k /{ω1 Ẑ3k + 1} = ξ̂, i.e.

        Ẑ3k = Σ_{t=1}^T [ ω1 πk (θk^(t) )Lk (θk^(t) ) / { ω1 πk (θk^(t) )Lk (θk^(t) ) + ϕ(θk^(t) ) } ]
              ÷  ω1 Σ_{t=1}^T [ ϕ(θk^(t) ) / { ω1 πk (θk^(t) )Lk (θk^(t) ) + ϕ(θk^(t) ) } ]
                                                                                    [Bridge sampler]
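
      A minimal sketch of this estimator from the stored chain, assuming the
      log terms have been recorded along the mixture sampler; for realistic
      log-likelihood scales one would keep everything on the log scale, and the
      names are ours.

      import numpy as np

      def mixture_evidence(log_joint_t, log_phi_t, omega1):
          # log_joint_t: array of log{pi_k(theta^(t)) L_k(theta^(t))} along the chain
          # log_phi_t:   array of log{phi(theta^(t))} at the same iterations
          a = omega1 * np.exp(log_joint_t)            # omega_1 pi_k L_k terms
          b = np.exp(log_phi_t)                       # phi terms
          xi_hat = np.mean(a / (a + b))               # estimate of omega_1 Z_k/(omega_1 Z_k + 1)
          return xi_hat / (omega1 * (1.0 - xi_hat))   # solve for Z_k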
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Chib’s representation


      Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
      θk ∼ πk (θk ),
                  Zk = mk (x) = fk (x|θk ) πk (θk ) ÷ πk (θk |x)

      Use of an approximation to the posterior

                  Zk = mk (x) = fk (x|θk∗ ) πk (θk∗ ) ÷ π̂k (θk∗ |x) .
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Case of latent variables



      For missing variable z as in mixture models, natural Rao-Blackwell
      estimate

               π̂k (θk∗ |x) = (1/T ) Σ_{t=1}^T πk (θk∗ |x, zk^(t) ) ,

      where the zk^(t) ’s are Gibbs sampled latent variables
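
      A sketch of Chib's identity combined with this Rao-Blackwell density
      estimate, on the log scale; the inputs are assumed precomputed and the
      names are ours.

      import numpy as np
      from scipy.special import logsumexp

      def chib_log_evidence(log_lik_star, log_prior_star, log_post_cond_star):
          # log_lik_star, log_prior_star: log f_k(x|theta*) and log pi_k(theta*) at a fixed theta*
          # log_post_cond_star: array of log pi_k(theta*|x, z^(t)) over Gibbs-sampled z^(t);
          # its average is the Rao-Blackwell estimate of pi_k(theta*|x)
          log_post_cond_star = np.asarray(log_post_cond_star)
          log_post_density = logsumexp(log_post_cond_star) - np.log(len(log_post_cond_star))
          return log_lik_star + log_prior_star - log_post_density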
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Label switching


      A mixture model [special case of missing variable model] is
      invariant under permutations of the indices of the components.
      E.g., mixtures
                          0.3N (0, 1) + 0.7N (2.3, 1)
      and
                                       0.7N (2.3, 1) + 0.3N (0, 1)
      are exactly the same!
       © The component parameters θi are not identifiable
      marginally since they are exchangeable
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Connected difficulties



          1   Number of modes of the likelihood of order O(k!):
              © Maximization and even [MCMC] exploration of the
              posterior surface harder
          2   Under exchangeable priors on (θ, p) [prior invariant under
              permutation of the indices], all posterior marginals are
              identical:
              © Posterior expectation of θ1 equal to posterior expectation
              of θ2
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



License


      Since the Gibbs output does not exhibit exchangeability, the Gibbs
      sampler has not explored the whole parameter space: it lacks the
      energy to switch enough component allocations at once

      [Figure: Gibbs output for the mixture — traces of µi , pi and σi over 500
      iterations n, and pairwise plots of (µi , pi ), (pi , σi ) and (σi , µi )]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Label switching paradox




      We should observe the exchangeability of the components [label
      switching] to conclude about convergence of the Gibbs sampler.
      If we observe it, then we do not know how to estimate the
      parameters.
      If we do not, then we are uncertain about the convergence!!!
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Compensation for label switching
      For mixture models, zk^(t) usually fails to visit all configurations in a
      balanced way, despite the symmetry predicted by the theory

               πk (θk |x) = πk (σ(θk )|x) = (1/k!) Σ_{σ∈Sk} πk (σ(θk )|x)

      for all σ’s in Sk , the set of all permutations of {1, . . . , k}.
      Consequences on numerical approximation, biased by a factor of order k!
      Recover the theoretical symmetry by using

               π̂k (θk∗ |x) = (1/{T k!}) Σ_{σ∈Sk} Σ_{t=1}^T πk (σ(θk∗ )|x, zk^(t) ) .
                                                    [Berkhof, Mechelen, & Gelman, 2003]
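
      A sketch of this symmetrised Rao-Blackwell estimate; the conditional
      posterior evaluator and the relabelling convention are model-specific
      stand-ins, and the names are ours.

      import numpy as np
      from itertools import permutations
      from math import factorial
      from scipy.special import logsumexp

      def symmetrised_log_post_density(log_post_cond, theta_star, k, T):
          # log_post_cond(theta, t): log pi_k(theta|x, z^(t)) for stored Gibbs iteration t
          # theta_star: (k, d) array of component-wise parameters at the evaluation point
          # Averages pi_k(sigma(theta*)|x, z^(t)) over all k! permutations and T iterations;
          # for large k, average over a random subset of S_k instead, as on the Galaxy slides below.
          logs = []
          for sigma in permutations(range(k)):
              theta_perm = theta_star[list(sigma)]          # relabelled copy sigma(theta*)
              logs.extend(log_post_cond(theta_perm, t) for t in range(T))
          return logsumexp(np.array(logs)) - np.log(T * factorial(k))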
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Galaxy dataset

      n = 82 galaxies as a mixture of k normal distributions with both
      mean and variance unknown.
                                                          [Roeder, 1992]
      [Figure: histogram (relative frequency) of the 82 galaxy observations, with the average density estimate overlaid]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Galaxy dataset (k)

      Using only the original estimate, with θ_k^* as the MAP estimator,

      \[ \log \hat m_k(x) = -105.1396 \]

      for k = 3 (based on 10^3 simulations), while introducing the
      permutations leads to

      \[ \log \hat m_k(x) = -103.3479 \]

      Note that

      \[ -105.1396 + \log(3!) = -103.3479 \]

        k          2          3          4          5          6          7          8
        m_k(x)   -115.68    -103.35    -102.66    -101.93    -102.88    -105.48    -108.44

      Estimates of the marginal likelihoods by the symmetrised Chib's
      approximation (based on 10^5 Gibbs iterations and, for k > 5, 100
      permutations selected at random in S_k).
                                [Lee, Marin, Mengersen & Robert, 2008]
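
      The k! compensation can be checked on these numbers directly (a quick Python check):

          import math
          # correcting the single-mode estimate by log k! for k = 3 components
          print(-105.1396 + math.log(math.factorial(3)))   # -103.3478..., matching the symmetrised value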
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Bayesian variable selection

      Regression setting: one dependent random variable y and a set
      {x1 , . . . , xk } of k explanatory variables.
      Question: Are all xi ’s involved in the regression?
      Assumption: every subset {i1 , . . . , iq } of q (0 ≤ q ≤ k)
      explanatory variables, {1n , xi1 , . . . , xiq }, is a proper set of
      explanatory variables for the regression of y [intercept included in
      every corresponding model]

      Computational issue
                                      2^k models in competition...
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Model notations

      1. X = [ 1_n  x_1  · · ·  x_k ] is the matrix containing 1_n and all the k
         potential predictor variables
      2. Each model M_γ associated with binary indicator vector
         γ ∈ Γ = {0, 1}^k where γ_i = 1 means that the variable x_i is
         included in the model M_γ
      3. q_γ = 1_n^T γ, number of variables included in the model M_γ
      4. t_1(γ) and t_0(γ), indices of variables included in the model and
         indices of variables not included in the model
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Model indicators



      For β ∈ R^{k+1} and X, we define β_γ as the subvector

      \[ \beta_\gamma = \big(\beta_0, (\beta_i)_{i \in t_1(\gamma)}\big) \]

      and X_γ as the submatrix of X where only the column 1_n and the
      columns in t_1(γ) have been kept.
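
      For illustration, extracting X_γ and β_γ from the full design can be done as
      follows (a numpy sketch; the helper name is mine, not part of the original slides):

          import numpy as np

          def submodel(X, beta, gamma):
              """X: n x (k+1) design whose first column is 1_n; beta: length k+1;
              gamma: length-k 0/1 vector. Returns (X_gamma, beta_gamma)."""
              keep = np.concatenate(([True], np.asarray(gamma, dtype=bool)))  # intercept always kept
              return X[:, keep], np.asarray(beta)[keep]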
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Models in competition


      The model M_γ is thus defined as

      \[ y \mid \gamma, \beta_\gamma, \sigma^2, X \sim \mathcal{N}_n\big(X_\gamma \beta_\gamma,\, \sigma^2 I_n\big) \]

      where β_γ ∈ R^{q_γ+1} and σ² ∈ R_+^* are the unknown parameters.

      Warning
      σ 2 is common to all models and thus uses the same prior for all
      models
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Informative G-prior




      Many (2^k) models in competition: we cannot expect a practitioner
      to specify a prior on every Mγ in a completely subjective and
      autonomous manner.
      Shortcut: We derive all priors from a single global prior associated
      with the so-called full model that corresponds to γ = (1, . . . , 1).
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Prior definitions


      (i)  For the full model, Zellner’s G-prior:

           \[
             \beta \mid \sigma^2, X \sim \mathcal{N}_{k+1}\big(\tilde\beta,\, c\sigma^2 (X^{\mathsf T} X)^{-1}\big)
             \quad\text{and}\quad
             \sigma^2 \sim \pi(\sigma^2 \mid X) = \sigma^{-2}
           \]

      (ii) For each model M_γ, the prior distribution of β_γ conditional
           on σ² is fixed as

           \[
             \beta_\gamma \mid \gamma, \sigma^2 \sim \mathcal{N}_{q_\gamma+1}\big(\tilde\beta_\gamma,\, c\sigma^2 (X_\gamma^{\mathsf T} X_\gamma)^{-1}\big),
           \]

           where \tilde\beta_\gamma = (X_\gamma^{\mathsf T} X_\gamma)^{-1} X_\gamma^{\mathsf T} X \tilde\beta, and the same prior on σ².
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Prior completion



      The joint prior for model M_γ is the improper prior

      \[
        \pi(\beta_\gamma, \sigma^2 \mid \gamma) \propto (\sigma^2)^{-(q_\gamma+1)/2-1}
          \exp\left\{ -\frac{1}{2c\sigma^2}
          \big(\beta_\gamma - \tilde\beta_\gamma\big)^{\mathsf T}
          (X_\gamma^{\mathsf T} X_\gamma)
          \big(\beta_\gamma - \tilde\beta_\gamma\big) \right\} .
      \]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Prior competition (2)


      Infinitely many ways of defining a prior on the model index γ:
      choice of the uniform prior π(γ|X) = 2^{-k}.
      Posterior distribution of γ central to variable selection since it is
      proportional to the marginal density of y on M_γ (or evidence of M_γ)

      \[
        \pi(\gamma \mid y, X) \propto f(y \mid \gamma, X)\, \pi(\gamma \mid X) \propto f(y \mid \gamma, X)
          = \int \left( \int f(y \mid \gamma, \beta, \sigma^2, X)\, \pi(\beta \mid \gamma, \sigma^2, X)\, \mathrm d\beta \right) \pi(\sigma^2 \mid X)\, \mathrm d\sigma^2 .
      \]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection




      \[
        f(y \mid \gamma, \sigma^2, X) = \int f(y \mid \gamma, \beta, \sigma^2)\, \pi(\beta \mid \gamma, \sigma^2)\, \mathrm d\beta
      \]
      \[
        = (c+1)^{-(q_\gamma+1)/2} (2\pi)^{-n/2} (\sigma^2)^{-n/2}
          \exp\left\{ -\frac{y^{\mathsf T} y}{2\sigma^2}
          + \frac{c\, y^{\mathsf T} X_\gamma (X_\gamma^{\mathsf T} X_\gamma)^{-1} X_\gamma^{\mathsf T} y
                  - \tilde\beta_\gamma^{\mathsf T} X_\gamma^{\mathsf T} X_\gamma \tilde\beta_\gamma}{2\sigma^2 (c+1)} \right\} ,
      \]

      and this posterior density satisfies

      \[
        \pi(\gamma \mid y, X) \propto (c+1)^{-(q_\gamma+1)/2}
          \left[ y^{\mathsf T} y - \frac{c}{c+1}\, y^{\mathsf T} X_\gamma (X_\gamma^{\mathsf T} X_\gamma)^{-1} X_\gamma^{\mathsf T} y
                 - \frac{1}{c+1}\, \tilde\beta_\gamma^{\mathsf T} X_\gamma^{\mathsf T} X_\gamma \tilde\beta_\gamma \right]^{-n/2} .
      \]
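
      A direct numpy transcription of this closed form might look as follows (a sketch,
      returning log π(γ|y, X) up to an additive constant; with the usual choice β̃ = 0 the
      last bracketed term vanishes, and the bracket is assumed positive):

          import numpy as np

          def log_post_gamma(y, X, gamma, beta_tilde, c):
              """log pi(gamma | y, X) up to a constant, under Zellner's G-prior,
              transcribing the displayed expression."""
              gamma = np.asarray(gamma, dtype=bool)
              cols = np.concatenate(([True], gamma))          # intercept plus selected columns
              Xg = X[:, cols]
              qg = int(gamma.sum())
              XtX = Xg.T @ Xg
              # projection of the full-model prior mean onto the submodel
              bg = np.linalg.solve(XtX, Xg.T @ (X @ beta_tilde))
              yPy = y @ Xg @ np.linalg.solve(XtX, Xg.T @ y)
              inner = y @ y - c / (c + 1.0) * yPy - (bg @ XtX @ bg) / (c + 1.0)
              return -(qg + 1) / 2.0 * np.log(c + 1.0) - len(y) / 2.0 * np.log(inner)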
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Pine processionary caterpillars
                                        t1 (γ)            π(γ|y, X)
                                        0,1,2,4,5            0.2316
                                        0,1,2,4,5,9          0.0374
                                        0,1,9                0.0344
                                        0,1,2,4,5,10         0.0328
                                        0,1,4,5              0.0306
                                        0,1,2,9              0.0250
                                        0,1,2,4,5,7          0.0241
                                        0,1,2,4,5,8          0.0238
                                        0,1,2,4,5,6          0.0237
                                        0,1,2,3,4,5          0.0232
                                        0,1,6,9              0.0146
                                        0,1,2,3,9            0.0145
                                        0,9                  0.0143
                                        0,1,2,6,9            0.0135
                                        0,1,4,5,9            0.0128
                                        0,1,3,9              0.0117
                                        0,1,2,8              0.0115
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Pine processionary caterpillars (cont’d)


      Interpretation
      Model Mγ with the highest posterior probability is
      t1 (γ) = (1, 2, 4, 5), which corresponds to the variables
           - altitude,
           - slope,
           - height of the tree sampled in the center of the area, and
           - diameter of the tree sampled in the center of the area.

      Corresponds to the five variables identified in the R regression
      output
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Noninformative extension



      For Zellner’s noninformative prior with π(c) = 1/c, we have

      \[
        \pi(\gamma \mid y, X) \propto \sum_{c=1}^{\infty} c^{-1} (c+1)^{-(q_\gamma+1)/2}
          \left[ y^{\mathsf T} y - \frac{c}{c+1}\, y^{\mathsf T} X_\gamma (X_\gamma^{\mathsf T} X_\gamma)^{-1} X_\gamma^{\mathsf T} y \right]^{-n/2} .
      \]
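
      In practice this infinite sum must be truncated; a minimal Python sketch (the
      truncation point c_max is an arbitrary choice of mine):

          import numpy as np
          from scipy.special import logsumexp

          def log_post_gamma_noninf(y, X, gamma, c_max=10**5):
              """log pi(gamma | y, X) up to a constant for Zellner's noninformative
              prior pi(c) = 1/c, truncating the sum over c at c_max."""
              gamma = np.asarray(gamma, dtype=bool)
              cols = np.concatenate(([True], gamma))
              Xg = X[:, cols]
              qg = int(gamma.sum())
              yty = y @ y
              yPy = y @ Xg @ np.linalg.solve(Xg.T @ Xg, Xg.T @ y)
              c = np.arange(1, c_max + 1, dtype=float)
              terms = (-np.log(c)
                       - (qg + 1) / 2.0 * np.log(c + 1.0)
                       - len(y) / 2.0 * np.log(yty - c / (c + 1.0) * yPy))
              return logsumexp(terms)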
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Pine processionary caterpillars
                                       t1 (γ)             π(γ|y, X)
                                       0,1,2,4,5             0.0929
                                       0,1,2,4,5,9           0.0325
                                       0,1,2,4,5,10          0.0295
                                       0,1,2,4,5,7           0.0231
                                       0,1,2,4,5,8           0.0228
                                       0,1,2,4,5,6           0.0228
                                       0,1,2,3,4,5           0.0224
                                       0,1,2,3,4,5,9         0.0167
                                       0,1,2,4,5,6,9         0.0167
                                       0,1,2,4,5,8,9         0.0137
                                       0,1,4,5               0.0110
                                       0,1,2,4,5,9,10        0.0100
                                       0,1,2,3,9             0.0097
                                       0,1,2,9               0.0093
                                       0,1,2,4,5,7,9         0.0092
                                       0,1,2,6,9             0.0092
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Stochastic search for the most likely model

      When k gets large, impossible to compute the posterior
      probabilities of the 2^k models.
      Need of a tailored algorithm that samples from π(γ|y, X) and
      selects the most likely models.
      Can be done by Gibbs sampling, given the availability of the full
      conditional posterior probabilities of the γi ’s.
      If γ−i = (γ1 , . . . , γi−1 , γi+1 , . . . , γk ) (1 ≤ i ≤ k)

                                     π(γi |y, γ−i , X) ∝ π(γ|y, X)

      (to be evaluated in both γi = 0 and γi = 1)
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Gibbs sampling for variable selection

           Initialization: Draw γ^{(0)} from the uniform distribution on Γ

           Iteration t: Given (γ_1^{(t-1)}, . . . , γ_k^{(t-1)}), generate
              1. γ_1^{(t)} according to π(γ_1 | y, γ_2^{(t-1)}, . . . , γ_k^{(t-1)}, X)
              2. γ_2^{(t)} according to π(γ_2 | y, γ_1^{(t)}, γ_3^{(t-1)}, . . . , γ_k^{(t-1)}, X)
                 ...
              k. γ_k^{(t)} according to π(γ_k | y, γ_1^{(t)}, . . . , γ_{k-1}^{(t)}, X)
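
      A minimal Python sketch of this systematic-scan sampler, assuming a function
      log_post(y, X, gamma) in the style of the earlier sketches (names and interface
      are mine):

          import numpy as np

          def gibbs_variable_selection(y, X, log_post, T, rng=None):
              """Gibbs sampler on gamma in {0,1}^k; log_post returns
              log pi(gamma | y, X) up to a constant."""
              rng = np.random.default_rng(rng)
              k = X.shape[1] - 1                       # number of candidate covariates
              gamma = rng.integers(0, 2, size=k)       # gamma^(0) uniform on {0,1}^k
              draws = np.empty((T, k), dtype=int)
              for t in range(T):
                  for i in range(k):
                      lp = np.empty(2)
                      for v in (0, 1):                 # evaluate both gamma_i = 0 and 1
                          gamma[i] = v
                          lp[v] = log_post(y, X, gamma)
                      p1 = 1.0 / (1.0 + np.exp(lp[0] - lp[1]))   # P(gamma_i = 1 | rest)
                      gamma[i] = rng.random() < p1
                  draws[t] = gamma
              return draws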
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



MCMC interpretation

      After T ≫ 1 MCMC iterations, the output is used to approximate the
      posterior probabilities π(γ|y, X) by the empirical averages

      \[
        \hat\pi(\gamma \mid y, X) = \frac{1}{T - T_0 + 1} \sum_{t=T_0}^{T} \mathbb{I}_{\gamma^{(t)} = \gamma} ,
      \]

      where the first T_0 values are eliminated as burn-in,
      and to approximate the probability of including the i-th variable,

      \[
        \hat P^{\pi}(\gamma_i = 1 \mid y, X) = \frac{1}{T - T_0 + 1} \sum_{t=T_0}^{T} \mathbb{I}_{\gamma_i^{(t)} = 1} .
      \]
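
      Both quantities are plain empirical frequencies of the stored draws; a short
      Python sketch using the output of the sampler above:

          import numpy as np

          def inclusion_probabilities(draws, burnin):
              """Empirical P(gamma_i = 1 | y, X), dropping the first `burnin` draws."""
              return np.asarray(draws)[burnin:].mean(axis=0)

          def top_models(draws, burnin, n=5):
              """Most frequently visited models after burn-in, with their frequencies."""
              kept = np.asarray(draws)[burnin:]
              models, counts = np.unique(kept, axis=0, return_counts=True)
              order = np.argsort(counts)[::-1][:n]
              return [(models[i], counts[i] / len(kept)) for i in order]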
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Pine processionary caterpillars

                                       P π (γi = 1|y, X)   P π (γi = 1|y, X)
                          γi
                          γ1                      0.8624              0.8844
                          γ2                      0.7060              0.7716
                          γ3                      0.1482              0.2978
                          γ4                      0.6671              0.7261
                          γ5                      0.6515              0.7006
                          γ6                      0.1678              0.3115
                          γ7                      0.1371              0.2880
                          γ8                      0.1555              0.2876
                          γ9                      0.4039              0.5168
                          γ10                     0.1151              0.2609
        Probabilities of inclusion with both the informative (β̃ = 0_{11}, c = 100)
                        and the noninformative Zellner priors
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Reversible jump


      Idea: Set up a proper measure–theoretic framework for designing
      moves between models Mk
                                                         [Green, 1995]
      Create a reversible kernel K on H = ⋃_k {k} × Θ_k such that

      \[
        \int_A \int_B K(x, \mathrm dy)\, \pi(x)\, \mathrm dx = \int_B \int_A K(y, \mathrm dx)\, \pi(y)\, \mathrm dy
      \]

      for the invariant density π [x is of the form (k, θ^{(k)})]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Local moves
      For a move between two models, M1 and M2 , the Markov chain
      being in state θ1 ∈ M1 , denote by K1→2 (θ1 , dθ) and K2→1 (θ2 , dθ)
      the corresponding kernels, under the detailed balance condition

                          π(dθ1 ) K1→2 (θ1 , dθ) = π(dθ2 ) K2→1 (θ2 , dθ) ,

      and take, wlog, dim(M2 ) > dim(M1 ).
      Proposal expressed as

                                           θ2 = Ψ1→2 (θ1 , v1→2 )

      where v1→2 is a random variable of dimension
      dim(M2 ) − dim(M1 ), generated as

                                          v1→2 ∼ ϕ1→2 (v1→2 ) .
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Local moves (2)

      In this case, q_{1→2}(θ_1, dθ_2) has density

      \[
        \varphi_{1\to 2}(v_{1\to 2}) \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|^{-1} ,
      \]

      by the Jacobian rule.
                                                                        Reverse importance link

      If π_{1→2} is the probability of choosing the move to M_2 while in M_1,
      the acceptance probability reduces to

      \[
        \alpha(\theta_1, v_{1\to 2}) = 1 \wedge
          \frac{\pi(M_2, \theta_2)\, \pi_{2\to 1}}{\pi(M_1, \theta_1)\, \pi_{1\to 2}\, \varphi_{1\to 2}(v_{1\to 2})}
          \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right| .
      \]

                                       c Difficult calibration
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Interpretation

      The representation puts us back in a fixed dimension setting:
              M1 × V1→2 and M2 in one-to-one relation.
              reversibility imposes that θ1 is derived as

                                   (θ_1, v_{1→2}) = Ψ_{1→2}^{-1}(θ_2)

              appears like a regular Metropolis–Hastings move from the
              couple (θ1 , v1→2 ) to θ2 when stationary distributions are
              π(M1 , θ1 ) × ϕ1→2 (v1→2 ) and π(M2 , θ2 ), and when proposal
              distribution is deterministic (??)
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Pseudo-deterministic reasoning
      Consider the proposals

      \[
        \theta_2 \sim \mathcal N\big(\Psi_{1\to 2}(\theta_1, v_{1\to 2}), \varepsilon\big)
        \quad\text{and}\quad
        \Psi_{1\to 2}(\theta_1, v_{1\to 2}) \sim \mathcal N(\theta_2, \varepsilon)
      \]

      Reciprocal proposal has density

      \[
        \frac{\exp\big\{ -(\theta_2 - \Psi_{1\to 2}(\theta_1, v_{1\to 2}))^2 / 2\varepsilon \big\}}{\sqrt{2\pi\varepsilon}}
        \times \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|
      \]

      by the Jacobian rule.
      Thus the Metropolis–Hastings acceptance probability is

      \[
        1 \wedge \frac{\pi(M_2, \theta_2)}{\pi(M_1, \theta_1)\, \varphi_{1\to 2}(v_{1\to 2})}
          \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|
      \]

      Does not depend on ε: let ε go to 0
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Generic reversible jump acceptance probability
      If several models are considered simultaneously, with probability
      π_{1→2} of choosing a move to M_2 while in M_1, as in

      \[
        K(x, B) = \sum_{m=1}^{\infty} \int_B \rho_m(x, y)\, q_m(x, \mathrm dy) + \omega(x)\, \mathbb I_B(x)
      \]

      the acceptance probability of θ_2 = Ψ_{1→2}(θ_1, v_{1→2}) is

      \[
        \alpha(\theta_1, v_{1\to 2}) = 1 \wedge
          \frac{\pi(M_2, \theta_2)\, \pi_{2\to 1}}{\pi(M_1, \theta_1)\, \pi_{1\to 2}\, \varphi_{1\to 2}(v_{1\to 2})}
          \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|
      \]

      while the acceptance probability of θ_1 with (θ_1, v_{1→2}) = Ψ_{1→2}^{-1}(θ_2) is

      \[
        \alpha(\theta_1, v_{1\to 2}) = 1 \wedge
          \frac{\pi(M_1, \theta_1)\, \pi_{1\to 2}\, \varphi_{1\to 2}(v_{1\to 2})}{\pi(M_2, \theta_2)\, \pi_{2\to 1}}
          \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|^{-1}
      \]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Green’s sampler

      Algorithm
      Iteration t (t ≥ 1): if x^{(t)} = (m, θ^{(m)}),
         1. Select model M_n with probability π_{mn}
         2. Generate u_{mn} ∼ ϕ_{mn}(u) and set
            (θ^{(n)}, v_{nm}) = Ψ_{m→n}(θ^{(m)}, u_{mn})
         3. Take x^{(t+1)} = (n, θ^{(n)}) with probability

            \[
              \min\left( \frac{\pi(n, \theta^{(n)})\, \pi_{nm}\, \varphi_{nm}(v_{nm})}{\pi(m, \theta^{(m)})\, \pi_{mn}\, \varphi_{mn}(u_{mn})}
                \left| \frac{\partial \Psi_{m\to n}(\theta^{(m)}, u_{mn})}{\partial(\theta^{(m)}, u_{mn})} \right| ,\; 1 \right)
            \]

            and take x^{(t+1)} = x^{(t)} otherwise.
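
      A schematic Python sketch of one such iteration; the model-specific ingredients
      (log target, move probabilities, dimension-matching map and its Jacobian) are
      passed in as placeholders, so this is only an illustration of the acceptance ratio
      above, not a ready-made sampler:

          import numpy as np

          def rj_step(state, log_target, move_prob, propose, rng=None):
              """One reversible jump update of state = (m, theta).
              move_prob(m, n): probability pi_mn of proposing model n from model m.
              propose(m, theta, rng) -> (n, theta_new, log_phi_fwd, log_phi_rev, log_jac):
                draws the completion u ~ phi_mn, maps (theta, u) to theta_new in model n,
                and returns log phi_mn(u), log phi_nm(v_nm) for the reverse completion,
                and log |d Psi_{m->n} / d(theta, u)|."""
              rng = np.random.default_rng(rng)
              m, theta = state
              n, theta_new, log_phi_fwd, log_phi_rev, log_jac = propose(m, theta, rng)
              # log of the displayed ratio: target x pi_nm phi_nm / (pi_mn phi_mn) x |Jacobian|
              log_alpha = (log_target(n, theta_new) - log_target(m, theta)
                           + np.log(move_prob(n, m)) - np.log(move_prob(m, n))
                           + log_phi_rev - log_phi_fwd + log_jac)
              if np.log(rng.random()) < log_alpha:
                  return (n, theta_new)
              return state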
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Mixture of normal distributions

      \[
        M_k = \left\{ (p_{jk}, \mu_{jk}, \sigma_{jk}) \,;\;
          \sum_{j=1}^{k} p_{jk}\, \mathcal N(\mu_{jk}, \sigma_{jk}^2) \right\}
      \]

      Restrict moves from M_k to adjacent models, like M_{k+1} and
      M_{k-1}, with probabilities π_{k(k+1)} and π_{k(k-1)}.

Field Attribute Index Feature in Odoo 17Celine George
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 

Kürzlich hochgeladen (20)

4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 

Computational methods for Bayesian model choice

• 9. Introduction / Bayes factor — Bayes factor. Definition (Bayes factors): for testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0, under the prior π(Θ0)π0(θ) + π(Θ0ᶜ)π1(θ), the central quantity is
B01 = [π(Θ0|x)/π(Θ0ᶜ|x)] / [π(Θ0)/π(Θ0ᶜ)] = ∫_{Θ0} f(x|θ)π0(θ) dθ / ∫_{Θ0ᶜ} f(x|θ)π1(θ) dθ .
[Jeffreys, 1939]
• 10. Self-contained concept. Outside the decision-theoretic environment: eliminates the impact of π(Θ0) but depends on the choice of (π0, π1); Bayesian/marginal equivalent of the likelihood ratio. Jeffreys' scale of evidence: if log10(B10^π) lies between 0 and 0.5, the evidence against H0 is weak; if between 0.5 and 1, substantial; if between 1 and 2, strong; and if above 2, decisive. Requires the computation of the marginal/evidence under both hypotheses/models.
• 11. Hot hand. Example (Binomial homogeneity): consider H0 : yi ∼ B(ni, p) (i = 1, …, G) vs. H1 : yi ∼ B(ni, pi). Conjugate priors pi ∼ Be(α = ξ/ω, β = (1 − ξ)/ω), with a uniform prior on E[pi|ξ, ω] = ξ and on p (ω is fixed):
B10 = ∫₀¹ { ∏_{i=1}^G ∫₀¹ pi^{yi}(1 − pi)^{ni − yi} pi^{α−1}(1 − pi)^{β−1} Γ(1/ω)/[Γ(ξ/ω)Γ((1 − ξ)/ω)] dpi } dξ ÷ ∫₀¹ p^{Σ_i yi}(1 − p)^{Σ_i (ni − yi)} dp .
For instance, log10(B10) = −0.79 for ω = 0.005 and G = 138, which slightly favours H0. [Kass & Raftery, 1995]
• 14. Introduction / Model choice — Model choice and model comparison. Choice between models: several models are available for the same observation, Mi : x ∼ fi(x|θi), i ∈ I, where I can be finite or infinite. Replace hypotheses with models but keep marginal likelihoods and Bayes factors.
• 16. Bayesian model choice. Probabilise the entire model/parameter space: allocate probabilities pi to all models Mi; define priors πi(θi) for each parameter space Θi; compute
π(Mi|x) = pi ∫_{Θi} fi(x|θi)πi(θi) dθi / Σ_j pj ∫_{Θj} fj(x|θj)πj(θj) dθj ;
take the largest π(Mi|x) to determine the “best” model, or use the averaged predictive Σ_j π(Mj|x) ∫_{Θj} fj(x′|θj)πj(θj|x) dθj.
• 20. Introduction / Evidence — Evidence. All these problems end up with a similar quantity, the evidence
Zk = ∫_{Θk} πk(θk) Lk(θk) dθk ,
aka the marginal likelihood.
• 21. Importance sampling solutions / Regular importance — Importance sampling. Paradox: simulation from f (the true density) is not necessarily optimal. The alternative to direct sampling from f is importance sampling, based on the alternative representation
E_f[h(X)] = ∫_X h(x) [f(x)/g(x)] g(x) dx ,
which allows us to use other distributions than f.
• 23. Importance sampling algorithm. Evaluation of E_f[h(X)] = ∫_X h(x) f(x) dx by: 1. generate a sample X1, …, Xm from a distribution g; 2. use the approximation (1/m) Σ_{j=1}^m h(Xj) f(Xj)/g(Xj).
• 24. Bayes factor approximation. When approximating the Bayes factor
B01 = ∫_{Θ0} f0(x|θ0)π0(θ0) dθ0 / ∫_{Θ1} f1(x|θ1)π1(θ1) dθ1 ,
use importance functions ϖ0 and ϖ1 and
B̂01 = [n0⁻¹ Σ_{i=1}^{n0} f0(x|θ0ⁱ)π0(θ0ⁱ)/ϖ0(θ0ⁱ)] / [n1⁻¹ Σ_{i=1}^{n1} f1(x|θ1ⁱ)π1(θ1ⁱ)/ϖ1(θ1ⁱ)] ,
with θ0ⁱ ∼ ϖ0 and θ1ⁱ ∼ ϖ1.
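A minimal numerical sketch of this importance-sampling approximation of a Bayes factor, on toy data and a hypothetical pair of conjugate normal-mean models (the model choice, priors and importance function below are illustrative assumptions, not the slides' example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, size=20)          # toy data (hypothetical)

# Hypothetical models M0, M1: x_i ~ N(theta, 1) with priors
# theta ~ N(0, 1) under M0 and theta ~ N(0, 10^2) under M1.
priors = {0: stats.norm(0.0, 1.0), 1: stats.norm(0.0, 10.0)}

def log_lik(theta, x):
    """Gaussian log-likelihood of the sample x for mean theta, known variance 1."""
    return stats.norm.logpdf(x[:, None], loc=theta, scale=1.0).sum(axis=0)

def evidence_is(prior, x, n_sim=10_000):
    """Importance-sampling estimate of Z = int f(x|theta) pi(theta) dtheta,
    with a Student-t importance function centred at the sample mean."""
    imp = stats.t(df=4, loc=x.mean(), scale=1.0 / np.sqrt(len(x)))
    theta = imp.rvs(size=n_sim, random_state=rng)
    log_w = log_lik(theta, x) + prior.logpdf(theta) - imp.logpdf(theta)
    return np.exp(log_w - log_w.max()).mean() * np.exp(log_w.max())

B01 = evidence_is(priors[0], x) / evidence_is(priors[1], x)
print("importance-sampling estimate of B01:", B01)
```

Each marginal is estimated separately from draws of its own (fat-tailed) importance function, and the Bayes factor is the ratio of the two estimates.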
• 25. Bridge sampling. Special case: if π̃1(θ1|x) ∝ π1(θ1|x) and π̃2(θ2|x) ∝ π2(θ2|x) live on the same space (Θ1 = Θ2), then
B12 ≈ (1/n) Σ_{i=1}^n π̃1(θi|x)/π̃2(θi|x) ,  θi ∼ π2(θ|x) .
[Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
• 26. Bridge sampling variance. The bridge sampling estimator does poorly if
var(B̂12)/B12² = (1/n) E[ ((π1(θ) − π2(θ))/π2(θ))² ]
is large, i.e. if π1 and π2 have little overlap...
• 28. (Further) bridge sampling. In addition,
B12 = ∫ π̃1(θ|x) α(θ) π2(θ|x) dθ / ∫ π̃2(θ|x) α(θ) π1(θ|x) dθ  ∀ α(·)
≈ [n2⁻¹ Σ_{i=1}^{n2} π̃1(θ2i|x)α(θ2i)] / [n1⁻¹ Σ_{i=1}^{n1} π̃2(θ1i|x)α(θ1i)] ,  θji ∼ πj(θ|x) .
• 29. An infamous example. When α(θ) = 1/{π̃1(θ)π̃2(θ)}, this yields the harmonic mean approximation to B12:
B̂12 = [n2⁻¹ Σ_{i=1}^{n2} 1/π̃2(θ2i|x)] / [n1⁻¹ Σ_{i=1}^{n1} 1/π̃1(θ1i|x)] ,  θji ∼ πj(θ|x) .
[Newton & Raftery, 1994]
Infamous: most often leads to an infinite variance!!! [Radford Neal's blog, 2008]
• 31. “The Worst Monte Carlo Method Ever”. “The good news is that the Law of Large Numbers guarantees that this estimator is consistent, i.e., it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it's easy for people to not realize this, and to naively accept estimates that are nowhere close to the correct value of the marginal likelihood.” [Radford Neal's blog, Aug. 23, 2008]
• 33. Optimal bridge sampling. The optimal choice of auxiliary function is
α(θ) = (n1 + n2) / {n1 π1(θ|x) + n2 π2(θ|x)} ,
leading to
B̂12 ≈ [n2⁻¹ Σ_{i=1}^{n2} π̃1(θ2i|x)/{n1π1(θ2i|x) + n2π2(θ2i|x)}] / [n1⁻¹ Σ_{i=1}^{n1} π̃2(θ1i|x)/{n1π1(θ1i|x) + n2π2(θ1i|x)}] .
Back later!
• 34. Optimal bridge sampling (2). Reason:
Var(B̂12)/B12² ≈ (1/(n1 n2)) { ∫ π1(θ)π2(θ)[n1π1(θ) + n2π2(θ)]α(θ)² dθ / (∫ π1(θ)π2(θ)α(θ) dθ)² − 1 }
(by the δ method). Dependence on the unknown normalising constants is solved iteratively.
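A hedged sketch of that iterative scheme: since the optimal α involves the normalised π1, π2 (hence the unknown ratio itself), one plugs the current estimate into the bridge identity and iterates, in the style of Meng & Wong. The toy unnormalised densities below have known normalising constants so the result can be checked; all names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy unnormalised densities with known constants: Z1 = sqrt(2*pi),
# Z2 = 2*sqrt(2*pi), so the true ratio is r = Z1/Z2 = 0.5.
log_q1 = lambda t: -0.5 * t**2                # ~ N(0, 1), unnormalised
log_q2 = lambda t: -0.125 * (t - 1.0) ** 2    # ~ N(1, 4), unnormalised

n1 = n2 = 5_000
theta1 = rng.normal(0.0, 1.0, n1)             # draws from pi_1
theta2 = rng.normal(1.0, 2.0, n2)             # draws from pi_2

def optimal_bridge(log_q1, log_q2, theta1, theta2, n_iter=50):
    """Iterative bridge-sampling estimate of r = Z1/Z2 with the optimal alpha,
    which depends on r itself and is therefore updated at each pass."""
    n1, n2 = len(theta1), len(theta2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    l1 = np.exp(log_q1(theta1) - log_q2(theta1))   # q1/q2 at pi_1 draws
    l2 = np.exp(log_q1(theta2) - log_q2(theta2))   # q1/q2 at pi_2 draws
    r = 1.0
    for _ in range(n_iter):
        num = np.mean(l2 / (s1 * l2 + s2 * r))
        den = np.mean(1.0 / (s1 * l1 + s2 * r))
        r = num / den
    return r

print("bridge estimate of Z1/Z2:", optimal_bridge(log_q1, log_q2, theta1, theta2))
```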
• 36. Ratio importance sampling. Another identity:
B12 = Eφ[π̃1(θ)/φ(θ)] / Eφ[π̃2(θ)/φ(θ)]
for any density φ with sufficiently large support. [Torrie & Valleau, 1977] Use of a single sample θ1, …, θn from φ:
B̂12 = Σ_{i=1}^n π̃1(θi)/φ(θi) ÷ Σ_{i=1}^n π̃2(θi)/φ(θi) .
• 38. Ratio importance sampling (2). Approximate variance:
var(B̂12)/B12² = (1/n) Eφ[ (π1(θ) − π2(θ))² / φ(θ)² ] .
Optimal choice:
φ*(θ) = |π1(θ) − π2(θ)| / ∫ |π1(η) − π2(η)| dη .
[Chen, Shao & Ibrahim, 2000]
• 40. Improving upon the bridge sampler. Theorem 5.5.3: the asymptotic variance of the optimal ratio importance sampling estimator is smaller than the asymptotic variance of the optimal bridge sampling estimator. [Chen, Shao & Ibrahim, 2000] Does not require the normalising constant ∫ |π1(η) − π2(η)| dη, but a simulation from φ*(θ) ∝ |π1(θ) − π2(θ)|.
• 42. Importance sampling solutions / Varying dimensions — Generalisation to point null situations. When
B12 = ∫_{Θ1} π̃1(θ1) dθ1 / ∫_{Θ2} π̃2(θ2) dθ2
and Θ2 = Θ1 × Ψ, we get θ2 = (θ1, ψ) and
B12 = E_{π2}[ π̃1(θ1)ω(ψ|θ1) / π̃2(θ1, ψ) ]
holds for any conditional density ω(ψ|θ1).
• 43. X-dimensional bridge sampling. Generalisation of the previous identity: for any α,
B12 = E_{π2}[ π̃1(θ1)ω(ψ|θ1)α(θ1, ψ) ] / E_{π1×ω}[ π̃2(θ1, ψ)α(θ1, ψ) ] ,
and, for any density φ,
B12 = Eφ[ π̃1(θ1)ω(ψ|θ1)/φ(θ1, ψ) ] / Eφ[ π̃2(θ1, ψ)/φ(θ1, ψ) ] .
[Chen, Shao & Ibrahim, 2000] Optimal choice: ω(ψ|θ1) = π2(ψ|θ1) [Theorem 5.8.2].
• 45. Importance sampling solutions / Harmonic means — Approximating Zk from a posterior sample. Use of the [harmonic mean] identity
E^{πk}[ φ(θk) / {πk(θk)Lk(θk)} | x ] = ∫ [φ(θk)/{πk(θk)Lk(θk)}] [πk(θk)Lk(θk)/Zk] dθk = 1/Zk ,
no matter what the proposal φ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output.
• 47. Comparison with regular importance sampling. Harmonic mean: constraint opposed to usual importance sampling constraints: φ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the approximation
Ẑ1k = 1 / [ (1/T) Σ_{t=1}^T φ(θk^{(t)}) / {πk(θk^{(t)})Lk(θk^{(t)})} ]
to have a finite variance. E.g., use finite-support kernels (like Epanechnikov's kernel) for φ.
• 49. Comparison with regular importance sampling (cont'd). Compare Ẑ1k with a standard importance sampling approximation
Ẑ2k = (1/T) Σ_{t=1}^T πk(θk^{(t)})Lk(θk^{(t)}) / φ(θk^{(t)}) ,
where the θk^{(t)}'s are generated from the density φ(·) (with fatter tails, like t's).
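A minimal sketch of the Gelfand–Dey version of Ẑ1k, on a hypothetical conjugate normal model so that the exact evidence is available for comparison; the finite-support φ is a uniform density on the interquartile range of the posterior draws, hence with much lighter tails than the posterior, as advocated above. All names and settings are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy conjugate model x_i ~ N(theta, 1), theta ~ N(0, 1): evidence known in closed form.
x = rng.normal(0.3, 1.0, size=30)
n, S = len(x), x.sum()
log_Z_exact = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1) \
              - 0.5 * (np.sum(x**2) - S**2 / (n + 1))

def log_post_unnorm(theta):
    """log pi_k(theta) L_k(theta), the unnormalised posterior."""
    return stats.norm.logpdf(theta, 0, 1) + \
           stats.norm.logpdf(x[:, None], theta, 1).sum(axis=0)

# exact posterior draws standing in for MCMC output
theta = rng.normal(S / (n + 1), 1 / np.sqrt(n + 1), size=20_000)

# Gelfand-Dey / harmonic-mean identity with a finite-support phi
a, b = np.quantile(theta, [0.25, 0.75])
log_phi = np.where((theta >= a) & (theta <= b), -np.log(b - a), -np.inf)
inv_Z = np.exp(log_phi - log_post_unnorm(theta)).mean()
print("Gelfand-Dey log Z:", -np.log(inv_Z), " exact:", log_Z_exact)
```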
• 50. Approximating Zk using a mixture representation. Bridge sampling redux: design a specific mixture for simulation [importance sampling] purposes, with density
φk(θk) ∝ ω1 πk(θk)Lk(θk) + φ(θk) ,
where φ(·) is arbitrary (but normalised). Note: ω1 is not a probability weight.
• 52. Approximating Z using a mixture representation (cont'd). Corresponding MCMC (= Gibbs) sampler, at iteration t:
1. take δ^{(t)} = 1 with probability
ω1 πk(θk^{(t−1)})Lk(θk^{(t−1)}) / { ω1 πk(θk^{(t−1)})Lk(θk^{(t−1)}) + φ(θk^{(t−1)}) }
and δ^{(t)} = 2 otherwise;
2. if δ^{(t)} = 1, generate θk^{(t)} ∼ MCMC(θk^{(t−1)}, θk), where MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated with the posterior πk(θk|x) ∝ πk(θk)Lk(θk);
3. if δ^{(t)} = 2, generate θk^{(t)} ∼ φ(θk) independently.
• 55. Evidence approximation by mixtures. Rao–Blackwellised estimate
ξ̂ = (1/T) Σ_{t=1}^T ω1 πk(θk^{(t)})Lk(θk^{(t)}) / { ω1 πk(θk^{(t)})Lk(θk^{(t)}) + φ(θk^{(t)}) } ,
which converges to ω1 Zk / {ω1 Zk + 1}. Deduce Ẑ3k from ω1 Ẑ3k / {ω1 Ẑ3k + 1} = ξ̂, i.e.
Ẑ3k = [ Σ_{t=1}^T ω1 πk(θk^{(t)})Lk(θk^{(t)}) / {ω1 πk(θk^{(t)})Lk(θk^{(t)}) + φ(θk^{(t)})} ] / [ ω1 Σ_{t=1}^T φ(θk^{(t)}) / {ω1 πk(θk^{(t)})Lk(θk^{(t)}) + φ(θk^{(t)})} ] .
[Bridge sampler]
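A hedged end-to-end sketch of this mixture scheme on the same toy conjugate model as above (an assumption, not the slides' example): the two-step Gibbs sampler targets the mixture φk, and the chain is then recycled into the Rao–Blackwellised bridge estimate Ẑ3k. The choice of ω1 below is a heuristic (scaled so that ω1 πk Lk is of order one near the posterior mode).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Same toy conjugate model: x_i ~ N(theta, 1), theta ~ N(0, 1), exact Z known.
x = rng.normal(0.3, 1.0, size=30)
n, S = len(x), x.sum()
log_Z_exact = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1) \
              - 0.5 * (np.sum(x**2) - S**2 / (n + 1))

def log_q(theta):                      # unnormalised posterior, scalar theta
    return stats.norm.logpdf(theta, 0, 1) + stats.norm.logpdf(x, theta, 1).sum()

phi = stats.norm(0.0, 2.0)             # arbitrary normalised density phi
omega1 = np.exp(-log_q(S / (n + 1)))   # heuristic scale: omega1 * q = O(1)

T, theta, chain = 20_000, 0.0, []
for t in range(T):
    p1 = omega1 * np.exp(log_q(theta))
    if rng.uniform() < p1 / (p1 + phi.pdf(theta)):     # delta = 1
        prop = theta + 0.2 * rng.normal()              # RW-MH step on the posterior
        if np.log(rng.uniform()) < log_q(prop) - log_q(theta):
            theta = prop
    else:                                              # delta = 2
        theta = phi.rvs(random_state=rng)
    chain.append(theta)

chain = np.array(chain)
q_chain = omega1 * np.exp(np.array([log_q(t) for t in chain]))
denom = q_chain + phi.pdf(chain)
Z3 = np.sum(q_chain / denom) / (omega1 * np.sum(phi.pdf(chain) / denom))
print("mixture/bridge log Z:", np.log(Z3), " exact:", log_Z_exact)
```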
• 57. Importance sampling solutions / Chib's solution — Chib's representation. Direct application of Bayes' theorem: given x ∼ fk(x|θk) and θk ∼ πk(θk),
Zk = mk(x) = fk(x|θk) πk(θk) / πk(θk|x) .
Use of an approximation to the posterior:
Ẑk = m̂k(x) = fk(x|θk*) πk(θk*) / π̂k(θk*|x) .
• 59. Case of latent variables. For missing variable z, as in mixture models, natural Rao–Blackwell estimate
π̂k(θk*|x) = (1/T) Σ_{t=1}^T πk(θk*|x, zk^{(t)}) ,
where the zk^{(t)}'s are Gibbs-sampled latent variables.
• 60. Label switching. A mixture model [special case of missing variable model] is invariant under permutations of the indices of the components. E.g., the mixtures 0.3 N(0, 1) + 0.7 N(2.3, 1) and 0.7 N(2.3, 1) + 0.3 N(0, 1) are exactly the same! ⇒ The component parameters θi are not identifiable marginally since they are exchangeable.
• 62. Connected difficulties. 1. Number of modes of the likelihood of order O(k!): ⇒ maximization and even [MCMC] exploration of the posterior surface harder. 2. Under exchangeable priors on (θ, p) [prior invariant under permutation of the indices], all posterior marginals are identical: ⇒ posterior expectation of θ1 equal to posterior expectation of θ2.
• 64. License. Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks energy to switch simultaneously enough component allocations at once. [Figure: raw Gibbs sequences and pairwise scatterplots of the µi's, σi's and pi's over iterations n, showing no label switching.]
• 65. Label switching paradox. We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler. If we observe it, then we do not know how to estimate the parameters. If we do not, then we are uncertain about the convergence!!!
• 68. Compensation for label switching. For mixture models, zk^{(t)} usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory:
πk(θk|x) = πk(σ(θk)|x) = (1/k!) Σ_{σ∈Sk} πk(σ(θk)|x)
for all σ's in Sk, the set of all permutations of {1, …, k}. Consequences on the numerical approximation, biased by an order k!. Recover the theoretical symmetry by using
π̂k(θk*|x) = 1/(T k!) Σ_{σ∈Sk} Σ_{t=1}^T πk(σ(θk*)|x, zk^{(t)}) .
[Berkhof, van Mechelen & Gelman, 2003]
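A hedged sketch of the symmetrised Chib approximation for a deliberately simplified mixture (two components, known equal weights, unit variances, priors µj ∼ N(0, τ²)); this is not the galaxy analysis itself, only an illustration of the permutation average above, and all names and settings are assumptions.

```python
import numpy as np
from scipy import stats
from itertools import permutations

rng = np.random.default_rng(4)

tau2 = 10.0**2
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])
n = len(x)

def gibbs(T=5_000):
    """Data-augmentation Gibbs sampler for the two means of the toy mixture."""
    mu, cond_params, mus = np.array([-1.0, 1.0]), [], []
    for _ in range(T):
        logp = stats.norm.logpdf(x[:, None], mu[None, :], 1.0)      # (n, 2)
        prob0 = 1.0 / (1.0 + np.exp(logp[:, 1] - logp[:, 0]))
        z = (rng.uniform(size=n) > prob0).astype(int)               # allocations
        m, v = np.empty(2), np.empty(2)
        for j in (0, 1):                                            # conjugate updates
            nj, Sj = np.sum(z == j), x[z == j].sum()
            v[j] = 1.0 / (nj + 1.0 / tau2)
            m[j] = v[j] * Sj
            mu[j] = rng.normal(m[j], np.sqrt(v[j]))
        cond_params.append((m.copy(), v.copy()))
        mus.append(mu.copy())
    return np.array(mus), cond_params

mus, cond_params = gibbs()
burn = len(mus) // 2
mu_star = mus[burn:].mean(axis=0)                  # plug-in theta*

# Rao-Blackwell estimate of the posterior density at theta*,
# symmetrised over the k! = 2 permutations of the component labels.
dens = []
for m, v in cond_params[burn:]:
    for perm in permutations(range(2)):
        dens.append(np.prod(stats.norm.pdf(mu_star[list(perm)], m, np.sqrt(v))))
log_post_at_star = np.log(np.mean(dens))

log_lik = np.log(0.5 * stats.norm.pdf(x[:, None], mu_star, 1.0).sum(axis=1)).sum()
log_prior = stats.norm.logpdf(mu_star, 0.0, np.sqrt(tau2)).sum()
print("symmetrised Chib log-evidence:", log_lik + log_prior - log_post_at_star)
```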
• 70. Galaxy dataset. n = 82 galaxies as a mixture of k normal distributions with both mean and variance unknown. [Roeder, 1992] [Figure: histogram of the data with the average density estimate.]
• 71. Galaxy dataset (cont'd). Using only the original estimate, with θk* as the MAP estimator, log(m̂k(x)) = −105.1396 for k = 3 (based on 10³ simulations), while introducing the permutations leads to log(m̂k(x)) = −103.3479. Note that −105.1396 + log(3!) = −103.3479.
k        2        3        4        5        6        7        8
m̂k(x)  −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44
Estimations of the marginal likelihoods by the symmetrised Chib's approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100 permutations selected at random in Sk). [Lee, Marin, Mengersen & Robert, 2008]
• 74. Cross-model solutions / Variable selection — Bayesian variable selection. Regression setting: one dependent random variable y and a set {x1, …, xk} of k explanatory variables. Question: are all the xi's involved in the regression? Assumption: every subset {i1, …, iq} of q (0 ≤ q ≤ k) explanatory variables, {1n, xi1, …, xiq}, is a proper set of explanatory variables for the regression of y [intercept included in every corresponding model]. Computational issue: 2^k models in competition...
• 78. Model notations. 1. X = [1n x1 ··· xk] is the matrix containing 1n and all the k potential predictor variables. 2. Each model Mγ is associated with a binary indicator vector γ ∈ Γ = {0, 1}^k, where γi = 1 means that the variable xi is included in the model Mγ. 3. qγ = 1ᵀγ is the number of variables included in the model Mγ. 4. t1(γ) and t0(γ) are the indices of variables included in, and not included in, the model Mγ.
• 79. Model indicators. For β ∈ R^{k+1} and X, we define βγ as the subvector βγ = (β0, (βi)_{i∈t1(γ)}) and Xγ as the submatrix of X where only the column 1n and the columns in t1(γ) have been left.
• 80. Models in competition. The model Mγ is thus defined as
y | γ, βγ, σ², X ∼ Nn(Xγ βγ, σ² In) ,
where βγ ∈ R^{qγ+1} and σ² ∈ R₊* are the unknown parameters. Warning: σ² is common to all models and thus uses the same prior for all models.
• 82. Informative G-prior. Many (2^k) models in competition: we cannot expect a practitioner to specify a prior on every Mγ in a completely subjective and autonomous manner. Shortcut: we derive all priors from a single global prior associated with the so-called full model that corresponds to γ = (1, …, 1).
• 83. Prior definitions. (i) For the full model, Zellner's G-prior:
β | σ², X ∼ Nk+1(β̃, c σ² (XᵀX)⁻¹) and σ² ∼ π(σ²|X) = σ⁻² .
(ii) For each model Mγ, the prior distribution of βγ conditional on σ² is fixed as
βγ | γ, σ² ∼ N_{qγ+1}(β̃γ, c σ² (XγᵀXγ)⁻¹) , where β̃γ = (XγᵀXγ)⁻¹ Xγᵀ X β̃ ,
and the same prior on σ².
• 84. Prior completion. The joint prior for model Mγ is the improper prior
π(βγ, σ²|γ) ∝ (σ²)^{−(qγ+1)/2 − 1} exp[ −(βγ − β̃γ)ᵀ (XγᵀXγ)(βγ − β̃γ) / (2cσ²) ] .
• 85. Prior competition (2). Infinitely many ways of defining a prior on the model index γ: choice of the uniform prior π(γ|X) = 2^{−k}. The posterior distribution of γ is central to variable selection since it is proportional to the marginal density of y on Mγ (or evidence of Mγ):
π(γ|y, X) ∝ f(y|γ, X) π(γ|X) ∝ f(y|γ, X) = ∫ [ ∫ f(y|γ, β, σ², X) π(β|γ, σ², X) dβ ] π(σ²|X) dσ² .
• 86. Since
f(y|γ, σ², X) = ∫ f(y|γ, β, σ²) π(β|γ, σ²) dβ
= (c+1)^{−(qγ+1)/2} (2π)^{−n/2} (σ²)^{−n/2} exp{ −yᵀy/(2σ²) + [c yᵀXγ(XγᵀXγ)⁻¹Xγᵀy − β̃γᵀXγᵀXγβ̃γ + 2 yᵀXγβ̃γ] / (2σ²(c+1)) } ,
this posterior density satisfies
π(γ|y, X) ∝ (c+1)^{−(qγ+1)/2} [ yᵀy − (c/(c+1)) yᵀXγ(XγᵀXγ)⁻¹Xγᵀy + (1/(c+1)) β̃γᵀXγᵀXγβ̃γ − (2/(c+1)) yᵀXγβ̃γ ]^{−n/2} .
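A minimal sketch of that posterior, coded for the special case β̃ = 0 (the centred G-prior used in the caterpillar illustration below), so the last two terms in the bracket vanish; the helper name, data and chosen c are illustrative assumptions.

```python
import numpy as np

def log_post_gamma(gamma, y, X, c=100.0):
    """log pi(gamma | y, X) up to an additive constant, for Zellner's G-prior
    with prior mean beta_tilde = 0 and a uniform prior over {0,1}^k.
    X is the n x (k+1) matrix whose first column is 1_n; gamma has length k."""
    n = len(y)
    cols = np.concatenate(([True], np.asarray(gamma, dtype=bool)))
    Xg = X[:, cols]
    q = Xg.shape[1] - 1
    # y^T X_g (X_g^T X_g)^{-1} X_g^T y = y^T (projection of y), via least squares
    coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    bracket = y @ y - c / (c + 1.0) * (y @ (Xg @ coef))
    return -0.5 * (q + 1) * np.log(c + 1.0) - 0.5 * n * np.log(bracket)

# usage on simulated data: compare two hypothetical submodels
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 4))])
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=30)
print(log_post_gamma([1, 1, 0, 0], y, X), log_post_gamma([0, 0, 1, 1], y, X))
```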
• 87. Pine processionary caterpillars. t1(γ) and corresponding π(γ|y, X):
t1(γ)            π(γ|y, X)
0,1,2,4,5        0.2316
0,1,2,4,5,9      0.0374
0,1,9            0.0344
0,1,2,4,5,10     0.0328
0,1,4,5          0.0306
0,1,2,9          0.0250
0,1,2,4,5,7      0.0241
0,1,2,4,5,8      0.0238
0,1,2,4,5,6      0.0237
0,1,2,3,4,5      0.0232
0,1,6,9          0.0146
0,1,2,3,9        0.0145
0,9              0.0143
0,1,2,6,9        0.0135
0,1,4,5,9        0.0128
0,1,3,9          0.0117
0,1,2,8          0.0115
• 88. Pine processionary caterpillars (cont'd). Interpretation: the model Mγ with the highest posterior probability is t1(γ) = (1, 2, 4, 5), which corresponds to the variables altitude, slope, height of the tree sampled in the center of the area, and diameter of the tree sampled in the center of the area. Corresponds to the five variables identified in the R regression output.
• 90. Noninformative extension. For Zellner's noninformative prior with π(c) = 1/c, we have
π(γ|y, X) ∝ Σ_{c=1}^∞ c⁻¹ (c+1)^{−(qγ+1)/2} [ yᵀy − (c/(c+1)) yᵀXγ(XγᵀXγ)⁻¹Xγᵀy ]^{−n/2} .
• 91. Pine processionary caterpillars. t1(γ) and corresponding π(γ|y, X):
t1(γ)            π(γ|y, X)
0,1,2,4,5        0.0929
0,1,2,4,5,9      0.0325
0,1,2,4,5,10     0.0295
0,1,2,4,5,7      0.0231
0,1,2,4,5,8      0.0228
0,1,2,4,5,6      0.0228
0,1,2,3,4,5      0.0224
0,1,2,3,4,5,9    0.0167
0,1,2,4,5,6,9    0.0167
0,1,2,4,5,8,9    0.0137
0,1,4,5          0.0110
0,1,2,4,5,9,10   0.0100
0,1,2,3,9        0.0097
0,1,2,9          0.0093
0,1,2,4,5,7,9    0.0092
0,1,2,6,9        0.0092
• 92. Stochastic search for the most likely model. When k gets large, it is impossible to compute the posterior probabilities of the 2^k models. Need a tailored algorithm that samples from π(γ|y, X) and selects the most likely models. Can be done by Gibbs sampling, given the availability of the full conditional posterior probabilities of the γi's. If γ₋ᵢ = (γ1, …, γi−1, γi+1, …, γk) (1 ≤ i ≤ k),
π(γi|y, γ₋ᵢ, X) ∝ π(γ|y, X)
(to be evaluated in both γi = 0 and γi = 1).
• 94. Gibbs sampling for variable selection (a sketch of one such scan follows below). Initialization: draw γ⁰ from the uniform distribution on Γ. Iteration t: given (γ1^{(t−1)}, …, γk^{(t−1)}), generate
1. γ1^{(t)} according to π(γ1|y, γ2^{(t−1)}, …, γk^{(t−1)}, X);
2. γ2^{(t)} according to π(γ2|y, γ1^{(t)}, γ3^{(t−1)}, …, γk^{(t−1)}, X);
…
k. γk^{(t)} according to π(γk|y, γ1^{(t)}, …, γ_{k−1}^{(t)}, X).
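A possible coding of that Gibbs scan, reusing the hypothetical log_post_gamma helper and the simulated (y, X) from the earlier sketch; it is an illustrative implementation, not the slides' own code.

```python
import numpy as np

rng = np.random.default_rng(6)

def gibbs_variable_selection(y, X, T=5_000, c=100.0):
    """Gibbs scan over gamma in {0,1}^k, using log_post_gamma (defined above)
    for the unnormalised log posterior pi(gamma | y, X)."""
    k = X.shape[1] - 1
    gamma = rng.integers(0, 2, size=k)          # gamma^0 uniform on {0,1}^k
    chain = np.empty((T, k), dtype=int)
    for t in range(T):
        for i in range(k):
            logp = np.empty(2)
            for v in (0, 1):                    # evaluate both gamma_i = 0 and 1
                gamma[i] = v
                logp[v] = log_post_gamma(gamma, y, X, c)
            p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
            gamma[i] = int(rng.uniform() < p1)
        chain[t] = gamma
    return chain

chain = gibbs_variable_selection(y, X)          # y, X from the previous sketch
print("estimated inclusion probabilities:", chain[1_000:].mean(axis=0))
```

The last line already illustrates the burn-in and empirical-average step formalised on the next slide.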
• 95. MCMC interpretation. After T ≫ 1 MCMC iterations, the output is used to approximate the posterior probabilities π(γ|y, X) by empirical averages
π̂(γ|y, X) = (1/(T − T0 + 1)) Σ_{t=T0}^T I_{γ^{(t)} = γ} ,
where the T0 first values are eliminated as burn-in. And approximation of the probability to include the i-th variable,
P̂^π(γi = 1|y, X) = (1/(T − T0 + 1)) Σ_{t=T0}^T I_{γi^{(t)} = 1} .
• 97. Pine processionary caterpillars. Probabilities of inclusion with both informative (β̃ = 0₁₁, c = 100) and noninformative Zellner's priors:
γi     P^π(γi = 1|y, X) [informative]    P^π(γi = 1|y, X) [noninformative]
γ1     0.8624                            0.8844
γ2     0.7060                            0.7716
γ3     0.1482                            0.2978
γ4     0.6671                            0.7261
γ5     0.6515                            0.7006
γ6     0.1678                            0.3115
γ7     0.1371                            0.2880
γ8     0.1555                            0.2876
γ9     0.4039                            0.5168
γ10    0.1151                            0.2609
• 98. Cross-model solutions / Reversible jump — Reversible jump. Idea: set up a proper measure-theoretic framework for designing moves between models Mk. [Green, 1995] Create a reversible kernel K on H = ∪_k {k} × Θk such that
∫_A ∫_B K(x, dy) π(x) dx = ∫_B ∫_A K(y, dx) π(y) dy
for the invariant density π [x is of the form (k, θ^{(k)})].
• 100. Local moves. For a move between two models, M1 and M2, the Markov chain being in state θ1 ∈ M1, denote by K1→2(θ1, dθ) and K2→1(θ2, dθ) the corresponding kernels, under the detailed balance condition
π(dθ1) K1→2(θ1, dθ) = π(dθ2) K2→1(θ2, dθ) ,
and take, wlog, dim(M2) > dim(M1). The proposal is expressed as θ2 = Ψ1→2(θ1, v1→2), where v1→2 is a random variable of dimension dim(M2) − dim(M1), generated as v1→2 ∼ φ1→2(v1→2).
• 102. Local moves (2). In this case, q1→2(θ1, dθ2) has density
φ1→2(v1→2) |∂Ψ1→2(θ1, v1→2)/∂(θ1, v1→2)|⁻¹ ,
by the Jacobian rule. If ϖ1→2 is the probability of choosing a move to M2 while in M1, the acceptance probability reduces to
α(θ1, v1→2) = 1 ∧ [π(M2, θ2) ϖ2→1 / {π(M1, θ1) ϖ1→2 φ1→2(v1→2)}] |∂Ψ1→2(θ1, v1→2)/∂(θ1, v1→2)| .
⇒ Difficult calibration.
• 105. Interpretation. The representation puts us back in a fixed-dimension setting: M1 × V1→2 and M2 are in one-to-one relation; reversibility imposes that θ1 is derived as (θ1, v1→2) = Ψ1→2⁻¹(θ2); this appears like a regular Metropolis–Hastings move from the couple (θ1, v1→2) to θ2 when the stationary distributions are π(M1, θ1) × φ1→2(v1→2) and π(M2, θ2), and when the proposal distribution is deterministic (??).
• 107. Pseudo-deterministic reasoning
Consider the proposals
$$\theta_2 \sim \mathcal{N}(\Psi_{1\to 2}(\theta_1, v_{1\to 2}), \varepsilon) \quad\text{and}\quad \Psi_{1\to 2}(\theta_1, v_{1\to 2}) \sim \mathcal{N}(\theta_2, \varepsilon)\,.$$
The reciprocal proposal has density
$$\frac{\exp\{-(\theta_2 - \Psi_{1\to 2}(\theta_1, v_{1\to 2}))^2 / 2\varepsilon\}}{\sqrt{2\pi\varepsilon}} \times \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|$$
by the Jacobian rule. Thus the Metropolis–Hastings acceptance probability is
$$1 \wedge \frac{\pi(M_2, \theta_2)}{\pi(M_1, \theta_1)\,\varphi_{1\to 2}(v_{1\to 2})}\, \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|\,.$$
It does not depend on ε: let ε go to 0.
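Writing the Metropolis–Hastings ratio for the move (θ1, v1→2) → θ2 explicitly makes the cancellation visible:
$$\frac{\pi(M_2, \theta_2)\; q_\varepsilon\big(\theta_2 \to (\theta_1, v_{1\to 2})\big)}
       {\pi(M_1, \theta_1)\,\varphi_{1\to 2}(v_{1\to 2})\; q_\varepsilon\big((\theta_1, v_{1\to 2}) \to \theta_2\big)}
\;=\; \frac{\pi(M_2, \theta_2)}{\pi(M_1, \theta_1)\,\varphi_{1\to 2}(v_{1\to 2})}\,
      \left|\frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})}\right|$$
since both proposal densities contain the same Gaussian factor exp{−(θ2 − Ψ1→2(θ1, v1→2))²/2ε}/√(2πε), which cancels for every ε > 0.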
• 109. Generic reversible jump acceptance probability
If several models are considered simultaneously, with probability π1→2 of choosing a move to M2 while in M1, as in
$$K(x, B) \;=\; \sum_{m=1}^{\infty} \int_B \rho_m(x, y)\, q_m(x, \mathrm{d}y) \;+\; \omega(x)\, \mathbb{I}_B(x)\,,$$
the acceptance probability of θ2 = Ψ1→2(θ1, v1→2) is
$$\alpha(\theta_1, v_{1\to 2}) \;=\; 1 \wedge \frac{\pi(M_2, \theta_2)\,\pi_{2\to 1}}{\pi(M_1, \theta_1)\,\pi_{1\to 2}\,\varphi_{1\to 2}(v_{1\to 2})}\, \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|$$
while the acceptance probability of θ1, with (θ1, v1→2) = Ψ1→2⁻¹(θ2), is
$$\alpha(\theta_1, v_{1\to 2}) \;=\; 1 \wedge \frac{\pi(M_1, \theta_1)\,\pi_{1\to 2}\,\varphi_{1\to 2}(v_{1\to 2})}{\pi(M_2, \theta_2)\,\pi_{2\to 1}}\, \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|^{-1}$$
• 112. Green's sampler
Algorithm. Iteration t (t ≥ 1): if x^(t) = (m, θ^(m)),
1. Select model Mn with probability πmn
2. Generate umn ∼ ϕmn(u) and set (θ^(n), vnm) = Ψm→n(θ^(m), umn)
3. Take x^(t+1) = (n, θ^(n)) with probability
$$\min\left\{ \frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})}\, \frac{\pi_{nm}\,\varphi_{nm}(v_{nm})}{\pi_{mn}\,\varphi_{mn}(u_{mn})}\, \left| \frac{\partial \Psi_{m\to n}(\theta^{(m)}, u_{mn})}{\partial(\theta^{(m)}, u_{mn})} \right| ,\, 1 \right\}$$
and take x^(t+1) = x^(t) otherwise.
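A schematic Python rendering of one iteration of this sampler; every model-specific ingredient (move probabilities, completion densities, the maps Ψm→n and their Jacobians) is passed in as a placeholder callable, so this is a sketch of the control flow rather than a ready-made implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def rj_iteration(m, theta, n_models, move_prob, sample_u, psi, log_phi, log_target, log_jac):
    """One iteration of Green's sampler, current state x^(t) = (m, theta^(m)).
    move_prob[m][n] = pi_{mn}; sample_u, psi, log_phi, log_jac are model-pair-specific
    callables (log_phi should return 0.0 when a move needs no completion variable)."""
    # 1. select the candidate model M_n with probability pi_{mn}
    n = rng.choice(n_models, p=move_prob[m])
    # 2. generate u_{mn} ~ phi_{mn} and set (theta^(n), v_{nm}) = Psi_{m->n}(theta^(m), u_{mn})
    u = sample_u(m, n)
    theta_new, v = psi(m, n, theta, u)
    # 3. accept x^(t+1) = (n, theta^(n)) with the reversible jump probability
    log_alpha = (log_target(n, theta_new) - log_target(m, theta)
                 + np.log(move_prob[n][m]) - np.log(move_prob[m][n])
                 + log_phi(n, m, v) - log_phi(m, n, u)
                 + log_jac(m, n, theta, u))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return n, theta_new
    return m, theta
```

Apart from the Jacobian term and the two completion densities, the step has exactly the structure of a fixed-dimension Metropolis–Hastings update.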
• 113. Mixture of normal distributions
$$M_k = \left\{ (p_{jk}, \mu_{jk}, \sigma_{jk})\,;\ \sum_{j=1}^{k} p_{jk}\, \mathcal{N}(\mu_{jk}, \sigma_{jk}^2) \right\}$$
Restrict moves from Mk to adjacent models, like Mk+1 and Mk−1, with probabilities πk(k+1) and πk(k−1).
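For the mixture case, a possible (hedged) sketch of a birth move Mk → Mk+1 in the spirit of Richardson and Green (1997): the Beta(1, k) weight proposal, the Gaussian and inverse-gamma draws for the new component, and the resulting Jacobian (1 − w_new)^(k−1) are one standard choice among many, not the unique recipe.

```python
import numpy as np
from scipy.stats import beta, norm, invgamma

rng = np.random.default_rng(2)

def birth_proposal(weights, means, variances):
    """Propose M_k -> M_{k+1} by adding a brand-new normal component.
    Returns the candidate parameters together with the log completion density
    log phi_{k->k+1}(u) and the log Jacobian of the weight rescaling."""
    k = len(weights)
    # completion variables u = (w_new, mu_new, var_new), drawn from illustrative densities
    w_new = beta.rvs(1, k, random_state=rng)
    mu_new = norm.rvs(loc=0.0, scale=3.0, random_state=rng)
    var_new = invgamma.rvs(2.0, scale=1.0, random_state=rng)
    # dimension-matching map: shrink the old weights so that everything still sums to one
    new_weights = np.append(weights * (1.0 - w_new), w_new)
    new_means = np.append(means, mu_new)
    new_variances = np.append(variances, var_new)
    log_phi = (beta.logpdf(w_new, 1, k)
               + norm.logpdf(mu_new, loc=0.0, scale=3.0)
               + invgamma.logpdf(var_new, 2.0, scale=1.0))
    # Jacobian of the weight rescaling in the k free weight coordinates: (1 - w_new)^(k-1)
    log_jac = (k - 1) * np.log(1.0 - w_new)
    return new_weights, new_means, new_variances, log_phi, log_jac
```

The matching death move Mk+1 → Mk removes a component and uses the reciprocal of this ratio, with πk(k+1) and π(k+1)k entering exactly as in the generic acceptance probability above.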