Bayesian Core: A Practical Approach to Computational Bayesian Statistics

                  Jean-Michel Marin & Christian P. Robert

                      QUT, Brisbane, August 4–8, 2008

Outline

      1   The normal model
      2   Regression and variable selection
      3   Generalized linear models
      4   Capture-recapture experiments
      5   Mixture models
      6   Dynamic models
      7   Image analysis

The normal model

      1   The normal model
            Normal problems
            The Bayesian toolbox
            Prior selection
            Bayesian estimation
            Confidence regions
            Testing
            Monte Carlo integration
            Prediction

Normal problems

Normal model

   Sample
                  x1 , . . . , xn

   from a normal N(µ, σ²) distribution

   [Figure: histogram and density estimate of a normal sample]

Inference on (µ, σ) based on this sample

      Estimation of [transforms of] (µ, σ)
      Confidence region [interval] on (µ, σ)
      Test on (µ, σ) and comparison with other samples

Datasets

   Larcenies = normaldata

   Relative changes in reported larcenies between 1991 and 1995
   (relative to 1991) for the 90 most populous US counties
   (Source: FBI)

   [Figure: histogram of the relative changes in reported larcenies]

   Cosmological background = CMBdata

   Spectral representation of the "cosmological microwave
   background" (CMB), i.e. electromagnetic radiation from photons
   dating back to 300,000 years after the Big Bang, expressed as
   the difference in apparent temperature from the mean temperature

   Cosmological background = CMBdata
   Normal estimation

   [Figure: histogram of the CMB data with a fitted normal density]

The Bayesian toolbox

               Bayes theorem = Inversion of probabilities

   If A and E are events such that P(E) ≠ 0, P(A|E) and P(E|A)
   are related by

      P(A|E) = P(E|A)P(A) / [P(E|A)P(A) + P(E|A^c)P(A^c)]
             = P(E|A)P(A) / P(E)

Who's Bayes?

   Reverend Thomas Bayes (ca. 1702–1761)
   Presbyterian minister in Tunbridge Wells (Kent) from 1731, son
   of Joshua Bayes, nonconformist minister. Elected to the Royal
   Society on the basis of a tract of 1736 in which he defended the
   views and philosophy of Newton.

   Sole probability paper, "Essay Towards Solving a Problem in the
   Doctrine of Chances", published posthumously in 1763 by Richard
   Price and containing the seeds of Bayes' Theorem.

New perspective

      Uncertainty on the parameters θ of a model modeled through
      a probability distribution π on Θ, called prior distribution
      Inference based on the distribution of θ conditional on x,
      π(θ|x), called posterior distribution

         π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ .

Bayesian model

   A Bayesian statistical model is made of
      1   a likelihood f(x|θ), and of
      2   a prior distribution on the parameters, π(θ).

Justifications

      Semantic drift from unknown θ to random θ
      Actualization of information/knowledge on θ by extracting
      information/knowledge on θ contained in the observation x
      Allows incorporation of imperfect/imprecise information in
      the decision process
      Unique mathematical way to condition upon the observations
      (conditional perspective)

   Example (Normal illustration (σ² = 1))
   Assume
               π(θ) = exp{−θ} I_{θ>0}

   Then

      π(θ|x1 , . . . , xn ) ∝ exp{−θ} exp{−n(θ − x̄)²/2} I_{θ>0}
                            ∝ exp{−nθ²/2 + θ(nx̄ − 1)} I_{θ>0}
                            ∝ exp{−n(θ − (x̄ − 1/n))²/2} I_{θ>0}

   Example (Normal illustration (2))
   Truncated normal distribution

               N⁺((x̄ − 1/n), 1/n)

   [Figure: posterior density of θ, with mu = −0.08935864]

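A quick numerical sketch of this truncated-normal posterior, in Python
(the data are simulated here, not taken from the slides;
scipy.stats.truncnorm parameterises the truncation bounds in
standardised units):

   import numpy as np
   from scipy.stats import truncnorm

   # Posterior under the prior pi(theta) = exp(-theta) I(theta > 0) and an
   # N(theta, 1) sample: the truncated normal N+(xbar - 1/n, 1/n).
   rng = np.random.default_rng(0)
   x = rng.normal(0.1, 1.0, size=50)        # simulated sample (illustrative)
   n, xbar = x.size, x.mean()

   loc, scale = xbar - 1.0 / n, 1.0 / np.sqrt(n)
   a = (0.0 - loc) / scale                  # lower bound theta = 0, standardised
   posterior = truncnorm(a=a, b=np.inf, loc=loc, scale=scale)

   print("posterior mean:", posterior.mean())
   print("95% credible interval:", posterior.ppf([0.025, 0.975]))
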
Prior and posterior distributions

   Given f(x|θ) and π(θ), several distributions of interest:
      1   the joint distribution of (θ, x),

               ϕ(θ, x) = f(x|θ)π(θ) ;

      2   the marginal distribution of x,

               m(x) = ∫ ϕ(θ, x) dθ
                    = ∫ f(x|θ)π(θ) dθ ;

      3   the posterior distribution of θ,

               π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ
                      = f(x|θ)π(θ) / m(x) ;

      4   the predictive distribution of y, when y ∼ g(y|θ, x),

               g(y|x) = ∫ g(y|θ, x)π(θ|x) dθ .

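As a concrete sketch, all four objects can be computed by quadrature for
a one-observation toy model (Python; the Exp(1) prior and the numbers
are illustrative choices, not from the slides):

   import numpy as np
   from scipy.integrate import quad
   from scipy.stats import norm

   # Toy model: x | theta ~ N(theta, 1), theta ~ Exp(1) (illustrative choice)
   prior = lambda t: np.exp(-t) * (t > 0)
   lik = lambda x, t: norm.pdf(x, loc=t, scale=1.0)

   x_obs = 0.2
   joint = lambda t: lik(x_obs, t) * prior(t)          # phi(theta, x)
   m_x, _ = quad(joint, 0.0, np.inf)                   # marginal m(x)
   posterior = lambda t: joint(t) / m_x                # pi(theta | x)

   # predictive density of a future y | theta ~ N(theta, 1):
   predictive = lambda y: quad(lambda t: norm.pdf(y, t, 1.0) * posterior(t),
                               0.0, np.inf)[0]
   print("m(x) =", m_x, " g(0 | x) =", predictive(0.0))
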
Posterior distribution central to Bayesian inference

                  π(θ|x) ∝ f(x|θ) π(θ)

      Operates conditional upon the observations
      Integrates simultaneously prior information/knowledge and the
      information brought by x
      Avoids averaging over the unobserved values of x
      Coherent updating of the information available on θ,
      independent of the order in which i.i.d. observations are
      collected
      Provides a complete inferential scope and a unique motor of
      inference

   Example (Normal-normal case)
   Consider x|θ ∼ N(θ, 1) and θ ∼ N(a, 10).

      π(θ|x) ∝ f(x|θ)π(θ) ∝ exp{ −(x − θ)²/2 − (θ − a)²/20 }
             ∝ exp{ −11θ²/20 + θ(x + a/10) }
             ∝ exp{ −(11/20) [θ − (10x + a)/11]² }

   and
               θ|x ∼ N((10x + a)/11, 10/11)

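A grid-based check of this closed form (Python; the values of a and x
are arbitrary):

   import numpy as np
   from scipy.stats import norm

   # Check: for x|theta ~ N(theta, 1) and theta ~ N(a, 10), the posterior
   # mean and variance should be (10x + a)/11 and 10/11.
   a, x = 1.0, 0.5
   theta, d = np.linspace(-15, 15, 200001, retstep=True)
   post = norm.pdf(x, theta, 1.0) * norm.pdf(theta, a, np.sqrt(10.0))
   post /= post.sum() * d                              # normalise on the grid

   mean = (theta * post).sum() * d
   var = ((theta - mean) ** 2 * post).sum() * d
   print(mean, (10 * x + a) / 11)                      # ~ equal
   print(var, 10 / 11)                                 # ~ equal
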
Prior selection

      The prior distribution is the key to Bayesian inference

   But...
   In practice, it seldom occurs that the available prior
   information is precise enough to lead to an exact determination
   of the prior distribution

            There is no such thing as the prior distribution!

Strategies for prior determination

      Ungrounded prior distributions produce unjustified posterior
      inference. —Anonymous, ca. 2006

      Use a partition of Θ in sets (e.g., intervals), determine the
      probability of each set, and approximate π by a histogram
      Select significant elements of Θ, evaluate their respective
      likelihoods and deduce a likelihood curve proportional to π
      Use the marginal distribution of x,

               m(x) = ∫_Θ f(x|θ)π(θ) dθ

      Empirical and hierarchical Bayes techniques

Conjugate priors

   Specific parametric family with analytical properties

   Conjugate prior
   A family F of probability distributions on Θ is conjugate for a
   likelihood function f(x|θ) if, for every π ∈ F, the posterior
   distribution π(θ|x) also belongs to F.

   Only of interest when F is parameterised: switching from prior
   to posterior distribution is then reduced to an updating of the
   corresponding parameters.

Justifications

      Limited/finite information conveyed by x
      Preservation of the structure of π(θ)
      Exchangeability motivations
      Device of virtual past observations
      Linearity of some estimators
      But mostly... tractability and simplicity
      First approximations to adequate priors, backed up by
      robustness analysis

Exponential families

   Sampling models of interest

   Exponential family
   The family of distributions

            f(x|θ) = C(θ)h(x) exp{R(θ) · T(x)}

   is called an exponential family of dimension k. When Θ ⊂ R^k,
   X ⊂ R^k and

            f(x|θ) = h(x) exp{θ · x − Ψ(θ)},

   the family is said to be natural.

Analytical properties of exponential families

      Sufficient statistics (Pitman–Koopman Lemma)
      Common enough structure (normal, Poisson, &tc...)
      Analyticity (E[x] = ∇Ψ(θ), ...)
      Allow for conjugate priors

            π(θ|µ, λ) = K(µ, λ) e^{θ·µ − λΨ(θ)} ,   λ > 0

Standard exponential families

   f(x|θ)        π(θ)         π(θ|x)
   Normal        Normal
   N(θ, σ²)      N(µ, τ²)     N(ρ(σ²µ + τ²x), ρσ²τ²),  ρ⁻¹ = σ² + τ²
   Poisson       Gamma
   P(θ)          G(α, β)      G(α + x, β + 1)
   Gamma         Gamma
   G(ν, θ)       G(α, β)      G(α + ν, β + x)
   Binomial      Beta
   B(n, θ)       Be(α, β)     Be(α + x, β + n − x)

More...

   f(x|θ)                 π(θ)                 π(θ|x)
   Negative Binomial      Beta
   Neg(m, θ)              Be(α, β)             Be(α + m, β + x)
   Multinomial            Dirichlet
   Mk(θ1 , . . . , θk )   D(α1 , . . . , αk )  D(α1 + x1 , . . . , αk + xk )
   Normal                 Gamma
   N(µ, 1/θ)              Ga(α, β)             G(α + 0.5, β + (µ − x)²/2)

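A minimal sketch of what these tables mean in practice, using the
Binomial/Beta row (Python; all numbers are made up for illustration):

   from scipy.stats import beta

   # Binomial/Beta row: x|theta ~ B(n, theta) with theta ~ Be(a0, b0)
   # gives the posterior Be(a0 + x, b0 + n - x); updating = adding counts.
   a0, b0 = 2.0, 2.0                        # illustrative prior parameters
   n, x = 20, 13                            # illustrative data

   posterior = beta(a0 + x, b0 + n - x)
   print("posterior mean:", posterior.mean())       # (a0 + x)/(a0 + b0 + n)
   print("90% credible interval:", posterior.ppf([0.05, 0.95]))
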
Linearity of the posterior mean

   If
            θ ∼ π_{λ,µ}(θ) ∝ e^{θ·µ − λΨ(θ)}

   with µ ∈ X , then
            E^π[∇Ψ(θ)] = µ/λ ,

   where ∇Ψ(θ) = (∂Ψ(θ)/∂θ1 , . . . , ∂Ψ(θ)/∂θp )

   Therefore, if x1 , . . . , xn are i.i.d. f(x|θ),

            E^π[∇Ψ(θ)|x1 , . . . , xn ] = (µ + nx̄)/(λ + n) .

   Example (Normal-normal)
   In the normal N(θ, σ²) case, the conjugate prior is also normal,
   N(µ, τ²), and

            E^π[∇Ψ(θ)|x] = E^π[θ|x] = ρ(σ²µ + τ²x)

   where
            ρ⁻¹ = σ² + τ²

   Example (Full normal)
   In the normal N(µ, σ²) case, when both µ and σ are unknown,
   there still is a conjugate prior on θ = (µ, σ²), of the form

         (σ²)^{−λ_σ} exp{ −( λ_µ(µ − ξ)² + α ) / 2σ² }

   since

      π(µ, σ²|x1 , . . . , xn )
         ∝ (σ²)^{−λ_σ} exp{ −( λ_µ(µ − ξ)² + α ) / 2σ² }
             × (σ²)^{−n/2} exp{ −( n(µ − x̄)² + s²_x ) / 2σ² }
         ∝ (σ²)^{−λ_σ−n/2} exp{ −( (λ_µ + n)(µ − ξ_x)²
             + α + s²_x + nλ_µ(x̄ − ξ)²/(n + λ_µ) ) / 2σ² }

   [Figure: contour plot of this conjugate prior density in (µ, σ),
   for parameters (0, 1, 1, 1)]

Improper prior distribution

   Extension from a prior distribution to a prior σ-finite measure
   π such that
            ∫_Θ π(θ) dθ = +∞

   Formal extension: π cannot be interpreted as a probability any
   longer

Justifications

      1   Often the only way to derive a prior in
          noninformative/automatic settings
      2   Performances of associated estimators usually good
      3   Often occur as limits of proper distributions
      4   More robust answer against possible misspecifications of
          the prior
      5   Improper priors (infinitely!) preferable to vague proper
          priors such as a N(0, 100²) distribution [e.g., BUGS]

Validation

   Extension of the posterior distribution π(θ|x) associated with
   an improper prior π, given by Bayes's formula

            π(θ|x) = f(x|θ)π(θ) / ∫_Θ f(x|θ)π(θ) dθ ,

   when
            ∫_Θ f(x|θ)π(θ) dθ < ∞

   Example (Normal+improper)
   If x ∼ N(θ, 1) and π(θ) = ϖ, constant, the pseudo marginal
   distribution is

            m(x) = ϖ ∫_{−∞}^{+∞} (1/√(2π)) exp{−(x − θ)²/2} dθ = ϖ

   and the posterior distribution of θ is

            π(θ | x) = (1/√(2π)) exp{−(x − θ)²/2} ,

   i.e., corresponds to N(x, 1).
                                             [independent of ϖ]

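A numerical check of this independence from the constant (Python; the
values of ϖ and x below are arbitrary):

   import numpy as np
   from scipy.integrate import quad
   from scipy.stats import norm

   # With a flat improper prior pi(theta) = w, the pseudo-marginal of
   # x ~ N(theta, 1) equals w, so the posterior is N(x, 1) whatever w is.
   w, x = 3.7, 0.42                         # arbitrary constant and datum
   m_x, _ = quad(lambda t: w * norm.pdf(x, t, 1.0), -np.inf, np.inf)
   print(m_x)                               # ~ w

   post = lambda t: w * norm.pdf(x, t, 1.0) / m_x
   print(post(0.0), norm.pdf(0.0, x, 1.0))  # ~ equal: posterior is N(x, 1)
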
Meaningless as probability distribution

      The mistake is to think of them [the non-informative priors]
      as representing ignorance
      —Lindley, 1990—

   Example
   Consider a θ ∼ N(0, τ²) prior. Then

            P^π(θ ∈ [a, b]) −→ 0

   when τ → ∞, for any (a, b)

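The vanishing prior mass is easy to see numerically (Python; the
interval and the τ values are arbitrary):

   from scipy.stats import norm

   # Under theta ~ N(0, tau^2), the prior mass of a fixed interval [a, b]
   # vanishes as tau grows, so "vague" is not "ignorant".
   a, b = -1.0, 1.0
   for tau in (1.0, 10.0, 100.0, 10_000.0):
       p = norm.cdf(b, scale=tau) - norm.cdf(a, scale=tau)
       print(f"tau = {tau:>8}: P(theta in [a, b]) = {p:.3e}")
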
Noninformative prior distributions

         What if all we know is that we know "nothing"?!

   In the absence of prior information, prior distributions solely
   derived from the sample distribution f(x|θ)

      Noninformative priors cannot be expected to represent exactly
      total ignorance about the problem at hand, but should rather
      be taken as reference or default priors, upon which everyone
      could fall back when the prior information is missing.
      —Kass and Wasserman, 1996—

Laplace's prior

   Principle of Insufficient Reason (Laplace)

            Θ = {θ1 , · · · , θp }        π(θi ) = 1/p

   Extension to continuous spaces

            π(θ) ∝ 1
                                          [Lebesgue measure]

Who's Laplace?

   Pierre Simon de Laplace (1749–1827)
   French mathematician and astronomer born in Beaumont-en-Auge
   (Normandie) who formalised mathematical astronomy in Mécanique
   Céleste. Survived the French revolution, the Napoleonic Empire
   (as a comte!), and the Bourbon restoration (as a marquis!!).

   In Essai Philosophique sur les Probabilités, Laplace set out a
   mathematical system of inductive reasoning based on probability,
   a precursor to Bayesian Statistics.

Laplace's problem

      Lack of reparameterization invariance/coherence

            π(θ) ∝ 1, and ψ = e^θ  ⇒  π(ψ) = 1/ψ = 1 (!!)

      Problems of properness

            x ∼ N(µ, σ²),   π(µ, σ) = 1
            π(µ, σ|x) ∝ e^{−(x−µ)²/2σ²} σ⁻¹
            ⇒ π(σ|x) ∝ 1   (!!!)

Jeffreys' prior

   Based on Fisher information

            I^F(θ) = E_θ[ (∂ log ℓ/∂θ^t)(∂ log ℓ/∂θ) ]

                                   [photo: Ron Fisher (1890–1962)]

   the Jeffreys prior distribution is

            π^J(θ) ∝ |I^F(θ)|^{1/2}

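As a sketch of the definition, the Bernoulli case can be worked
numerically (Python; this worked example is ours, not the slides'):
the Fisher information 1/[θ(1 − θ)] makes π^J the Be(1/2, 1/2) density.

   import numpy as np
   from scipy.special import beta as beta_fn

   # Jeffreys' prior for a Bernoulli observation, straight from the
   # definition I(theta) = E_theta[(d log f/d theta)^2], summing over x in {0, 1}.
   def fisher_info(theta):
       score = lambda x: x / theta - (1 - x) / (1 - theta)   # d log f / d theta
       return theta * score(1) ** 2 + (1 - theta) * score(0) ** 2

   theta = np.linspace(0.05, 0.95, 5)
   jeffreys = np.sqrt(fisher_info(theta))           # pi_J ∝ |I(theta)|^{1/2}
   be_half = theta ** -0.5 * (1 - theta) ** -0.5 / beta_fn(0.5, 0.5)
   print(jeffreys / be_half)                        # constant ratio: same shape
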
Who's Jeffreys?

   Sir Harold Jeffreys (1891–1989)
   English mathematician, statistician, geophysicist, and
   astronomer. Founder of English Geophysics and originator of the
   theory that the Earth's core is liquid.

   Formalised Bayesian methods for the analysis of geophysical data
   and ended up writing Theory of Probability

Pros & Cons

      Relates to information theory
      Agrees with most invariant priors
      Parameterization invariant
      Suffers from dimensionality curse

Bayesian estimation

Evaluating estimators

   Purpose of most inferential studies: to provide the
   statistician/client with a decision d ∈ D

   Requires an evaluation criterion/loss function for decisions and
   estimators
            L(θ, d)

   There exists an axiomatic derivation of the existence of a loss
   function.
                                          [DeGroot, 1970]

Loss functions

   Decision procedure δ^π usually called estimator
   (while its value δ^π(x) is called estimate of θ)

   Impossible to uniformly minimize (in d) the loss function

            L(θ, d)

   when θ is unknown

Bayesian estimation

   Principle: integrate over the space Θ to get the posterior
   expected loss

            E^π[L(θ, d)|x] = ∫_Θ L(θ, d)π(θ|x) dθ,

   and minimise in d

Bayes estimates

   Bayes estimator
   A Bayes estimate associated with a prior distribution π and a
   loss function L is
            arg min_d E^π[L(θ, d)|x] .

The quadratic loss

   Historically, the first loss function (Legendre, Gauss, Laplace)

            L(θ, d) = (θ − d)²

   The Bayes estimate δ^π(x) associated with the prior π and with
   the quadratic loss is the posterior expectation

            δ^π(x) = E^π[θ|x]
                   = ∫_Θ θ f(x|θ)π(θ) dθ / ∫_Θ f(x|θ)π(θ) dθ .

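For instance, the posterior mean can be computed by quadrature (Python;
the binomial model with a flat prior is our illustrative choice, with
the closed-form check (x + 1)/(n + 2)):

   from scipy.integrate import quad
   from scipy.stats import binom

   # Bayes estimate under quadratic loss = posterior mean. Toy model:
   # x|theta ~ B(n, theta) with a flat prior, so the mean is (x + 1)/(n + 2).
   n, x = 10, 7
   num, _ = quad(lambda t: t * binom.pmf(x, n, t), 0.0, 1.0)
   den, _ = quad(lambda t: binom.pmf(x, n, t), 0.0, 1.0)
   print(num / den, (x + 1) / (n + 2))              # ~ equal
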
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



The absolute error loss


       Alternatives to the quadratic loss:

                                  L(θ, d) = |θ − d| ,

       or

                          Lk1 ,k2 (θ, d) =  k2 (θ − d)   if θ > d,
                                            k1 (d − θ)   otherwise.




                                                                              102 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



The absolute error loss


       Alternatives to the quadratic loss:

                                  L(θ, d) = |θ − d| ,

       or

                          Lk1 ,k2 (θ, d) =  k2 (θ − d)   if θ > d,
                                            k1 (d − θ)   otherwise.

       Associated Bayes estimate is the (k2 /(k1 + k2 )) fractile of π(θ|x)
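
       As an aside (not in the original slides), a minimal R sketch: for a
       N(µx , ω²) posterior, these Bayes estimates are posterior quantiles,

           mu_x <- 1.2; omega <- 0.5
           qnorm(0.5, mu_x, omega)            # absolute error loss: posterior median
           k1 <- 1; k2 <- 3                   # asymmetric L_{k1,k2} loss
           qnorm(k2/(k1 + k2), mu_x, omega)   # the k2/(k1+k2) fractile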




                                                                              103 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



MAP estimator



       With no loss function, consider using the maximum a posteriori
       (MAP) estimator
                              arg max_θ ℓ(θ|x)π(θ)




                                                                          104 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



MAP estimator



       With no loss function, consider using the maximum a posteriori
       (MAP) estimator
                              arg max_θ ℓ(θ|x)π(θ)


               Penalized likelihood estimator




                                                                          105 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



MAP estimator



       With no loss function, consider using the maximum a posteriori
       (MAP) estimator
                              arg max_θ ℓ(θ|x)π(θ)


               Penalized likelihood estimator
               Further appeal in restricted parameter spaces




                                                                          106 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



       Example (Binomial probability)
       Consider x|θ ∼ B(n, θ).
       Possible priors:
                  πJ (θ) = θ^(−1/2) (1 − θ)^(−1/2) /B(1/2, 1/2) ,

                  π1 (θ) = 1     and     π2 (θ) = θ^(−1) (1 − θ)^(−1) .




                                                                                       107 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



       Example (Binomial probability)
       Consider x|θ ∼ B(n, θ).
       Possible priors:
                  πJ (θ) = θ^(−1/2) (1 − θ)^(−1/2) /B(1/2, 1/2) ,

                  π1 (θ) = 1     and     π2 (θ) = θ^(−1) (1 − θ)^(−1) .

       Corresponding MAP estimators:

                  δ^πJ (x) = max {(x − 1/2)/(n − 1), 0} ,
                  δ^π1 (x) = x/n ,
                  δ^π2 (x) = max {(x − 1)/(n − 2), 0} .
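
       As an aside (not in the original slides), a minimal R check of these
       MAP formulas against a grid maximisation of ℓ(θ|x)π(θ):

           n <- 10; x <- 3
           max((x - 0.5)/(n - 1), 0)   # Jeffreys prior: 0.278
           x/n                         # flat prior pi1: 0.3
           max((x - 1)/(n - 2), 0)     # prior pi2: 0.25
           th <- seq(0.001, 0.999, by = 0.001)
           th[which.max(dbinom(x, n, th) * dbeta(th, 0.5, 0.5))]  # ~0.278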

                                                                                       108 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



Not always appropriate



       Example (Fixed MAP)
       Consider
                       f (x|θ) = (1/π) [1 + (x − θ)²]^(−1) ,

       and π(θ) = (1/2) e^(−|θ|) .




                                                                                   109 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Bayesian estimation



Not always appropriate



       Example (Fixed MAP)
       Consider
                       f (x|θ) = (1/π) [1 + (x − θ)²]^(−1) ,

       and π(θ) = (1/2) e^(−|θ|) . Then the MAP estimate of θ is always

                                   δπ (x) = 0




                                                                                   110 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions



Credible regions
       Natural confidence region: Highest posterior density (HPD) region
                           C^π_α = {θ; π(θ|x) > kα }




                                                                          111 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions



Credible regions
       Natural confidence region: Highest posterior density (HPD) region
                           C^π_α = {θ; π(θ|x) > kα }


       Optimality




      The HPD regions give the highest probabilities of containing θ for a
      given volume

      [Figure: HPD region for a posterior density on µ]




                                                                                                         112 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions




       Example
       If the posterior distribution of θ is N (µ(x), ω^(−2)) with
       ω² = τ^(−2) + σ^(−2) and µ(x) = τ²x/(τ² + σ²), then

                 C^π_α = [µ(x) − kα ω^(−1) , µ(x) + kα ω^(−1)] ,

       where kα is the upper α/2-quantile of N (0, 1).




                                                                          113 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions




       Example
       If the posterior distribution of θ is N (µ(x), ω^(−2)) with
       ω² = τ^(−2) + σ^(−2) and µ(x) = τ²x/(τ² + σ²), then

                 C^π_α = [µ(x) − kα ω^(−1) , µ(x) + kα ω^(−1)] ,

       where kα is the upper α/2-quantile of N (0, 1).
       If τ goes to +∞,
                         C^π_α = [x − kα σ, x + kα σ] ,

       the “usual” (classical) confidence interval
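
       As an aside (not in the original slides), a minimal R sketch of this
       HPD interval (equal-tailed here, by symmetry of the normal posterior):

           x <- 1.5; sigma <- 1; tau <- 2
           omega2 <- 1/tau^2 + 1/sigma^2              # posterior precision
           mu_x   <- tau^2 * x/(tau^2 + sigma^2)      # posterior mean
           mu_x + c(-1, 1) * qnorm(0.975)/sqrt(omega2)  # 95% HPD region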



                                                                          114 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions



Full normal
       Under the [almost!] Jeffreys prior
                                π(µ, σ²) = 1/σ² ,
       the posterior distribution of (µ, σ) is

                 µ | σ, x̄, s²x   ∼   N (x̄, σ²/n) ,
                 σ² | x̄, s²x     ∼   IG ((n − 1)/2, s²x /2) .




                                                                                     115 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions



Full normal
       Under the [almost!] Jeffreys prior
                                π(µ, σ²) = 1/σ² ,
       the posterior distribution of (µ, σ) is

                 µ | σ, x̄, s²x   ∼   N (x̄, σ²/n) ,
                 σ² | x̄, s²x     ∼   IG ((n − 1)/2, s²x /2) .

       Then

        π(µ|x̄, s²x ) ∝ ∫ ω^(1/2) exp{−ω n(x̄ − µ)²/2} ω^((n−3)/2) exp{−ω s²x /2} dω

                     ∝ [s²x + n(x̄ − µ)²]^(−n/2)

                                                        [Tn−1 distribution]
                                                                                                   116 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions



Normal credible interval
       Derived credible interval on µ

          [x̄ − tα/2,n−1 sx /√(n(n − 1)) , x̄ + tα/2,n−1 sx /√(n(n − 1))]




                                                                                       117 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Confidence regions



Normal credible interval
       Derived credible interval on µ

          [x̄ − tα/2,n−1 sx /√(n(n − 1)) , x̄ + tα/2,n−1 sx /√(n(n − 1))]


       normaldata
       Corresponding 95% confidence region for µ

                                               [−0.070, −0.013]

       Since 0 does not belong to this interval, reporting a significant
       decrease in the number of larcenies between 1991 and 1995 is
       acceptable
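
       As an aside (not in the original slides), a minimal R sketch of this
       interval; the sample below is a simulated stand-in for normaldata:

           set.seed(1)
           x <- rnorm(90, -0.04, 0.15)      # placeholder sample, n = 90
           n <- length(x); xbar <- mean(x)
           s2 <- sum((x - xbar)^2)          # s²x = sum of squared deviations
           half <- qt(0.975, n - 1) * sqrt(s2/(n * (n - 1)))
           c(xbar - half, xbar + half)      # 95% credible interval on mu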


                                                                                       118 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Testing hypotheses

       Deciding about validity of assumptions or restrictions on the
       parameter θ from the data, represented as

                                 H0 : θ ∈ Θ0             versus H1 : θ ∉ Θ0




                                                                              119 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Testing hypotheses

       Deciding about validity of assumptions or restrictions on the
       parameter θ from the data, represented as

                                 H0 : θ ∈ Θ0             versus H1 : θ ∉ Θ0

       Binary outcome of the decision process: accept [coded by 1] or
       reject [coded by 0]
                                  D = {0, 1}




                                                                              120 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Testing hypotheses

       Deciding about validity of assumptions or restrictions on the
       parameter θ from the data, represented as

                                 H0 : θ ∈ Θ0             versus H1 : θ ∉ Θ0

       Binary outcome of the decision process: accept [coded by 1] or
       reject [coded by 0]
                                  D = {0, 1}
       Bayesian solution formally very close to a likelihood ratio test
       statistic, but its numerical values often strongly differ from the
       classical solutions


                                                                              121 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



The 0 − 1 loss

       Rudimentary loss function

                       L(θ, d) =  1 − d   if θ ∈ Θ0 ,
                                  d       otherwise.

       Associated Bayes estimate

                       δπ (x) =  1   if Pπ (θ ∈ Θ0 |x) > 1/2,
                                 0   otherwise.




                                                                                   122 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



The 0 − 1 loss

       Rudimentary loss function

                       L(θ, d) =  1 − d   if θ ∈ Θ0 ,
                                  d       otherwise.

       Associated Bayes estimate

                       δπ (x) =  1   if Pπ (θ ∈ Θ0 |x) > 1/2,
                                 0   otherwise.

       Intuitive structure


                                                                                   123 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Extension

       Weighted 0 − 1 (or a0 − a1 ) loss

                       L(θ, d) =  0    if d = IΘ0 (θ),
                                  a0   if θ ∈ Θ0 and d = 0,
                                  a1   if θ ∉ Θ0 and d = 1.




                                                                          124 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Extension

       Weighted 0 − 1 (or a0 − a1 ) loss

                       L(θ, d) =  0    if d = IΘ0 (θ),
                                  a0   if θ ∈ Θ0 and d = 0,
                                  a1   if θ ∉ Θ0 and d = 1.

       Associated Bayes estimator

                       δπ (x) =  1   if Pπ (θ ∈ Θ0 |x) > a1 /(a0 + a1 ),
                                 0   otherwise.
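
       As an aside (not in the original slides), a one-line R version of this
       decision rule, given the posterior probability of H0:

           bayes_test <- function(p0, a0 = 1, a1 = 1) as.integer(p0 > a1/(a0 + a1))
           bayes_test(0.7)                  # 0-1 loss: accept H0 (returns 1)
           bayes_test(0.7, a0 = 1, a1 = 4)  # heavier penalty on false acceptance: 0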



                                                                                      125 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal-normal)
       For x ∼ N (θ, σ²) and θ ∼ N (µ, τ²), π(θ|x) is N (µ(x), ω²) with

             µ(x) = (σ²µ + τ²x)/(σ² + τ²)     and     ω² = σ²τ²/(σ² + τ²) .




                                                                                              126 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal-normal)
       For x ∼ N (θ, σ²) and θ ∼ N (µ, τ²), π(θ|x) is N (µ(x), ω²) with

             µ(x) = (σ²µ + τ²x)/(σ² + τ²)     and     ω² = σ²τ²/(σ² + τ²) .

       To test H0 : θ < 0, we compute

             Pπ (θ < 0|x) = Pπ ((θ − µ(x))/ω < −µ(x)/ω) = Φ (−µ(x)/ω) .




                                                                                              127 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal-normal (2))
       If za0,a1 is the a1 /(a0 + a1 ) quantile, i.e.,

                       Φ(za0,a1 ) = a1 /(a0 + a1 ) ,

       H0 is accepted when

                       −µ(x) > za0,a1 ω ,

       the upper acceptance bound then being

                       x ≤ −(σ²/τ²) µ − (1 + σ²/τ²) ω za0,a1 .
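
       As an aside (not in the original slides), a minimal R check of this
       acceptance bound:

           a0 <- 1; a1 <- 2; sigma <- 1; tau <- 2; mu <- 0.5
           z     <- qnorm(a1/(a0 + a1))
           omega <- sqrt(sigma^2 * tau^2/(sigma^2 + tau^2))
           bound <- -(sigma^2/tau^2) * mu - (1 + sigma^2/tau^2) * omega * z
           x    <- bound - 0.01             # just below the bound
           mu_x <- (sigma^2 * mu + tau^2 * x)/(sigma^2 + tau^2)
           pnorm(-mu_x/omega) > a1/(a0 + a1)   # TRUE: H0 accepted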



                                                                            128 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Bayes factor



       Bayesian testing procedure depends on P π (θ ∈ Θ0 |x) or
       alternatively on the Bayes factor

             B^π_10 = {Pπ (θ ∈ Θ1 |x)/Pπ (θ ∈ Θ0 |x)} / {Pπ (θ ∈ Θ1 )/Pπ (θ ∈ Θ0 )}

       in the absence of loss function parameters a0 and a1




                                                                              129 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Associated reparameterisations

       Corresponding models M1 vs. M0 compared via

             B^π_10 = [Pπ (M1 |x)/Pπ (M0 |x)] / [Pπ (M1 )/Pπ (M0 )]




                                                                                      130 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Associated reparameterisations

       Corresponding models M1 vs. M0 compared via

             B^π_10 = [Pπ (M1 |x)/Pπ (M0 |x)] / [Pπ (M1 )/Pπ (M0 )]

       If we rewrite the prior as

             π(θ) = Pr(θ ∈ Θ1 ) × π1 (θ) + Pr(θ ∈ Θ0 ) × π0 (θ)

       then

        B^π_10 = ∫ f (x|θ1 )π1 (θ1 ) dθ1 / ∫ f (x|θ0 )π0 (θ0 ) dθ0 = m1 (x)/m0 (x)

                                                     [Akin to likelihood ratio]


                                                                                                   131 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Jeffreys’ scale


         1   if log10 (B^π_10 ) varies between 0 and 0.5,
             the evidence against H0 is poor,
         2   if it is between 0.5 and 1, it is substantial,
         3   if it is between 1 and 2, it is strong, and
         4   if it is above 2, it is decisive.
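
       As an aside (not in the original slides), Jeffreys' scale as a small R
       helper:

           jeffreys <- function(B10) {
             l <- log10(B10)
             if (l < 0)        "supports H0"
             else if (l < 0.5) "poor evidence against H0"
             else if (l < 1)   "substantial"
             else if (l < 2)   "strong"
             else              "decisive"
           }
           jeffreys(15)   # log10(15) = 1.18, hence "strong"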




                                                                          132 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Point null difficulties



       If π absolutely continuous,

                                              P π (θ = θ0 ) = 0 . . .




                                                                          133 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Point null difficulties



       If π absolutely continuous,

                                              P π (θ = θ0 ) = 0 . . .



                                   How can we test H0 : θ = θ0 ?!




                                                                          134 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



New prior for new hypothesis


       Testing a point null hypothesis requires a modification of the prior
       distribution so that

                                      π(Θ0 ) > 0 and π(Θ1 ) > 0

       (hidden information) or

                    π(θ) = P π (θ ∈ Θ0 ) × π0 (θ) + P π (θ ∈ Θ1 ) × π1 (θ)




                                                                             135 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



New prior for new hypothesis


       Testing a point null hypothesis requires a modification of the prior
       distribution so that

                                      π(Θ0 ) > 0 and π(Θ1 ) > 0

       (hidden information) or

                    π(θ) = P π (θ ∈ Θ0 ) × π0 (θ) + P π (θ ∈ Θ1 ) × π1 (θ)

                                        [E.g., when Θ0 = {θ0 }, π0 is Dirac mass at θ0 ]




                                                                                       136 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Posteriors with Dirac masses
       If H0 : θ = θ0 (i.e. Θ0 = {θ0 }),

             ρ = Pπ (θ = θ0 )     and     π(θ) = ρ Iθ0 (θ) + (1 − ρ)π1 (θ)

       then

             π(Θ0 |x) = f (x|θ0 )ρ / ∫ f (x|θ)π(θ) dθ
                      = f (x|θ0 )ρ / [f (x|θ0 )ρ + (1 − ρ)m1 (x)]

       with
             m1 (x) = ∫Θ1 f (x|θ)π1 (θ) dθ .


                                                                                     137 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal-normal)
       For x ∼ N (θ, σ 2 ) and θ ∼ N (0, τ 2 ), to test H0 : θ = 0 requires a
       modification of the prior, with
                       π1 (θ) ∝ e^(−θ²/2τ²) Iθ≠0

       and π0 (θ) the Dirac mass in 0




                                                                                  138 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal-normal)
       For x ∼ N (θ, σ 2 ) and θ ∼ N (0, τ 2 ), to test H0 : θ = 0 requires a
       modification of the prior, with
                       π1 (θ) ∝ e^(−θ²/2τ²) Iθ≠0

       and π0 (θ) the Dirac mass in 0
       Then

        m1 (x)/f (x|0) = [σ/√(σ² + τ²)] e^(−x²/2(σ²+τ²)) / e^(−x²/2σ²)
                       = √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))} ,


                                                                                          139 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (cont’d)
       and

        π(θ = 0|x) = [1 + ((1 − ρ)/ρ) √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))}]^(−1) .

       For z = x/σ and ρ = 1/2:

             z                          0       0.68    1.28    1.96
             π(θ = 0|z, τ = σ)          0.586   0.557   0.484   0.351
             π(θ = 0|z, τ = 3.3σ)       0.768   0.729   0.612   0.366
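
       As an aside (not in the original slides), an R sketch reproducing this
       table; note the row labelled τ = 3.3σ matches τ² = 10σ² (τ ≈ 3.16σ):

           post0 <- function(z, tau, sigma = 1, rho = 0.5) {
             bf <- sqrt(sigma^2/(sigma^2 + tau^2)) *
                   exp(tau^2 * (z * sigma)^2/(2 * sigma^2 * (sigma^2 + tau^2)))
             1/(1 + (1 - rho)/rho * bf)
           }
           round(post0(c(0, 0.68, 1.28, 1.96), tau = 1), 3)         # 0.586 0.557 0.484 0.351
           round(post0(c(0, 0.68, 1.28, 1.96), tau = sqrt(10)), 3)  # 0.768 0.729 0.612 0.366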




                                                                                                                140 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Banning improper priors



       Impossibility of using improper priors for testing!




                                                                          141 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Banning improper priors



       Impossibility of using improper priors for testing!
       Reason: When using the representation

                    π(θ) = P π (θ ∈ Θ1 ) × π1 (θ) + P π (θ ∈ Θ0 ) × π0 (θ)

       π1 and π0 must be normalised




                                                                             142 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal point null)
       When x ∼ N (θ, 1) and H0 : θ = 0, for the improper prior
       π(θ) = 1, the prior is transformed into
                       π(θ) = (1/2) I0 (θ) + (1/2) · Iθ≠0 ,
       and

             π(θ = 0|x) = e^(−x²/2) / [e^(−x²/2) + ∫_{−∞}^{+∞} e^(−(x−θ)²/2) dθ]

                        = 1 / (1 + √(2π) e^(x²/2)) .


                                                                                       143 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing




       Example (Normal point null (2))
       Consequence: probability of H0 is bounded from above by

             π(θ = 0|x) ≤ 1/(1 + √(2π)) = 0.285

             x                0.0     1.0     1.65    1.96    2.58
             π(θ = 0|x)       0.285   0.195   0.089   0.055   0.014

       Regular tests: agreement with the classical p-value (but...)
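
       As an aside (not in the original slides), a minimal R check of the
       bound and of some of the tabled values:

           post0 <- function(x) 1/(1 + sqrt(2 * pi) * exp(x^2/2))
           post0(0)                           # upper bound: 0.285
           round(post0(c(1, 1.96, 2.58)), 3)  # 0.195 0.055 0.014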




                                                                                             144 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



       Example (Normal one-sided)
       For x ∼ N (θ, 1) and π(θ) = 1, to test H0 : θ ≤ 0 versus H1 : θ > 0,

             π(θ ≤ 0|x) = (1/√(2π)) ∫_{−∞}^0 e^(−(x−θ)²/2) dθ = Φ(−x) .

       The generalized Bayes answer is also the p-value




                                                                                               145 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



       Example (Normal one-sided)
       For x ∼ N (θ, 1) and π(θ) = 1, to test H0 : θ ≤ 0 versus H1 : θ > 0,

             π(θ ≤ 0|x) = (1/√(2π)) ∫_{−∞}^0 e^(−(x−θ)²/2) dθ = Φ(−x) .

       The generalized Bayes answer is also the p-value


       normaldata
       If π(µ, σ²) = 1/σ²,

                                             π(µ ≥ 0|x) = 0.0021

       since µ|x ∼ T89 (−0.0144, 0.000206).

                                                                                               146 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Jeffreys–Lindley paradox

       Limiting arguments not valid in testing settings: under a
       conjugate prior

        π(θ = 0|x) = [1 + ((1 − ρ)/ρ) √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))}]^(−1) ,

       which converges to 1 when τ goes to +∞, for every x




                                                                                                    147 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Jeffreys–Lindley paradox

       Limiting arguments not valid in testing settings: under a
       conjugate prior

        π(θ = 0|x) = [1 + ((1 − ρ)/ρ) √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))}]^(−1) ,

       which converges to 1 when τ goes to +∞, for every x
       Difference with the “noninformative” answer

             [1 + √(2π) exp(x²/2)]^(−1)

                                                          [Invalid answer]
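
       As an aside (not in the original slides), an R illustration of the
       paradox, with the posterior probability of H0 increasing towards 1 as
       τ grows, even at x = 1.96:

           post0 <- function(x, tau, sigma = 1, rho = 0.5) {
             bf <- sqrt(sigma^2/(sigma^2 + tau^2)) *
                   exp(tau^2 * x^2/(2 * sigma^2 * (sigma^2 + tau^2)))
             1/(1 + (1 - rho)/rho * bf)
           }
           sapply(c(1, 10, 100, 1e4), function(tau) post0(1.96, tau))
           # 0.351 0.600 0.936 0.999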


                                                                                                    148 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Normalisation difficulties

       If g0 and g1 are σ-finite measures on the subspaces Θ0 and Θ1 , the
       choice of the normalizing constants influences the Bayes factor:
       If gi replaced by ci gi (i = 0, 1), Bayes factor multiplied by
       c0 /c1




                                                                          149 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Testing



Normalisation difficulties

       If g0 and g1 are σ-finite measures on the subspaces Θ0 and Θ1 , the
       choice of the normalizing constants influences the Bayes factor:
       If gi replaced by ci gi (i = 0, 1), Bayes factor multiplied by
       c0 /c1

       Example
       If the Jeffreys prior is uniform and g0 = c0 , g1 = c1 ,

        π(θ ∈ Θ0 |x) = ρ c0 ∫Θ0 f (x|θ) dθ
                       / [ρ c0 ∫Θ0 f (x|θ) dθ + (1 − ρ) c1 ∫Θ1 f (x|θ) dθ]

                     = ρ ∫Θ0 f (x|θ) dθ
                       / [ρ ∫Θ0 f (x|θ) dθ + (1 − ρ)[c1 /c0 ] ∫Θ1 f (x|θ) dθ]




                                                                                                              150 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Monte Carlo integration



Monte Carlo integration



       Generic problem of evaluating an integral

                       I = Ef [h(X)] = ∫X h(x) f (x) dx

       where X is uni- or multidimensional, f is a closed form, partly
       closed form, or implicit density, and h is a function




                                                                                    151 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   The normal model
     Monte Carlo integration



Monte Carlo Principle

       Use a sample (x1 , . . . , xm ) from the density f to approximate the
       integral I by the empirical average

                       h̄m = (1/m) Σ^m_{j=1} h(xj )
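
       As an aside (not in the original slides), a minimal R instance with
       h(x) = x² and f the standard normal density (true value I = 1):

           set.seed(42)
           m <- 1e5
           x <- rnorm(m)
           mean(x^2)           # empirical average, close to 1
           sd(x^2)/sqrt(m)     # attached Monte Carlo standard error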




                                                                               152 / 785
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2
Bayesian Core: Chapter 2

Weitere ähnliche Inhalte

Mehr von Christian Robert

Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceChristian Robert
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
 

Mehr von Christian Robert (20)

Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 

Kürzlich hochgeladen

4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 

Kürzlich hochgeladen (20)

4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 

Bayesian Core: Chapter 2

  • 1. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Bayesian Core: A Practical Approach to Computational Bayesian Statistics Jean-Michel Marin & Christian P. Robert QUT, Brisbane, August 4–8 2008 1 / 785
  • 2. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Outline The normal model 1 Regression and variable selection 2 Generalized linear models 3 Capture-recapture experiments 4 Mixture models 5 Dynamic models 6 Image analysis 7 2 / 785
  • 3. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The normal model The normal model 1 Normal problems The Bayesian toolbox Prior selection Bayesian estimation Confidence regions Testing Monte Carlo integration Prediction 3 / 785
  • 4. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Normal model Sample x1 , . . . , xn from a normal N (µ, σ 2 ) distribution normal sample 0.6 0.5 0.4 Density 0.3 0.2 0.1 0.0 −2 −1 0 1 2 4 / 785
  • 5. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Inference on (µ, σ) based on this sample Estimation of [transforms of] (µ, σ) 5 / 785
  • 6. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Inference on (µ, σ) based on this sample Estimation of [transforms of] (µ, σ) Confidence region [interval] on (µ, σ) 6 / 785
  • 7. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Inference on (µ, σ) based on this sample Estimation of [transforms of] (µ, σ) Confidence region [interval] on (µ, σ) Test on (µ, σ) and comparison with other samples 7 / 785
  • 8. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Datasets Larcenies =normaldata 4 Relative changes in reported 3 larcenies between 1991 and 1995 (relative to 1991) for 2 the 90 most populous US counties (Source: FBI) 1 0 −0.4 −0.2 0.0 0.2 0.4 0.6 8 / 785
  • 9. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Cosmological background =CMBdata Spectral representation of the “cosmological microwave background” (CMB), i.e. electromagnetic radiation from photons back to 300, 000 years after the Big Bang, expressed as difference in apparent temperature from the mean temperature 9 / 785
  • 10. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Cosmological background =CMBdata Normal estimation 5 4 3 2 1 0 −0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 CMB 10 / 785
  • 11. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model Normal problems Cosmological background =CMBdata Normal estimation 5 4 3 2 1 0 −0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 CMB 11 / 785
  • 12. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox The Bayesian toolbox Bayes theorem = Inversion of probabilities 12 / 785
  • 13. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox The Bayesian toolbox Bayes theorem = Inversion of probabilities If A and E are events such that P (E) = 0, P (A|E) and P (E|A) are related by P (E|A)P (A) P (A|E) = P (E|A)P (A) + P (E|Ac )P (Ac ) P (E|A)P (A) = P (E) 13 / 785
  • 14. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox Who’s Bayes? Reverend Thomas Bayes (ca. 1702–1761) Presbyterian minister in Tunbridge Wells (Kent) from 1731, son of Joshua Bayes, nonconformist minister. Election to the Royal Society based on a tract of 1736 where he defended the views and philosophy of Newton. 14 / 785
  • 15. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox Who’s Bayes? Reverend Thomas Bayes (ca. 1702–1761) Presbyterian minister in Tunbridge Wells (Kent) from 1731, son of Joshua Bayes, nonconformist minister. Election to the Royal Society based on a tract of 1736 where he defended the views and philosophy of Newton. Sole probability paper, “Essay Towards Solving a Problem in the Doctrine of Chances”, published posthumously in 1763 by Pierce and containing the seeds of Bayes’ Theorem. 15 / 785
  • 16. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox New perspective Uncertainty on the parameters θ of a model modeled through a probability distribution π on Θ, called prior distribution 16 / 785
  • 17. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox New perspective Uncertainty on the parameters θ of a model modeled through a probability distribution π on Θ, called prior distribution Inference based on the distribution of θ conditional on x, π(θ|x), called posterior distribution f (x|θ)π(θ) π(θ|x) = . f (x|θ)π(θ) dθ 17 / 785
  • 18. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox Bayesian model A Bayesian statistical model is made of a likelihood 1 f (x|θ), 18 / 785
  • 19. Bayesian Core:A Practical Approach to Computational Bayesian Statistics The normal model The Bayesian toolbox Bayesian model A Bayesian statistical model is made of a likelihood 1 f (x|θ), and of a prior distribution on the parameters, 2 π(θ) . 19 / 785
• 20–23. Justifications. Semantic drift from unknown θ to random θ. Actualization of the information/knowledge on θ by extracting the information on θ contained in the observation x. Allows incorporation of imperfect/imprecise information in the decision process. Unique mathematical way to condition upon the observations (conditional perspective).
• 24–25. Example (Normal illustration, σ² = 1). Assume π(θ) = exp{−θ} I_{θ>0}. Then
π(θ|x1, . . . , xn) ∝ exp{−θ} exp{−n(θ − x̄)²/2} I_{θ>0}
∝ exp{−nθ²/2 + θ(nx̄ − 1)} I_{θ>0}
∝ exp{−n(θ − (x̄ − 1/n))²/2} I_{θ>0}
• 26. Example (Normal illustration (2)). The posterior is the truncated normal distribution N⁺((x̄ − 1/n), 1/n). [Figure: density of this truncated normal posterior, for a sample with x̄ − 1/n = −0.08935864.]
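A short Python sketch of this posterior, assuming a simulated sample in place of the slides' data (scipy's truncnorm expects the truncation bounds in standardised units):

    import numpy as np
    from scipy.stats import truncnorm

    # Posterior N+((xbar - 1/n), 1/n), truncated to θ > 0, under the
    # exponential prior π(θ) = exp{-θ} I_{θ>0}; the sample is illustrative.
    rng = np.random.default_rng(42)
    x = rng.normal(0.1, 1.0, size=50)        # hypothetical N(θ, 1) sample
    n, xbar = len(x), x.mean()

    loc, scale = xbar - 1/n, np.sqrt(1/n)
    a, b = (0 - loc)/scale, np.inf           # bounds in standard units
    post = truncnorm(a, b, loc=loc, scale=scale)
    print(post.mean(), post.interval(0.95))  # posterior mean and 95% interval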
• 27–29. Prior and posterior distributions. Given f(x|θ) and π(θ), several distributions are of interest:
(1) the joint distribution of (θ, x), ϕ(θ, x) = f(x|θ)π(θ);
(2) the marginal distribution of x, m(x) = ∫ ϕ(θ, x) dθ = ∫ f(x|θ)π(θ) dθ;
(3) the posterior distribution of θ, π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ = f(x|θ)π(θ)/m(x);
(4) the predictive distribution of y, when y ∼ g(y|θ, x), g(y|x) = ∫ g(y|θ, x)π(θ|x) dθ.
• 30–34. Posterior distribution, the centre of Bayesian inference:
π(θ|x) ∝ f(x|θ) π(θ)
Operates conditional upon the observations. Integrates simultaneously the prior information/knowledge and the information brought by x. Avoids averaging over the unobserved values of x. Provides a coherent updating of the information available on θ, independent of the order in which i.i.d. observations are collected. Provides a complete inferential scope and a unique motor of inference.
• 35–36. Example (Normal-normal case). Consider x|θ ∼ N(θ, 1) and θ ∼ N(a, 10). Then
π(θ|x) ∝ f(x|θ)π(θ) ∝ exp{−(x − θ)²/2 − (θ − a)²/20}
∝ exp{−11θ²/20 + θ(x + a/10)}
∝ exp{−(11/20) [θ − (10x + a)/11]²}
and θ|x ∼ N((10x + a)/11, 10/11).
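The closed-form posterior above can be checked against a brute-force grid evaluation; a Python sketch with an illustrative datum x = 1.5 and prior mean a = 0:

    import numpy as np
    from scipy.stats import norm

    # Normal-normal updating: x|θ ~ N(θ, 1), θ ~ N(a, 10)
    # gives θ|x ~ N((10x + a)/11, 10/11).
    a, x = 0.0, 1.5
    post = norm(loc=(10*x + a)/11, scale=np.sqrt(10/11))

    # brute-force check: normalise f(x|θ)π(θ) on a grid
    theta = np.linspace(-10, 10, 4001)
    unnorm = norm.pdf(x, theta, 1) * norm.pdf(theta, a, np.sqrt(10))
    unnorm /= unnorm.sum() * (theta[1] - theta[0])
    print(np.max(np.abs(unnorm - post.pdf(theta))))  # ~0: the densities agree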
• 37–39. Prior selection. The prior distribution is the key to Bayesian inference. But... in practice, it seldom occurs that the available prior information is precise enough to lead to an exact determination of the prior distribution. There is no such thing as the prior distribution!
• 40–44. Strategies for prior determination. “Ungrounded prior distributions produce unjustified posterior inference.” —Anonymous, ca. 2006
Use a partition of Θ into sets (e.g., intervals), determine the probability of each set, and approximate π by a histogram. Select significant elements of Θ, evaluate their respective likelihoods, and deduce a likelihood curve proportional to π. Use the marginal distribution of x, m(x) = ∫_Θ f(x|θ)π(θ) dθ. Empirical and hierarchical Bayes techniques.
• 45–46. Conjugate priors: a specific parametric family with analytical properties. A family F of probability distributions on Θ is conjugate for a likelihood function f(x|θ) if, for every π ∈ F, the posterior distribution π(θ|x) also belongs to F. Only of interest when F is parameterised: switching from prior to posterior distribution then reduces to an updating of the corresponding parameters.
• 47–51. Justifications. Limited/finite information conveyed by x. Preservation of the structure of π(θ). Exchangeability motivations. Device of virtual past observations. Linearity of some estimators. But mostly... tractability and simplicity. First approximations to adequate priors, backed up by robustness analysis.
• 52. Exponential families: sampling models of interest. The family of distributions
f(x|θ) = C(θ) h(x) exp{R(θ) · T(x)}
is called an exponential family of dimension k. When Θ ⊂ R^k, X ⊂ R^k and
f(x|θ) = h(x) exp{θ · x − Ψ(θ)},
the family is said to be natural.
• 53–56. Analytical properties of exponential families. Sufficient statistics (Pitman–Koopman lemma). Common enough structure (normal, Poisson, etc.). Analyticity (E[x] = ∇Ψ(θ), ...). Allow for conjugate priors:
π(θ|µ, λ) = K(µ, λ) exp{θ · µ − λΨ(θ)},  λ > 0
• 57. Standard exponential families:
f(x|θ)              π(θ)               π(θ|x)
Normal N(θ, σ²)     Normal N(µ, τ²)    N(ρ(σ²µ + τ²x), ρσ²τ²), ρ⁻¹ = σ² + τ²
Poisson P(θ)        Gamma G(α, β)      G(α + x, β + 1)
Gamma G(ν, θ)       Gamma G(α, β)      G(α + ν, β + x)
Binomial B(n, θ)    Beta Be(α, β)      Be(α + x, β + n − x)
• 58. More conjugate pairs:
f(x|θ)                          π(θ)                        π(θ|x)
Negative binomial Neg(m, θ)     Beta Be(α, β)               Be(α + m, β + x)
Multinomial Mk(θ1, . . . , θk)  Dirichlet D(α1, . . . , αk)  D(α1 + x1, . . . , αk + xk)
Normal N(µ, 1/θ)                Gamma Ga(α, β)              G(α + 0.5, β + (µ − x)²/2)
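Conjugacy in practice: the whole prior-to-posterior computation reduces to the parameter updates listed above. A Python sketch for the Binomial–Beta row, with made-up hyperparameters and data:

    from scipy.stats import beta

    # Binomial B(n, θ) likelihood with Be(α, β) prior: the posterior is
    # Be(α + x, β + n − x), so updating is pure bookkeeping.
    alpha, beta_prior = 2.0, 2.0     # illustrative hyperparameters
    n, x = 20, 13                    # hypothetical data: 13 successes in 20 trials

    posterior = beta(alpha + x, beta_prior + n - x)
    print(posterior.mean())          # (α + x)/(α + β + n) = 15/24 = 0.625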
• 59–60. Linearity of the posterior mean. If θ ∼ π_{λ,µ}(θ) ∝ exp{θ · µ − λΨ(θ)} with µ ∈ X, then
E^π[∇Ψ(θ)] = µ/λ,
where ∇Ψ(θ) = (∂Ψ(θ)/∂θ1, . . . , ∂Ψ(θ)/∂θp). Therefore, if x1, . . . , xn are i.i.d. f(x|θ),
E^π[∇Ψ(θ)|x1, . . . , xn] = (µ + nx̄)/(λ + n).
• 61. Example (Normal-normal). In the normal N(θ, σ²) case, the conjugate prior is also normal, N(µ, τ²), and
E^π[∇Ψ(θ)|x] = E^π[θ|x] = ρ(σ²µ + τ²x), where ρ⁻¹ = σ² + τ².
• 62–63. Example (Full normal). In the normal N(µ, σ²) case, when both µ and σ are unknown, there still is a conjugate prior on θ = (µ, σ²), of the form
(σ²)^{−λ_σ} exp{−[λ_µ(µ − ξ)² + α]/2σ²}
since
π(µ, σ²|x1, . . . , xn) ∝ (σ²)^{−λ_σ} exp{−[λ_µ(µ − ξ)² + α]/2σ²} × (σ²)^{−n/2} exp{−[n(µ − x̄)² + s²_x]/2σ²}
∝ (σ²)^{−λ_σ−n/2} exp{−[(λ_µ + n)(µ − ξ_x)² + α + s²_x + nλ_µ(x̄ − ξ)²/(n + λ_µ)]/2σ²}
where ξ_x = (λ_µξ + nx̄)/(λ_µ + n).
• 64. [Figure: level sets of the density of (µ, σ) for hyperparameters (0, 1, 1, 1).]
• 65–66. Improper prior distribution. Extension from a prior distribution to a prior σ-finite measure π such that
∫_Θ π(θ) dθ = +∞
Formal extension: π cannot be interpreted as a probability any longer.
• 67–71. Justifications. (1) Often the only way to derive a prior in noninformative/automatic settings. (2) The performance of the associated estimators is usually good. (3) Improper priors often occur as limits of proper distributions. (4) They give more robust answers against possible misspecifications of the prior. (5) They are (infinitely!) preferable to vague proper priors such as a N(0, 100²) distribution [e.g., BUGS].
• 72. Validation. The posterior distribution π(θ|x) associated with an improper prior π is still given by Bayes's formula
π(θ|x) = f(x|θ)π(θ) / ∫_Θ f(x|θ)π(θ) dθ,
provided ∫_Θ f(x|θ)π(θ) dθ < ∞.
• 73. Example (Normal + improper). If x ∼ N(θ, 1) and π(θ) = ϖ, a constant, the pseudo marginal distribution is
m(x) = ϖ ∫_{−∞}^{+∞} (1/√(2π)) exp{−(x − θ)²/2} dθ = ϖ
and the posterior distribution of θ is
π(θ|x) = (1/√(2π)) exp{−(x − θ)²/2},
i.e., corresponds to N(x, 1) [independent of ϖ].
• 74–75. Meaningless as a probability distribution. “The mistake is to think of them [the non-informative priors] as representing ignorance.” —Lindley, 1990
Example: consider a θ ∼ N(0, τ²) prior. Then P^π(θ ∈ [a, b]) → 0 when τ → ∞, for any (a, b).
• 76–78. Noninformative prior distributions. What if all we know is that we know “nothing”?! In the absence of prior information, prior distributions must be solely derived from the sample distribution f(x|θ). “Noninformative priors cannot be expected to represent exactly total ignorance about the problem at hand, but should rather be taken as reference or default priors, upon which everyone could fall back when the prior information is missing.” —Kass and Wasserman, 1996
• 79–80. Laplace's prior. Principle of Insufficient Reason (Laplace): for Θ = {θ1, · · · , θp}, take π(θi) = 1/p. Extension to continuous spaces: π(θ) ∝ 1 [Lebesgue measure].
• 81–82. Who's Laplace? Pierre Simon de Laplace (1749–1827), French mathematician and astronomer born in Beaumont-en-Auge (Normandie), who formalised mathematical astronomy in Mécanique Céleste. Survived the French Revolution, the Napoleonic Empire (as a comte!), and the Bourbon restoration (as a marquis!!). In Essai Philosophique sur les Probabilités, Laplace set out a mathematical system of inductive reasoning based on probability, a precursor to Bayesian statistics.
• 83–84. Laplace's problem. (1) Lack of reparameterization invariance/coherence:
π(θ) ∝ 1 and ψ = e^θ give π(ψ) = 1/ψ ≠ 1 (!!)
(2) Problems of properness: for x ∼ N(µ, σ²) and π(µ, σ) = 1,
π(µ, σ|x) ∝ e^{−(x−µ)²/2σ²} σ^{−1} ⇒ π(σ|x) ∝ 1 (!!!)
• 85–86. Jeffreys' prior. Based on the Fisher information
I^F(θ) = E_θ[∂ log ℓ/∂θᵗ · ∂ log ℓ/∂θ],
the Jeffreys prior distribution is
π^J(θ) ∝ |I^F(θ)|^{1/2}
[Portrait: Ronald Fisher (1890–1962)]
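A Monte Carlo sanity check of the Fisher information in the binomial case, where I^F(θ) = n/θ(1 − θ) and hence π^J(θ) ∝ θ^{−1/2}(1 − θ)^{−1/2}, the Be(1/2, 1/2) prior met again in the binomial example below; the sample size and θ are arbitrary:

    import numpy as np

    # I^F(θ) = E_θ[(∂ log f/∂θ)²] for x ~ B(n, θ), estimated by simulation
    # and compared with the closed form n/(θ(1−θ)).
    rng = np.random.default_rng(0)
    n, theta = 10, 0.3
    x = rng.binomial(n, theta, size=200_000)
    score = x/theta - (n - x)/(1 - theta)            # ∂ log f(x|θ)/∂θ
    print(np.mean(score**2), n/(theta*(1 - theta)))  # both ≈ 47.6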
• 87–88. Who's Jeffreys? Sir Harold Jeffreys (1891–1989), English mathematician, statistician, geophysicist, and astronomer. Founder of English geophysics and originator of the theory that the Earth's core is liquid. Formalised Bayesian methods for the analysis of geophysical data and ended up writing Theory of Probability.
• 89–92. Pros & cons. Relates to information theory. Agrees with most invariant priors. Parameterization invariant. Suffers from the dimensionality curse.
• 93–95. Evaluating estimators. The purpose of most inferential studies is to provide the statistician/client with a decision d ∈ D. This requires an evaluation criterion (loss function) for decisions and estimators, L(θ, d). There exists an axiomatic derivation of the existence of a loss function [DeGroot, 1970].
• 96–97. Loss functions. The decision procedure δ^π is usually called an estimator (while its value δ^π(x) is called an estimate of θ). It is impossible to uniformly minimize (in d) the loss function L(θ, d) when θ is unknown.
• 98. Bayesian estimation principle. Integrate over the space Θ to get the posterior expected loss
E^π[L(θ, d)|x] = ∫_Θ L(θ, d) π(θ|x) dθ,
and minimise it in d.
• 99. Bayes estimates. A Bayes estimate associated with a prior distribution π and a loss function L is
arg min_d E^π[L(θ, d)|x]
• 100–101. The quadratic loss. Historically the first loss function (Legendre, Gauss, Laplace):
L(θ, d) = (θ − d)²
The Bayes estimate δ^π(x) associated with the prior π and the quadratic loss is the posterior expectation
δ^π(x) = E^π[θ|x] = ∫_Θ θ f(x|θ)π(θ) dθ / ∫_Θ f(x|θ)π(θ) dθ.
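A grid-based sketch of this estimate in Python, under an illustrative N(0, 10) prior for which the closed-form posterior mean (from the normal-normal example) serves as a reference:

    import numpy as np
    from scipy.stats import norm

    # Bayes estimate under quadratic loss = posterior mean, computed on a grid
    # for x|θ ~ N(θ, 1) and θ ~ N(0, 10).
    x = 1.5
    theta = np.linspace(-15, 15, 4001)
    w = norm.pdf(x, theta, 1) * norm.pdf(theta, 0, np.sqrt(10))
    w /= w.sum()                      # normalised posterior weights on the grid
    print(np.sum(theta * w))          # ≈ 10x/11 ≈ 1.364, the closed-form answer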
• 102–103. The absolute error loss. Alternatives to the quadratic loss: L(θ, d) = |θ − d|, or
L_{k1,k2}(θ, d) = k2(θ − d) if θ > d, and k1(d − θ) otherwise.
The associated Bayes estimate is the k2/(k1 + k2) fractile of π(θ|x).
• 104–106. MAP estimator. With no loss function, consider using the maximum a posteriori (MAP) estimator
arg max_θ ℓ(θ|x)π(θ)
A penalized likelihood estimator, with further appeal in restricted parameter spaces.
• 107–108. Example (Binomial probability). Consider x|θ ∼ B(n, θ). Possible priors:
π1(θ) = 1, π^J(θ) = θ^{−1/2}(1 − θ)^{−1/2}/B(1/2, 1/2), and π2(θ) = θ^{−1}(1 − θ)^{−1}.
Corresponding MAP estimators:
δ^{π1}(x) = x/n,
δ^{πJ}(x) = max{(x − 1/2)/(n − 1), 0},
δ^{π2}(x) = max{(x − 1)/(n − 2), 0}.
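These three MAP formulas are easy to compare numerically; a Python sketch with hypothetical data of x = 3 successes out of n = 10:

    # MAP estimates for x|θ ~ B(n, θ) under the three priors of the example,
    # transcribing the formulas from the slide.
    def map_flat(x, n):      return x / n
    def map_jeffreys(x, n):  return max((x - 0.5) / (n - 1), 0.0)
    def map_pi2(x, n):       return max((x - 1.0) / (n - 2), 0.0)

    x, n = 3, 10
    print(map_flat(x, n), map_jeffreys(x, n), map_pi2(x, n))
    # 0.3, 0.2778, 0.25: three different answers from the same data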
• 109–110. Not always appropriate. Example (Fixed MAP): consider
f(x|θ) = (1/π) [1 + (x − θ)²]^{−1} and π(θ) = (1/2) e^{−|θ|}.
Then the MAP estimate of θ is always δ^π(x) = 0.
• 111–112. Credible regions. Natural confidence region: the highest posterior density (HPD) region
C^π_α = {θ; π(θ|x) > kα}
Optimality: HPD regions give the highest probability of containing θ for a given volume. [Figure: a normal posterior density with the HPD interval delimited.]
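A generic grid-based sketch for computing an HPD region in Python: visit grid points by decreasing posterior density, lowering the threshold kα until the enclosed mass reaches 1 − α. The N(1, 1) posterior is illustrative, and for a unimodal posterior the selected points form a single interval:

    import numpy as np
    from scipy.stats import norm

    post = norm(1.0, 1.0)                      # illustrative posterior
    theta = np.linspace(-5, 7, 10001)
    dens = post.pdf(theta)
    dtheta = theta[1] - theta[0]

    order = np.argsort(dens)[::-1]             # grid points by decreasing density
    mass = np.cumsum(dens[order]) * dtheta     # accumulated posterior mass
    inside = order[: np.searchsorted(mass, 0.95) + 1]
    print(theta[inside].min(), theta[inside].max())  # ≈ (−0.96, 2.96) = 1 ± 1.96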
• 113–114. Example. If the posterior distribution of θ is N(µ(x), ω⁻²) with ω² = τ⁻² + σ⁻² and µ(x) = τ²x/(τ² + σ²), then
C^π_α = [µ(x) − kα ω⁻¹, µ(x) + kα ω⁻¹],
where kα is the (1 − α/2)-quantile of N(0, 1). If τ goes to +∞,
C^π_α = [x − kα σ, x + kα σ],
the “usual” (classical) confidence interval.
• 115–116. Full normal. Under the [almost!] Jeffreys prior π(µ, σ²) = 1/σ², the posterior distribution of (µ, σ) is
µ | σ, x̄, s²_x ∼ N(x̄, σ²/n),
σ² | x̄, s²_x ∼ IG((n − 1)/2, s²_x/2).
Then
π(µ|x̄, s²_x) ∝ ∫ ω^{1/2} exp{−ω n(x̄ − µ)²/2} ω^{(n−3)/2} exp{−ω s²_x/2} dω
∝ [s²_x + n(x̄ − µ)²]^{−n/2}
[a T_{n−1} distribution]
• 117–118. Normal credible interval. Derived credible interval on µ:
[x̄ − t_{α/2,n−1} s_x/√(n(n − 1)), x̄ + t_{α/2,n−1} s_x/√(n(n − 1))]
normaldata: the corresponding 95% confidence region for µ is [−0.070, −0.013]. Since 0 does not belong to this interval, reporting a significant decrease in the number of larcenies between 1991 and 1995 is acceptable.
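A Python sketch of this interval; normaldata itself is not reproduced here, so a simulated stand-in sample is used, with s²_x = Σ(x_i − x̄)² as in the slides:

    import numpy as np
    from scipy.stats import t

    # Credible interval on µ under π(µ, σ²) = 1/σ²:
    # x̄ ± t_{α/2, n−1} s_x / sqrt(n(n−1)).
    rng = np.random.default_rng(1)
    x = rng.normal(-0.04, 0.1, size=90)        # stand-in for normaldata
    n, xbar = len(x), x.mean()
    s_x = np.sqrt(np.sum((x - xbar)**2))

    half = t.ppf(0.975, n - 1) * s_x / np.sqrt(n*(n - 1))
    print(xbar - half, xbar + half)            # 95% credible interval on µ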
• 119–121. Testing hypotheses. Deciding about the validity of assumptions or restrictions on the parameter θ from the data, represented as
H0: θ ∈ Θ0 versus H1: θ ∉ Θ0
Binary outcome of the decision process: accept [coded by 1] or reject [coded by 0], D = {0, 1}. The Bayesian solution is formally very close to a likelihood ratio test statistic, but its numerical values often strongly differ from the classical solutions.
• 122–123. The 0–1 loss. Rudimentary loss function:
L(θ, d) = 1 − d if θ ∈ Θ0, and d otherwise.
Associated Bayes estimate:
δ^π(x) = 1 if P^π(θ ∈ Θ0|x) > 1/2, and 0 otherwise.
Intuitive structure.
• 124–125. Extension: the weighted 0–1 (or a0–a1) loss
L(θ, d) = 0 if d = I_{Θ0}(θ); a0 if θ ∈ Θ0 and d = 0; a1 if θ ∉ Θ0 and d = 1.
Associated Bayes estimator:
δ^π(x) = 1 if P^π(θ ∈ Θ0|x) > a1/(a0 + a1), and 0 otherwise.
• 126–128. Example (Normal-normal). For x ∼ N(θ, σ²) and θ ∼ N(µ, τ²), π(θ|x) is N(µ(x), ω²) with
µ(x) = (σ²µ + τ²x)/(σ² + τ²) and ω² = σ²τ²/(σ² + τ²).
To test H0: θ < 0, we compute
P^π(θ < 0|x) = P^π((θ − µ(x))/ω < −µ(x)/ω) = Φ(−µ(x)/ω).
If z_{a0,a1} is the a1/(a0 + a1) quantile, i.e., Φ(z_{a0,a1}) = a1/(a0 + a1), then H0 is accepted when −µ(x) > z_{a0,a1} ω, the upper acceptance bound being
x ≤ −(σ²/τ²) µ − (1 + σ²/τ²) ω z_{a0,a1}.
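A short sketch of this test in Python, with illustrative values for (σ², τ², µ, x) and the symmetric a0 = a1 loss:

    from scipy.stats import norm

    # Testing H0: θ < 0 for x ~ N(θ, σ²), θ ~ N(µ, τ²):
    # P(θ < 0 | x) = Φ(−µ(x)/ω); accept H0 when it exceeds a1/(a0 + a1).
    sigma2, tau2, mu, x = 1.0, 1.0, 0.0, 0.5
    a0, a1 = 1.0, 1.0

    mu_x = (sigma2*mu + tau2*x) / (sigma2 + tau2)
    omega = (sigma2*tau2 / (sigma2 + tau2)) ** 0.5
    p_H0 = norm.cdf(-mu_x / omega)
    print(p_H0, p_H0 > a1/(a0 + a1))   # ≈ 0.362, False: H0 is rejected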
• 129. Bayes factor. The Bayesian testing procedure depends on P^π(θ ∈ Θ0|x), or alternatively on the Bayes factor
B^π_{10} = {P^π(θ ∈ Θ1|x)/P^π(θ ∈ Θ0|x)} / {P^π(θ ∈ Θ1)/P^π(θ ∈ Θ0)}
in the absence of the loss function parameters a0 and a1.
• 130–131. Associated reparameterisations. The corresponding models M1 vs. M0 are compared via
B^π_{10} = [P^π(M1|x)/P^π(M0|x)] / [P^π(M1)/P^π(M0)]
If we rewrite the prior as π(θ) = Pr(θ ∈ Θ1) × π1(θ) + Pr(θ ∈ Θ0) × π0(θ), then
B^π_{10} = ∫ f(x|θ1)π1(θ1) dθ1 / ∫ f(x|θ0)π0(θ0) dθ0 = m1(x)/m0(x)
[akin to a likelihood ratio]
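When m0 and m1 are not available in closed form, each marginal can be estimated by simple Monte Carlo over prior draws; a sketch with two illustrative proper priors for a N(θ, 1) likelihood:

    import numpy as np
    from scipy.stats import norm

    # B10 = m1(x)/m0(x), with m_i(x) = ∫ f(x|θ) π_i(θ) dθ estimated as the
    # average likelihood over draws from each prior.
    rng = np.random.default_rng(2)
    x = 1.2
    theta0 = rng.normal(0.0, 0.1, size=100_000)   # π0: tight around 0
    theta1 = rng.normal(0.0, 3.0, size=100_000)   # π1: diffuse alternative

    m0 = norm.pdf(x, theta0, 1).mean()
    m1 = norm.pdf(x, theta1, 1).mean()
    print(m1 / m0)                                # Bayes factor B10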
• 132. Jeffreys' scale. (1) If log10(B^π_{10}) varies between 0 and 0.5, the evidence against H0 is poor; (2) if it is between 0.5 and 1, it is substantial; (3) if it is between 1 and 2, it is strong; (4) and if it is above 2, it is decisive.
• 133–134. Point null difficulties. If π is absolutely continuous, P^π(θ = θ0) = 0... How can we then test H0: θ = θ0?!
• 135–136. New prior for a new hypothesis. Testing a point null requires a modification of the prior distribution so that π(Θ0) > 0 and π(Θ1) > 0 (hidden information), i.e.
π(θ) = P^π(θ ∈ Θ0) × π0(θ) + P^π(θ ∈ Θ1) × π1(θ)
[e.g., when Θ0 = {θ0}, π0 is the Dirac mass at θ0]
• 137. Posteriors with Dirac masses. If H0: θ = θ0 (i.e. Θ0 = {θ0}), ρ = P^π(θ = θ0) and
π(θ) = ρ I_{θ0}(θ) + (1 − ρ) π1(θ),
then
π(Θ0|x) = f(x|θ0)ρ / ∫ f(x|θ)π(θ) dθ = f(x|θ0)ρ / [f(x|θ0)ρ + (1 − ρ)m1(x)]
with m1(x) = ∫_{Θ1} f(x|θ)π1(θ) dθ.
• 138–139. Example (Normal-normal). For x ∼ N(θ, σ²) and θ ∼ N(0, τ²), testing H0: θ = 0 requires a modification of the prior, with
π1(θ) ∝ e^{−θ²/2τ²} I_{θ≠0}
and π0(θ) the Dirac mass at 0. Then
m1(x)/f(x|0) = [σ/√(σ² + τ²)] e^{−x²/2(σ²+τ²)} / e^{−x²/2σ²} = √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))}.
• 140. Example (cont'd). Hence
π(θ = 0|x) = [1 + ((1 − ρ)/ρ) √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))}]^{−1}
For z = x/σ and ρ = 1/2:
z                         0       0.68    1.28    1.96
π(θ = 0|z, τ = σ)         0.586   0.557   0.484   0.351
π(θ = 0|z, τ = 3.3σ)      0.768   0.729   0.612   0.366
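The τ = σ row of this table can be reproduced directly from the formula; a Python sketch with σ = 1:

    import numpy as np

    # Posterior probability of the point null H0: θ = 0 for z = x/σ, ρ = 1/2.
    def post_null(z, tau2, sigma2=1.0, rho=0.5):
        bf = (np.sqrt(sigma2/(sigma2 + tau2))
              * np.exp(tau2 * z**2 / (2*(sigma2 + tau2))))  # m1(x)/f(x|0)
        return 1.0 / (1.0 + (1 - rho)/rho * bf)

    z = np.array([0.0, 0.68, 1.28, 1.96])
    print(post_null(z, tau2=1.0).round(3))   # [0.586 0.557 0.484 0.351]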
• 141–142. Banning improper priors. Impossibility of using improper priors for testing! Reason: when using the representation
π(θ) = P^π(θ ∈ Θ1) × π1(θ) + P^π(θ ∈ Θ0) × π0(θ),
π1 and π0 must be normalised.
• 143. Example (Normal point null). When x ∼ N(θ, 1) and H0: θ = 0, for the improper prior π(θ) = 1 the prior is transformed into
π(θ) = (1/2) I0(θ) + (1/2) · I_{θ≠0},
and
π(θ = 0|x) = e^{−x²/2} / [e^{−x²/2} + ∫_{−∞}^{+∞} e^{−(x−θ)²/2} dθ] = 1/(1 + √(2π) e^{x²/2}).
• 144. Example (Normal point null (2)). Consequence: the probability of H0 is bounded from above by
π(θ = 0|x) ≤ 1/(1 + √(2π)) = 0.285
x              0.0     1.0     1.65    1.96    2.58
π(θ = 0|x)     0.285   0.195   0.089   0.055   0.014
Regular tests: agreement with the classical p-value (but...)
• 145–146. Example (Normal one-sided). For x ∼ N(θ, 1), π(θ) = 1, and the test of H0: θ ≤ 0 versus H1: θ > 0,
π(θ ≤ 0|x) = ∫_{−∞}^{0} (1/√(2π)) e^{−(x−θ)²/2} dθ = Φ(−x).
The generalized Bayes answer is also the p-value.
normaldata: if π(µ, σ²) = 1/σ², then π(µ ≥ 0|x) = 0.0021 since µ|x ∼ T89(−0.0144, 0.000206).
• 147–148. Jeffreys–Lindley paradox. Limiting arguments are not valid in testing settings: under a conjugate prior,
π(θ = 0|x) = [1 + ((1 − ρ)/ρ) √(σ²/(σ² + τ²)) exp{τ²x²/(2σ²(σ² + τ²))}]^{−1},
which converges to 1 when τ goes to +∞, for every x. This differs from the “noninformative” answer
[1 + √(2π) exp(x²/2)]^{−1}
[an invalid answer]
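A numerical illustration of the paradox, reusing the point-null posterior probability at a fixed x = 1.96 and letting τ² grow:

    import numpy as np

    def post_null(x, tau2, sigma2=1.0, rho=0.5):
        bf = (np.sqrt(sigma2/(sigma2 + tau2))
              * np.exp(tau2 * x**2 / (2*sigma2*(sigma2 + tau2))))
        return 1.0 / (1.0 + (1 - rho)/rho * bf)

    x = 1.96
    for tau2 in [1.0, 10.0, 100.0, 1e4, 1e8]:
        print(tau2, post_null(x, tau2))        # climbs toward 1 as τ² grows
    # the flat-prior answer stays fixed at ≈ 0.055:
    print(1 / (1 + np.sqrt(2*np.pi) * np.exp(x**2 / 2)))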
• 149–150. Normalisation difficulties. If g0 and g1 are σ-finite measures on the subspaces Θ0 and Θ1, the choice of the normalizing constants influences the Bayes factor: if gi is replaced by ci gi (i = 0, 1), the Bayes factor is multiplied by c0/c1. Example: if the Jeffreys prior is uniform and g0 = c0, g1 = c1, then
π(θ ∈ Θ0|x) = ρ c0 ∫_{Θ0} f(x|θ) dθ / [ρ c0 ∫_{Θ0} f(x|θ) dθ + (1 − ρ) c1 ∫_{Θ1} f(x|θ) dθ]
= ρ ∫_{Θ0} f(x|θ) dθ / [ρ ∫_{Θ0} f(x|θ) dθ + (1 − ρ)(c1/c0) ∫_{Θ1} f(x|θ) dθ]
• 151. Monte Carlo integration. Generic problem of evaluating an integral
I = E_f[h(X)] = ∫_X h(x) f(x) dx
where X is uni- or multidimensional, f is a closed-form, partly closed-form, or implicit density, and h is a function.
• 152. Monte Carlo principle. Use a sample (x1, . . . , xm) from the density f to approximate the integral I by the empirical average
h̄m = (1/m) Σ_{j=1}^{m} h(xj)
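A minimal sketch of the principle, with an integrand whose expectation under f is known in closed form (h(x) = exp(−x²) under f = N(0, 1) has I = 1/√3):

    import numpy as np

    # Monte Carlo estimate of I = E_f[h(X)] by an empirical average.
    rng = np.random.default_rng(3)
    x = rng.normal(size=1_000_000)             # sample from f = N(0, 1)
    print(np.exp(-x**2).mean(), 1/np.sqrt(3))  # both ≈ 0.5774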