prashanth updated resume 2024 for Teaching Profession
Bayesian Core: Chapter 2
1. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
Bayesian Core:
A Practical Approach to Computational Bayesian
Statistics
Jean-Michel Marin & Christian P. Robert
QUT, Brisbane, August 4–8 2008
1 / 785
2. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
Outline
The normal model
1
Regression and variable selection
2
Generalized linear models
3
Capture-recapture experiments
4
Mixture models
5
Dynamic models
6
Image analysis
7
2 / 785
3. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The normal model
The normal model
1
Normal problems
The Bayesian toolbox
Prior selection
Bayesian estimation
Confidence regions
Testing
Monte Carlo integration
Prediction
3 / 785
4. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Normal model
Sample
x1 , . . . , xn
from a normal N (µ, σ 2 ) distribution
normal sample
0.6
0.5
0.4
Density
0.3
0.2
0.1
0.0
−2 −1 0 1 2
4 / 785
5. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Inference on (µ, σ) based on this sample
Estimation of [transforms of] (µ, σ)
5 / 785
6. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Inference on (µ, σ) based on this sample
Estimation of [transforms of] (µ, σ)
Confidence region [interval] on (µ, σ)
6 / 785
7. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Inference on (µ, σ) based on this sample
Estimation of [transforms of] (µ, σ)
Confidence region [interval] on (µ, σ)
Test on (µ, σ) and comparison with other samples
7 / 785
8. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Datasets
Larcenies =normaldata
4
Relative changes in reported
3
larcenies between 1991 and
1995 (relative to 1991) for
2
the 90 most populous US
counties (Source: FBI)
1
0
−0.4 −0.2 0.0 0.2 0.4 0.6
8 / 785
9. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Cosmological background =CMBdata
Spectral representation of the
“cosmological microwave
background” (CMB),
i.e. electromagnetic radiation
from photons back to 300, 000
years after the Big Bang,
expressed as difference in
apparent temperature from the
mean temperature
9 / 785
10. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Cosmological background =CMBdata
Normal estimation
5
4
3
2
1
0
−0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6
CMB
10 / 785
11. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Normal problems
Cosmological background =CMBdata
Normal estimation
5
4
3
2
1
0
−0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6
CMB
11 / 785
12. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
The Bayesian toolbox
Bayes theorem = Inversion of probabilities
12 / 785
13. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
The Bayesian toolbox
Bayes theorem = Inversion of probabilities
If A and E are events such that P (E) = 0, P (A|E) and P (E|A)
are related by
P (E|A)P (A)
P (A|E) =
P (E|A)P (A) + P (E|Ac )P (Ac )
P (E|A)P (A)
=
P (E)
13 / 785
14. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Who’s Bayes?
Reverend Thomas Bayes (ca. 1702–1761)
Presbyterian minister in
Tunbridge Wells (Kent) from
1731, son of Joshua Bayes,
nonconformist minister. Election
to the Royal Society based on a
tract of 1736 where he defended
the views and philosophy of
Newton.
14 / 785
15. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Who’s Bayes?
Reverend Thomas Bayes (ca. 1702–1761)
Presbyterian minister in
Tunbridge Wells (Kent) from
1731, son of Joshua Bayes,
nonconformist minister. Election
to the Royal Society based on a
tract of 1736 where he defended
the views and philosophy of
Newton.
Sole probability paper, “Essay Towards Solving a Problem in the
Doctrine of Chances”, published posthumously in 1763 by Pierce
and containing the seeds of Bayes’ Theorem.
15 / 785
16. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
New perspective
Uncertainty on the parameters θ of a model modeled through
a probability distribution π on Θ, called prior distribution
16 / 785
17. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
New perspective
Uncertainty on the parameters θ of a model modeled through
a probability distribution π on Θ, called prior distribution
Inference based on the distribution of θ conditional on x,
π(θ|x), called posterior distribution
f (x|θ)π(θ)
π(θ|x) = .
f (x|θ)π(θ) dθ
17 / 785
18. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Bayesian model
A Bayesian statistical model is made of
a likelihood
1
f (x|θ),
18 / 785
19. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Bayesian model
A Bayesian statistical model is made of
a likelihood
1
f (x|θ),
and of
a prior distribution on the parameters,
2
π(θ) .
19 / 785
20. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Justifications
Semantic drift from unknown θ to random θ
20 / 785
21. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Justifications
Semantic drift from unknown θ to random θ
Actualization of information/knowledge on θ by extracting
information/knowledge on θ contained in the observation x
21 / 785
22. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Justifications
Semantic drift from unknown θ to random θ
Actualization of information/knowledge on θ by extracting
information/knowledge on θ contained in the observation x
Allows incorporation of imperfect/imprecise information in the
decision process
22 / 785
23. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Justifications
Semantic drift from unknown θ to random θ
Actualization of information/knowledge on θ by extracting
information/knowledge on θ contained in the observation x
Allows incorporation of imperfect/imprecise information in the
decision process
Unique mathematical way to condition upon the observations
(conditional perspective)
23 / 785
24. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Example (Normal illustration (σ 2 = 1))
Assume
π(θ) = exp{−θ} Iθ>0
24 / 785
25. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Example (Normal illustration (σ 2 = 1))
Assume
π(θ) = exp{−θ} Iθ>0
Then
π(θ|x1 , . . . , xn ) ∝ exp{−θ} exp{−n(θ − x)2 /2} Iθ>0
∝ exp −nθ2 /2 + θ(nx − 1) Iθ>0
∝ exp −n(θ − (x − 1/n))2 /2 Iθ>0
25 / 785
26. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Example (Normal illustration (2))
Truncated normal distribution
N + ((x − 1/n), 1/n)
mu=−0.08935864
2.5
2.0
1.5
1.0
0.5
0.0
0.0 0.1 0.2 0.3 0.4 0.5
26 / 785
27. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Prior and posterior distributions
Given f (x|θ) and π(θ), several distributions of interest:
the joint distribution of (θ, x),
1
ϕ(θ, x) = f (x|θ)π(θ) ;
27 / 785
28. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Prior and posterior distributions
Given f (x|θ) and π(θ), several distributions of interest:
the joint distribution of (θ, x),
1
ϕ(θ, x) = f (x|θ)π(θ) ;
the marginal distribution of x,
2
m(x) = ϕ(θ, x) dθ
= f (x|θ)π(θ) dθ ;
28 / 785
29. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
the posterior distribution of θ,
3
f (x|θ)π(θ)
π(θ|x) =
f (x|θ)π(θ) dθ
f (x|θ)π(θ)
= ;
m(x)
the predictive distribution of y, when y ∼ g(y|θ, x),
4
g(y|x) = g(y|θ, x)π(θ|x)dθ .
29 / 785
30. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Posterior distribution center of to Bayesian inference
π(θ|x) ∝ f (x|θ) π(θ)
Operates conditional upon the observations
30 / 785
31. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Posterior distribution center of to Bayesian inference
π(θ|x) ∝ f (x|θ) π(θ)
Operates conditional upon the observations
Integrate simultaneously prior information/knowledge and
information brought by x
31 / 785
32. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Posterior distribution center of to Bayesian inference
π(θ|x) ∝ f (x|θ) π(θ)
Operates conditional upon the observations
Integrate simultaneously prior information/knowledge and
information brought by x
Avoids averaging over the unobserved values of x
32 / 785
33. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Posterior distribution center of to Bayesian inference
π(θ|x) ∝ f (x|θ) π(θ)
Operates conditional upon the observations
Integrate simultaneously prior information/knowledge and
information brought by x
Avoids averaging over the unobserved values of x
Coherent updating of the information available on θ,
independent of the order in which i.i.d. observations are
collected
33 / 785
34. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Posterior distribution center of to Bayesian inference
π(θ|x) ∝ f (x|θ) π(θ)
Operates conditional upon the observations
Integrate simultaneously prior information/knowledge and
information brought by x
Avoids averaging over the unobserved values of x
Coherent updating of the information available on θ,
independent of the order in which i.i.d. observations are
collected
Provides a complete inferential scope and an unique motor of
inference
34 / 785
35. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Example (Normal-normal case)
Consider x|θ ∼ N (θ, 1) and θ ∼ N (a, 10).
(x − θ)2 (θ − a)2
π(θ|x) ∝ f (x|θ)π(θ) ∝ exp − −
2 20
11θ2
∝ exp − + θ(x + a/10)
20
11
∝ exp − {θ − ((10x + a)/11)}2
20
35 / 785
36. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
The Bayesian toolbox
Example (Normal-normal case)
Consider x|θ ∼ N (θ, 1) and θ ∼ N (a, 10).
(x − θ)2 (θ − a)2
π(θ|x) ∝ f (x|θ)π(θ) ∝ exp − −
2 20
11θ2
∝ exp − + θ(x + a/10)
20
11
∝ exp − {θ − ((10x + a)/11)}2
20
and
θ|x ∼ N (10x + a) 11, 10 11
36 / 785
37. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Prior selection
The prior distribution is the key to Bayesian inference
37 / 785
38. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Prior selection
The prior distribution is the key to Bayesian inference
But...
In practice, it seldom occurs that the available prior information is
precise enough to lead to an exact determination of the prior
distribution
38 / 785
39. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Prior selection
The prior distribution is the key to Bayesian inference
But...
In practice, it seldom occurs that the available prior information is
precise enough to lead to an exact determination of the prior
distribution
There is no such thing as the prior distribution!
39 / 785
40. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Strategies for prior determination
Ungrounded prior distributions produce
unjustified posterior inference.
—Anonymous, ca. 2006
40 / 785
41. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Strategies for prior determination
Ungrounded prior distributions produce
unjustified posterior inference.
—Anonymous, ca. 2006
Use a partition of Θ in sets (e.g., intervals), determine the
probability of each set, and approach π by an histogram
41 / 785
42. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Strategies for prior determination
Ungrounded prior distributions produce
unjustified posterior inference.
—Anonymous, ca. 2006
Use a partition of Θ in sets (e.g., intervals), determine the
probability of each set, and approach π by an histogram
Select significant elements of Θ, evaluate their respective
likelihoods and deduce a likelihood curve proportional to π
42 / 785
43. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Strategies for prior determination
Ungrounded prior distributions produce
unjustified posterior inference.
—Anonymous, ca. 2006
Use a partition of Θ in sets (e.g., intervals), determine the
probability of each set, and approach π by an histogram
Select significant elements of Θ, evaluate their respective
likelihoods and deduce a likelihood curve proportional to π
Use the marginal distribution of x,
m(x) = f (x|θ)π(θ) dθ
Θ
43 / 785
44. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Strategies for prior determination
Ungrounded prior distributions produce
unjustified posterior inference.
—Anonymous, ca. 2006
Use a partition of Θ in sets (e.g., intervals), determine the
probability of each set, and approach π by an histogram
Select significant elements of Θ, evaluate their respective
likelihoods and deduce a likelihood curve proportional to π
Use the marginal distribution of x,
m(x) = f (x|θ)π(θ) dθ
Θ
Empirical and hierarchical Bayes techniques
44 / 785
45. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Conjugate priors
Specific parametric family with analytical properties
Conjugate prior
A family F of probability distributions on Θ is conjugate for a
likelihood function f (x|θ) if, for every π ∈ F, the posterior
distribution π(θ|x) also belongs to F.
45 / 785
46. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Conjugate priors
Specific parametric family with analytical properties
Conjugate prior
A family F of probability distributions on Θ is conjugate for a
likelihood function f (x|θ) if, for every π ∈ F, the posterior
distribution π(θ|x) also belongs to F.
Only of interest when F is parameterised : switching from prior to
posterior distribution is reduced to an updating of the
corresponding parameters.
46 / 785
47. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Limited/finite information conveyed by x
Preservation of the structure of π(θ)
47 / 785
48. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Limited/finite information conveyed by x
Preservation of the structure of π(θ)
Exchangeability motivations
Device of virtual past observations
48 / 785
49. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Limited/finite information conveyed by x
Preservation of the structure of π(θ)
Exchangeability motivations
Device of virtual past observations
Linearity of some estimators
But mostly...
49 / 785
50. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Limited/finite information conveyed by x
Preservation of the structure of π(θ)
Exchangeability motivations
Device of virtual past observations
Linearity of some estimators
But mostly... tractability and simplicity
50 / 785
51. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Limited/finite information conveyed by x
Preservation of the structure of π(θ)
Exchangeability motivations
Device of virtual past observations
Linearity of some estimators
But mostly... tractability and simplicity
First approximations to adequate priors, backed up by
robustness analysis
51 / 785
52. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Exponential families
Sampling models of interest
Exponential family
The family of distributions
f (x|θ) = C(θ)h(x) exp{R(θ) · T (x)}
is called an exponential family of dimension k. When Θ ⊂ Rk ,
X ⊂ Rk and
f (x|θ) = h(x) exp{θ · x − Ψ(θ)},
the family is said to be natural.
52 / 785
53. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Analytical properties of exponential families
Sufficient statistics (Pitman–Koopman Lemma)
53 / 785
54. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Analytical properties of exponential families
Sufficient statistics (Pitman–Koopman Lemma)
Common enough structure (normal, Poisson, &tc...)
54 / 785
55. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Analytical properties of exponential families
Sufficient statistics (Pitman–Koopman Lemma)
Common enough structure (normal, Poisson, &tc...)
Analyticity (E[x] = ∇Ψ(θ), ...)
55 / 785
56. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Analytical properties of exponential families
Sufficient statistics (Pitman–Koopman Lemma)
Common enough structure (normal, Poisson, &tc...)
Analyticity (E[x] = ∇Ψ(θ), ...)
Allow for conjugate priors
π(θ|µ, λ) = K(µ, λ) eθ.µ−λΨ(θ) λ>0
56 / 785
57. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Standard exponential families
f (x|θ) π(θ) π(θ|x)
Normal Normal
N (θ, σ 2 ) N (µ, τ 2 ) N (ρ(σ 2 µ + τ 2 x), ρσ 2 τ 2 )
ρ−1 = σ 2 + τ 2
Poisson Gamma
P(θ) G(α, β) G(α + x, β + 1)
Gamma Gamma
G(ν, θ) G(α, β) G(α + ν, β + x)
Binomial Beta
B(n, θ) Be(α, β) Be(α + x, β + n − x)
57 / 785
58. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
More...
f (x|θ) π(θ) π(θ|x)
Negative Binomial Beta
N eg(m, θ) Be(α, β) Be(α + m, β + x)
Multinomial Dirichlet
Mk (θ1 , . . . , θk ) D(α1 , . . . , αk ) D(α1 + x1 , . . . , αk + xk )
Normal Gamma
G(α + 0.5, β + (µ − x)2 /2)
N (µ, 1/θ) Ga(α, β)
58 / 785
59. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Linearity of the posterior mean
If
θ ∼ πλ,µ (θ) ∝ eθ·µ−λΨ(θ)
with µ ∈ X , then
µ
Eπ [∇Ψ(θ)] =
.
λ
where ∇Ψ(θ) = (∂Ψ(θ)/∂θ1 , . . . , ∂Ψ(θ)/∂θp )
59 / 785
60. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Linearity of the posterior mean
If
θ ∼ πλ,µ (θ) ∝ eθ·µ−λΨ(θ)
with µ ∈ X , then
µ
Eπ [∇Ψ(θ)] =
.
λ
where ∇Ψ(θ) = (∂Ψ(θ)/∂θ1 , . . . , ∂Ψ(θ)/∂θp )
Therefore, if x1 , . . . , xn are i.i.d. f (x|θ),
µ + n¯
x
Eπ [∇Ψ(θ)|x1 , . . . , xn ] = .
λ+n
60 / 785
61. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Example (Normal-normal)
In the normal N (θ, σ 2 ) case, conjugate also normal N (µ, τ 2 ) and
Eπ [∇Ψ(θ)|x] = Eπ [θ|x] = ρ(σ 2 µ + τ 2 x)
where
ρ−1 = σ 2 + τ 2
61 / 785
62. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Example (Full normal)
In the normal N (µ, σ 2 ) case, when both µ and σ are unknown,
there still is a conjugate prior on θ = (µ, σ 2 ), of the form
(σ 2 )−λσ exp − λµ (µ − ξ)2 + α /2σ 2
62 / 785
63. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Example (Full normal)
In the normal N (µ, σ 2 ) case, when both µ and σ are unknown,
there still is a conjugate prior on θ = (µ, σ 2 ), of the form
(σ 2 )−λσ exp − λµ (µ − ξ)2 + α /2σ 2
since
π(µ, σ 2 |x1 , . . . , xn ) ∝ (σ 2 )−λσ exp − λµ (µ − ξ)2 + α /2σ 2
×(σ 2 )−n/2 exp − n(µ − x)2 + s2 /2σ 2
x
∝ (σ 2 )−λσ −n/2 exp − (λµ + n)(µ − ξx )2
nλµ (x − ξ)2
+α + s2 + /2σ 2
x
n + λµ
63 / 785
64. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
parameters (0,1,1,1)
2.0
1.5
σ
1.0
0.5
−1.0 −0.5 0.0 0.5 1.0
µ
64 / 785
65. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Improper prior distribution
Extension from a prior distribution to a prior σ-finite measure π
such that
π(θ) dθ = +∞
Θ
65 / 785
66. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Improper prior distribution
Extension from a prior distribution to a prior σ-finite measure π
such that
π(θ) dθ = +∞
Θ
Formal extension: π cannot be interpreted as a probability any
longer
66 / 785
67. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Often only way to derive a prior in noninformative/automatic
1
settings
67 / 785
68. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Often only way to derive a prior in noninformative/automatic
1
settings
Performances of associated estimators usually good
2
68 / 785
69. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Often only way to derive a prior in noninformative/automatic
1
settings
Performances of associated estimators usually good
2
Often occur as limits of proper distributions
3
69 / 785
70. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Often only way to derive a prior in noninformative/automatic
1
settings
Performances of associated estimators usually good
2
Often occur as limits of proper distributions
3
More robust answer against possible misspecifications of the
4
prior
70 / 785
71. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Justifications
Often only way to derive a prior in noninformative/automatic
1
settings
Performances of associated estimators usually good
2
Often occur as limits of proper distributions
3
More robust answer against possible misspecifications of the
4
prior
Improper priors (infinitely!) preferable to vague proper priors
5
such as a N (0, 1002 ) distribution [e.g., BUGS]
71 / 785
72. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Validation
Extension of the posterior distribution π(θ|x) associated with an
improper prior π given by Bayes’s formula
f (x|θ)π(θ)
π(θ|x) = ,
Θ f (x|θ)π(θ) dθ
when
f (x|θ)π(θ) dθ < ∞
Θ
72 / 785
73. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Example (Normal+improper)
If x ∼ N (θ, 1) and π(θ) = ̟, constant, the pseudo marginal
distribution is
+∞
1
√ exp −(x − θ)2 /2 dθ = ̟
m(x) = ̟
2π
−∞
and the posterior distribution of θ is
(x − θ)2
1
π(θ | x) = √ exp − ,
2
2π
i.e., corresponds to N (x, 1).
[independent of ̟]
73 / 785
74. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Meaningless as probability distribution
The mistake is to think of them [the non-informative priors]
as representing ignorance
—Lindley, 1990—
74 / 785
75. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Meaningless as probability distribution
The mistake is to think of them [the non-informative priors]
as representing ignorance
—Lindley, 1990—
Example
Consider a θ ∼ N (0, τ 2 ) prior. Then
P π (θ ∈ [a, b]) −→ 0
when τ → ∞ for any (a, b)
75 / 785
76. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Noninformative prior distributions
What if all we know is that we know “nothing” ?!
76 / 785
77. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Noninformative prior distributions
What if all we know is that we know “nothing” ?!
In the absence of prior information, prior distributions solely
derived from the sample distribution f (x|θ)
77 / 785
78. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Noninformative prior distributions
What if all we know is that we know “nothing” ?!
In the absence of prior information, prior distributions solely
derived from the sample distribution f (x|θ)
Noninformative priors cannot be expected to represent exactly
total ignorance about the problem at hand, but should rather
be taken as reference or default priors, upon which everyone
could fall back when the prior information is missing.
—Kass and Wasserman, 1996—
78 / 785
79. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Laplace’s prior
Principle of Insufficient Reason (Laplace)
Θ = {θ1 , · · · , θp } π(θi ) = 1/p
79 / 785
80. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Laplace’s prior
Principle of Insufficient Reason (Laplace)
Θ = {θ1 , · · · , θp } π(θi ) = 1/p
Extension to continuous spaces
π(θ) ∝ 1
[Lebesgue measure]
80 / 785
81. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Who’s Laplace?
Pierre Simon de Laplace (1749–1827)
French mathematician and
astronomer born in Beaumont en
Auge (Normandie) who formalised
mathematical astronomy in
M´canique C´leste. Survived the
e e
French revolution, the Napoleon
Empire (as a comte!), and the
Bourbon restauration (as a
marquis!!).
81 / 785
82. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Who’s Laplace?
Pierre Simon de Laplace (1749–1827)
French mathematician and
astronomer born in Beaumont en
Auge (Normandie) who formalised
mathematical astronomy in
M´canique C´leste. Survived the
e e
French revolution, the Napoleon
Empire (as a comte!), and the
Bourbon restauration (as a
marquis!!).
In Essai Philosophique sur les Probabilit´s, Laplace set out a
e
mathematical system of inductive reasoning based on probability,
precursor to Bayesian Statistics.
82 / 785
83. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Laplace’s problem
Lack of reparameterization invariance/coherence
1
and ψ = eθ
π(θ) ∝ 1, π(ψ) = = 1 (!!)
ψ
83 / 785
84. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Laplace’s problem
Lack of reparameterization invariance/coherence
1
and ψ = eθ
π(θ) ∝ 1, π(ψ) = = 1 (!!)
ψ
Problems of properness
x ∼ N (µ, σ 2 ), π(µ, σ) = 1
2 2
π(µ, σ|x) ∝ e−(x−µ) /2σ σ −1
⇒ π(σ|x) ∝1 (!!!)
84 / 785
85. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Jeffreys’ prior
Based on Fisher information
∂ log ℓ ∂ log ℓ
I F (θ) = Eθ
∂θt ∂θ
Ron Fisher (1890–1962)
85 / 785
86. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Jeffreys’ prior
Based on Fisher information
∂ log ℓ ∂ log ℓ
I F (θ) = Eθ
∂θt ∂θ
Ron Fisher (1890–1962)
the Jeffreys prior distribution is
π J (θ) ∝ |I F (θ)|1/2
86 / 785
87. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Who’s Jeffreys?
Sir Harold Jeffreys (1891–1989)
English mathematician,
statistician, geophysicist, and
astronomer. Founder of English
Geophysics & originator of the
theory that the Earth core is
liquid.
87 / 785
88. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Who’s Jeffreys?
Sir Harold Jeffreys (1891–1989)
English mathematician,
statistician, geophysicist, and
astronomer. Founder of English
Geophysics & originator of the
theory that the Earth core is
liquid.
Formalised Bayesian methods for
the analysis of geophysical data
and ended up writing Theory of
Probability
88 / 785
89. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Pros & Cons
Relates to information theory
89 / 785
90. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Pros & Cons
Relates to information theory
Agrees with most invariant priors
90 / 785
91. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Pros & Cons
Relates to information theory
Agrees with most invariant priors
Parameterization invariant
91 / 785
92. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Prior selection
Pros & Cons
Relates to information theory
Agrees with most invariant priors
Parameterization invariant
Suffers from dimensionality curse
92 / 785
93. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Evaluating estimators
Purpose of most inferential studies: to provide the
statistician/client with a decision d ∈ D
93 / 785
94. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Evaluating estimators
Purpose of most inferential studies: to provide the
statistician/client with a decision d ∈ D
Requires an evaluation criterion/loss function for decisions and
estimators
L(θ, d)
94 / 785
95. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Evaluating estimators
Purpose of most inferential studies: to provide the
statistician/client with a decision d ∈ D
Requires an evaluation criterion/loss function for decisions and
estimators
L(θ, d)
There exists an axiomatic derivation of the existence of a
loss function.
[DeGroot, 1970]
95 / 785
96. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Loss functions
Decision procedure δ π usually called estimator
(while its value δ π (x) is called estimate of θ)
96 / 785
97. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Loss functions
Decision procedure δ π usually called estimator
(while its value δ π (x) is called estimate of θ)
Impossible to uniformly minimize (in d) the loss function
L(θ, d)
when θ is unknown
97 / 785
98. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Bayesian estimation
Principle Integrate over the space Θ to get the posterior expected
loss
= Eπ [L(θ, d)|x]
= L(θ, d)π(θ|x) dθ,
Θ
and minimise in d
98 / 785
99. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Bayes estimates
Bayes estimator
A Bayes estimate associated with a prior distribution π and a loss
function L is
arg min Eπ [L(θ, d)|x] .
d
99 / 785
100. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
The quadratic loss
Historically, first loss function
(Legendre, Gauss, Laplace)
L(θ, d) = (θ − d)2
100 / 785
101. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
The quadratic loss
Historically, first loss function
(Legendre, Gauss, Laplace)
L(θ, d) = (θ − d)2
The Bayes estimate δ π (x) associated with the prior π and with the
quadratic loss is the posterior expectation
Θ θf (x|θ)π(θ) dθ
δ π (x) = Eπ [θ|x] = .
Θ f (x|θ)π(θ) dθ
101 / 785
102. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
The absolute error loss
Alternatives to the quadratic loss:
L(θ, d) =| θ − d |,
or
k2 (θ − d) if θ > d,
Lk1 ,k2 (θ, d) =
k1 (d − θ) otherwise.
102 / 785
103. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
The absolute error loss
Alternatives to the quadratic loss:
L(θ, d) =| θ − d |,
or
k2 (θ − d) if θ > d,
Lk1 ,k2 (θ, d) =
k1 (d − θ) otherwise.
Associated Bayes estimate is (k2 /(k1 + k2 )) fractile of π(θ|x)
103 / 785
104. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
MAP estimator
With no loss function, consider using the maximum a posteriori
(MAP) estimator
arg max ℓ(θ|x)π(θ)
θ
104 / 785
105. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
MAP estimator
With no loss function, consider using the maximum a posteriori
(MAP) estimator
arg max ℓ(θ|x)π(θ)
θ
Penalized likelihood estimator
105 / 785
106. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
MAP estimator
With no loss function, consider using the maximum a posteriori
(MAP) estimator
arg max ℓ(θ|x)π(θ)
θ
Penalized likelihood estimator
Further appeal in restricted parameter spaces
106 / 785
107. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Example (Binomial probability)
Consider x|θ ∼ B(n, θ).
Possible priors:
1
π J (θ) = θ−1/2 (1 − θ)−1/2 ,
B(1/2, 1/2)
π2 (θ) = θ−1 (1 − θ)−1 .
π1 (θ) = 1 and
107 / 785
108. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Example (Binomial probability)
Consider x|θ ∼ B(n, θ).
Possible priors:
1
π J (θ) = θ−1/2 (1 − θ)−1/2 ,
B(1/2, 1/2)
π2 (θ) = θ−1 (1 − θ)−1 .
π1 (θ) = 1 and
Corresponding MAP estimators:
x − 1/2
δ πJ (x) = max ,0 ,
n−1
δ π1 (x) = x/n,
x−1
δ π2 (x) = max ,0 .
n−2
108 / 785
109. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Not always appropriate
Example (Fixed MAP)
Consider
1 −1
1 + (x − θ)2
f (x|θ) = ,
π
1
and π(θ) = 2 e−|θ| .
109 / 785
110. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Bayesian estimation
Not always appropriate
Example (Fixed MAP)
Consider
1 −1
1 + (x − θ)2
f (x|θ) = ,
π
1
and π(θ) = 2 e−|θ| . Then the MAP estimate of θ is always
δ π (x) = 0
110 / 785
111. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Credible regions
Natural confidence region: Highest posterior density (HPD) region
π
Cα = {θ; π(θ|x) > kα }
111 / 785
112. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Credible regions
Natural confidence region: Highest posterior density (HPD) region
π
Cα = {θ; π(θ|x) > kα }
Optimality
0.15
The HPD regions give the highest
probabilities of containing θ for a
0.10
given volume
0.05
0.00
−10 −5 0 5 10
µ
112 / 785
113. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Example
If the posterior distribution of θ is N (µ(x), ω −2 ) with
ω 2 = τ −2 + σ −2 and µ(x) = τ 2 x/(τ 2 + σ 2 ), then
Cα = µ(x) − kα ω −1 , µ(x) + kα ω −1 ,
π
where kα is the α/2-quantile of N (0, 1).
113 / 785
114. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Example
If the posterior distribution of θ is N (µ(x), ω −2 ) with
ω 2 = τ −2 + σ −2 and µ(x) = τ 2 x/(τ 2 + σ 2 ), then
Cα = µ(x) − kα ω −1 , µ(x) + kα ω −1 ,
π
where kα is the α/2-quantile of N (0, 1).
If τ goes to +∞,
π
Cα = [x − kα σ, x + kα σ] ,
the “usual” (classical) confidence interval
114 / 785
115. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Full normal
Under [almost!] Jeffreys prior prior
π(µ, σ 2 ) = 1/σ 2 ,
posterior distribution of (µ, σ)
σ2
µ|σ, x, s2
¯x ∼ N x,
¯ ,
n
n − 1 s2
,x
σ 2 |¯, sx
x2 ∼ IG .
2 2
115 / 785
116. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Full normal
Under [almost!] Jeffreys prior prior
π(µ, σ 2 ) = 1/σ 2 ,
posterior distribution of (µ, σ)
σ2
µ|σ, x, s2
¯x ∼ N x,
¯ ,
n
n − 1 s2
,x
σ 2 |¯, sx
x2 ∼ IG .
2 2
Then
n(¯ − µ)2 (n−3)/2
x
π(µ|¯, s2 ) ∝ ω 1/2 exp −ω exp{−ωs2 /2} dω
xx ω x
2
−n/2
s2 + n(¯ − µ)2
∝ x
x
[Tn−1 distribution]
116 / 785
117. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Normal credible interval
Derived credible interval on µ
[¯ − tα/2,n−1 sx
x n(n − 1), x + tα/2,n−1 sx
¯ n(n − 1)]
117 / 785
118. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Confidence regions
Normal credible interval
Derived credible interval on µ
[¯ − tα/2,n−1 sx
x n(n − 1), x + tα/2,n−1 sx
¯ n(n − 1)]
normaldata
Corresponding 95% confidence region for µ
[−0.070, −0.013, ]
Since 0 does not belong to this interval, reporting a significant
decrease in the number of larcenies between 1991 and 1995 is
acceptable
118 / 785
119. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Testing hypotheses
Deciding about validity of assumptions or restrictions on the
parameter θ from the data, represented as
H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0
119 / 785
120. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Testing hypotheses
Deciding about validity of assumptions or restrictions on the
parameter θ from the data, represented as
H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0
Binary outcome of the decision process: accept [coded by 1] or
reject [coded by 0]
D = {0, 1}
120 / 785
121. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Testing hypotheses
Deciding about validity of assumptions or restrictions on the
parameter θ from the data, represented as
H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0
Binary outcome of the decision process: accept [coded by 1] or
reject [coded by 0]
D = {0, 1}
Bayesian solution formally very close from a likelihood ratio test
statistic, but numerical values often strongly differ from classical
solutions
121 / 785
122. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
The 0 − 1 loss
Rudimentary loss function
1−d if θ ∈ Θ0
L(θ, d) =
d otherwise,
Associated Bayes estimate
1 if P π (θ ∈ Θ |x) > 1 ,
0
π
δ (x) = 2
0 otherwise.
122 / 785
123. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
The 0 − 1 loss
Rudimentary loss function
1−d if θ ∈ Θ0
L(θ, d) =
d otherwise,
Associated Bayes estimate
1 if P π (θ ∈ Θ |x) > 1 ,
0
π
δ (x) = 2
0 otherwise.
Intuitive structure
123 / 785
124. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Extension
Weighted 0 − 1 (or a0 − a1 ) loss
0 if d = IΘ0 (θ),
L(θ, d) = a0 if θ ∈ Θ0 and d = 0,
a1 if θ ∈ Θ0 and d = 1,
124 / 785
125. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Extension
Weighted 0 − 1 (or a0 − a1 ) loss
0 if d = IΘ0 (θ),
L(θ, d) = a0 if θ ∈ Θ0 and d = 0,
a1 if θ ∈ Θ0 and d = 1,
Associated Bayes estimator
a1
1 if P π (θ ∈ Θ0 |x) > ,
a0 + a1
π
δ (x) =
0 otherwise.
125 / 785
126. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal-normal)
For x ∼ N (θ, σ 2 ) and θ ∼ N (µ, τ 2 ), π(θ|x) is N (µ(x), ω 2 ) with
σ2µ + τ 2x σ2τ 2
ω2 =
µ(x) = and .
σ2 + τ 2 σ2 + τ 2
126 / 785
127. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal-normal)
For x ∼ N (θ, σ 2 ) and θ ∼ N (µ, τ 2 ), π(θ|x) is N (µ(x), ω 2 ) with
σ2µ + τ 2x σ2τ 2
ω2 =
µ(x) = and .
σ2 + τ 2 σ2 + τ 2
To test H0 : θ < 0, we compute
θ − µ(x) −µ(x)
P π (θ < 0|x) = P π <
ω ω
= Φ (−µ(x)/ω) .
127 / 785
128. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal-normal (2))
If za0 ,a1 is the a1 /(a0 + a1 ) quantile, i.e.,
Φ(za0 ,a1 ) = a1 /(a0 + a1 ) ,
H0 is accepted when
−µ(x) > za0 ,a1 ω,
the upper acceptance bound then being
σ2 σ2
x≤− µ − (1 + 2 )ωza0 ,a1 .
τ2 τ
128 / 785
129. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Bayes factor
Bayesian testing procedure depends on P π (θ ∈ Θ0 |x) or
alternatively on the Bayes factor
{P π (θ ∈ Θ1 |x)/P π (θ ∈ Θ0 |x)}
π
B10 =
{P π (θ ∈ Θ1 )/P π (θ ∈ Θ0 )}
in the absence of loss function parameters a0 and a1
129 / 785
130. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Associated reparameterisations
Corresponding models M1 vs. M0 compared via
P π (M1 |x) P π (M1 )
π
B10 =
P π (M0 |x) P π (M0 )
130 / 785
131. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Associated reparameterisations
Corresponding models M1 vs. M0 compared via
P π (M1 |x) P π (M1 )
π
B10 =
P π (M0 |x) P π (M0 )
If we rewrite the prior as
π(θ) = Pr(θ ∈ Θ1 ) × π1 (θ) + Pr(θ ∈ Θ0 ) × π0 (θ)
then
π
B10 = f (x|θ1 )π1 (θ1 )dθ1 f (x|θ0 )π0 (θ0 )dθ0 = m1 (x)/m0 (x)
[Akin to likelihood ratio]
131 / 785
132. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Jeffreys’ scale
π
if log10 (B10 ) varies between 0 and 0.5,
1
the evidence against H0 is poor,
if it is between 0.5 and 1, it is is
2
substantial,
if it is between 1 and 2, it is strong,
3
and
if it is above 2 it is decisive.
4
132 / 785
133. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Point null difficulties
If π absolutely continuous,
P π (θ = θ0 ) = 0 . . .
133 / 785
134. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Point null difficulties
If π absolutely continuous,
P π (θ = θ0 ) = 0 . . .
How can we test H0 : θ = θ0 ?!
134 / 785
135. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
New prior for new hypothesis
Testing point null difficulties requires a modification of the prior
distribution so that
π(Θ0 ) > 0 and π(Θ1 ) > 0
(hidden information) or
π(θ) = P π (θ ∈ Θ0 ) × π0 (θ) + P π (θ ∈ Θ1 ) × π1 (θ)
135 / 785
136. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
New prior for new hypothesis
Testing point null difficulties requires a modification of the prior
distribution so that
π(Θ0 ) > 0 and π(Θ1 ) > 0
(hidden information) or
π(θ) = P π (θ ∈ Θ0 ) × π0 (θ) + P π (θ ∈ Θ1 ) × π1 (θ)
[E.g., when Θ0 = {θ0 }, π0 is Dirac mass at θ0 ]
136 / 785
137. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Posteriors with Dirac masses
If H0 : θ = θ0 (= Θ0 ),
ρ = P π (θ = θ0 ) and π(θ) = ρIθ0 (θ) + (1 − ρ)π1 (θ)
then
f (x|θ0 )ρ
π(Θ0 |x) =
f (x|θ)π(θ) dθ
f (x|θ0 )ρ
=
f (x|θ0 )ρ + (1 − ρ)m1 (x)
with
m1 (x) = f (x|θ)π1 (θ) dθ.
Θ1
137 / 785
138. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal-normal)
For x ∼ N (θ, σ 2 ) and θ ∼ N (0, τ 2 ), to test H0 : θ = 0 requires a
modification of the prior, with
2 /2τ 2
π1 (θ) ∝ e−θ Iθ=0
and π0 (θ) the Dirac mass in 0
138 / 785
139. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal-normal)
For x ∼ N (θ, σ 2 ) and θ ∼ N (0, τ 2 ), to test H0 : θ = 0 requires a
modification of the prior, with
2 /2τ 2
π1 (θ) ∝ e−θ Iθ=0
and π0 (θ) the Dirac mass in 0
Then
2 2 2
e−x /2(σ +τ )
m1 (x) σ
√
= 2 2
f (x|0) σ 2 + τ 2 e−x /2σ
τ 2 x2
σ2
= exp ,
σ2 + τ 2 2σ 2 (σ 2 + τ 2 )
139 / 785
140. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (cont’d)
and
−1
τ 2 x2
σ2
1−ρ
π(θ = 0|x) = 1 + exp .
σ2 + τ 2 2σ 2 (σ 2 + τ 2 )
ρ
For z = x/σ and ρ = 1/2:
z 0 0.68 1.28 1.96
π(θ = 0|z, τ = σ) 0.586 0.557 0.484 0.351
π(θ = 0|z, τ = 3.3σ) 0.768 0.729 0.612 0.366
140 / 785
141. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Banning improper priors
Impossibility of using improper priors for testing!
141 / 785
142. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Banning improper priors
Impossibility of using improper priors for testing!
Reason: When using the representation
π(θ) = P π (θ ∈ Θ1 ) × π1 (θ) + P π (θ ∈ Θ0 ) × π0 (θ)
π1 and π0 must be normalised
142 / 785
143. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal point null)
When x ∼ N (θ, 1) and H0 : θ = 0, for the improper prior
π(θ) = 1, the prior is transformed as
1 1
π(θ) = I0 (θ) + · Iθ=0 ,
2 2
and
2 /2
e−x
π(θ = 0|x) = +∞
e−x2 /2 + −∞ e−(x−θ)2 /2 dθ
1
√
= .
1 + 2πex2 /2
143 / 785
144. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal point null (2))
Consequence: probability of H0 is bounded from above by
√
π(θ = 0|x) ≤ 1/(1 + 2π) = 0.285
x 0.0 1.0 1.65 1.96 2.58
π(θ = 0|x) 0.285 0.195 0.089 0.055 0.014
Regular tests: Agreement with the classical p-value (but...)
144 / 785
145. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal one-sided)
For x ∼ N (θ, 1), π(θ) = 1, and H0 : θ ≤ 0 to test versus
H1 : θ > 0
0
1 2 /2
e−(x−θ)
π(θ ≤ 0|x) = √ dθ = Φ(−x).
2π −∞
The generalized Bayes answer is also the p-value
145 / 785
146. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Example (Normal one-sided)
For x ∼ N (θ, 1), π(θ) = 1, and H0 : θ ≤ 0 to test versus
H1 : θ > 0
0
1 2 /2
e−(x−θ)
π(θ ≤ 0|x) = √ dθ = Φ(−x).
2π −∞
The generalized Bayes answer is also the p-value
normaldata
If π(µ, σ 2 ) = 1/σ 2 ,
π(µ ≥ 0|x) = 0.0021
since µ|x ∼ T89 (−0.0144, 0.000206).
146 / 785
147. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Jeffreys–Lindley paradox
Limiting arguments not valid in testing settings: Under a
conjugate prior
−1
τ 2 x2
σ2
1−ρ
π(θ = 0|x) = 1+ exp ,
σ2 + τ 2 2σ 2 (σ 2 + τ 2 )
ρ
which converges to 1 when τ goes to +∞, for every x
147 / 785
148. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Jeffreys–Lindley paradox
Limiting arguments not valid in testing settings: Under a
conjugate prior
−1
τ 2 x2
σ2
1−ρ
π(θ = 0|x) = 1+ exp ,
σ2 + τ 2 2σ 2 (σ 2 + τ 2 )
ρ
which converges to 1 when τ goes to +∞, for every x
Difference with the “noninformative” answer
√
[1 + 2π exp(x2 /2)]−1
[ Invalid answer]
148 / 785
149. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Normalisation difficulties
If g0 and g1 are σ-finite measures on the subspaces Θ0 and Θ1 , the
choice of the normalizing constants influences the Bayes factor:
If gi replaced by ci gi (i = 0, 1), Bayes factor multiplied by
c0 /c1
149 / 785
150. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Testing
Normalisation difficulties
If g0 and g1 are σ-finite measures on the subspaces Θ0 and Θ1 , the
choice of the normalizing constants influences the Bayes factor:
If gi replaced by ci gi (i = 0, 1), Bayes factor multiplied by
c0 /c1
Example
If the Jeffreys prior is uniform and g0 = c0 , g1 = c1 ,
ρc0 f (x|θ) dθ
Θ0
π(θ ∈ Θ0 |x) =
ρc0 f (x|θ) dθ + (1 − ρ)c1 f (x|θ) dθ
Θ0 Θ1
ρ f (x|θ) dθ
Θ0
=
ρ f (x|θ) dθ + (1 − ρ)[c1 /c0 ] f (x|θ) dθ
Θ0 Θ1
150 / 785
151. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Monte Carlo integration
Monte Carlo integration
Generic problem of evaluating an integral
I = Ef [h(X)] = h(x) f (x) dx
X
where X is uni- or multidimensional, f is a closed form, partly
closed form, or implicit density, and h is a function
151 / 785
152. Bayesian Core:A Practical Approach to Computational Bayesian Statistics
The normal model
Monte Carlo integration
Monte Carlo Principle
Use a sample (x1 , . . . , xm ) from the density f to approximate the
integral I by the empirical average
m
1
hm = h(xj )
m
j=1
152 / 785