Dr09 Slide

A vanilla Rao–Blackwellisation of
Metropolis–Hastings algorithms

Randal DOUC and Christian ROBERT
Telecom SudParis, France

randal.douc@it-sudparis.eu

April 2009

1 / 24

Main themes

1 Rao–Blackwellisation of MCMC.
2 Can be performed in any Hastings–Metropolis (HM) algorithm.
3 Asymptotically more efficient than usual MCMC, with a
controlled amount of additional calculation.

2 / 24


Outline

1 Introduction

2 Some properties of the HM algorithm

3 Rao–Blackwellisation
Variance reduction
Asymptotic results

4 Illustrations

5 Conclusion

3 / 24


1 Introduction

4 / 24


Metropolis–Hastings algorithm

1 We wish to approximate
\[ I = \frac{\int h(x)\,\pi(x)\,dx}{\int \pi(x)\,dx} = \int h(x)\,\bar\pi(x)\,dx . \]
2 $x \mapsto \pi(x)$ is known but not $\int \pi(x)\,dx$.
3 Approximate $I$ with $\delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)})$, where $(x^{(t)})$ is a Markov
chain with limiting distribution $\bar\pi$.
4 Convergence is obtained from the Law of Large Numbers or the CLT for
Markov chains.

5 / 24


Metropolis–Hastings algorithm

Suppose that $x^{(t)}$ is drawn.
1 Simulate $y_t \sim q(\cdot|x^{(t)})$.
2 Set $x^{(t+1)} = y_t$ with probability
\[ \alpha(x^{(t)}, y_t) = \min\left\{ 1,\ \frac{\pi(y_t)}{\pi(x^{(t)})}\,\frac{q(x^{(t)}|y_t)}{q(y_t|x^{(t)})} \right\} . \]
Otherwise, set $x^{(t+1)} = x^{(t)}$.
3 $\alpha$ is such that the detailed balance equation is satisfied:
\[ \pi(x)\,q(y|x)\,\alpha(x,y) = \pi(y)\,q(x|y)\,\alpha(y,x) . \]
⊲ $\bar\pi$ is the stationary distribution of $(x^{(t)})$.
◮ The accepted candidates are simulated with the rejection
algorithm.
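
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' code: the function names (`metropolis_hastings`, `log_pi`, `propose`, `log_q`), the standard normal target and the random-walk scale `tau = 2.0` are all assumptions chosen for the example.

```python
import math
import random

def metropolis_hastings(log_pi, propose, log_q, x0, n, rng):
    """Run n HM iterations from x0.

    log_pi(x)      : log of the unnormalised target pi(x)
    propose(x, rng): draws y ~ q(.|x)
    log_q(y, x)    : log q(y|x), up to a constant
    """
    chain = []
    x = x0
    for _ in range(n):
        y = propose(x, rng)
        # log of pi(y) q(x|y) / (pi(x) q(y|x))
        log_ratio = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        alpha = math.exp(min(0.0, log_ratio))
        if rng.random() < alpha:
            x = y            # accept the candidate
        chain.append(x)      # on rejection the current value is repeated
    return chain

# Toy setting (assumed for illustration): N(0, 1) target,
# Gaussian random-walk proposal with scale tau.
tau = 2.0
log_pi = lambda x: -0.5 * x * x
propose = lambda x, rng: x + tau * rng.gauss(0.0, 1.0)
log_q = lambda y, x: -0.5 * ((y - x) / tau) ** 2   # symmetric in (x, y)

rng = random.Random(0)
chain = metropolis_hastings(log_pi, propose, log_q, 0.0, 20000, rng)
delta = sum(chain) / len(chain)   # estimate of E[X] = 0
```

Working on the log scale avoids overflow in the ratio; only the unnormalised $\pi$ is ever evaluated, as in point 2 of the previous slide.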
6 / 24


2 Some properties of the HM algorithm

7 / 24


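1 An alternative representation of the estimator $\delta$ is
\[ \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i) , \]
where $N = n$ denotes the number of iterations and
the $z_i$'s are the accepted $y_j$'s,
$M_N$ is the number of accepted $y_j$'s up to time $N$,
$n_i$ is the number of times $z_i$ appears in the sequence $(x^{(t)})_t$.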
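
The identity between the two sums can be checked on a hand-made path. The sketch below is illustrative only: `compress` and the test function `h` are hypothetical names, and run-length compression assumes that an accepted candidate always differs from the current value (true almost surely for continuous proposals, not necessarily on a discrete space).

```python
def compress(chain):
    """Collapse a chain into [(z_i, n_i)]: each run of equal consecutive
    values becomes one accepted value z_i with its multiplicity n_i."""
    pairs = []
    for x in chain:
        if pairs and pairs[-1][0] == x:
            pairs[-1][1] += 1          # a rejection: the chain stayed at z_i
        else:
            pairs.append([x, 1])       # an acceptance: new z_i
    return [(z, n) for z, n in pairs]

h = lambda x: x * x                      # hypothetical test function
chain = [0.3, 0.3, 1.2, 0.7, 0.7, 0.7]   # hand-made path of length N = 6
pairs = compress(chain)
N = len(chain)

delta_path = sum(h(x) for x in chain) / N              # (1/n) sum_t h(x^(t))
delta_weighted = sum(n * h(z) for z, n in pairs) / N   # (1/N) sum_i n_i h(z_i)
```

Here `pairs` holds three accepted values with multiplicities 2, 1 and 3, and both averages coincide.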

8 / 24


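\[ \tilde q(\cdot|z_i) = \frac{\alpha(z_i,\cdot)\,q(\cdot|z_i)}{p(z_i)} \le \frac{q(\cdot|z_i)}{p(z_i)} , \]
where $p(z_i) = \int \alpha(z_i,y)\,q(y|z_i)\,dy$. To simulate according to $\tilde q(\cdot|z_i)$:
1 Propose a candidate $y \sim q(\cdot|z_i)$.
2 Accept with probability
\[ \tilde q(y|z_i) \Big/ \frac{q(y|z_i)}{p(z_i)} = \alpha(z_i,y) . \]
Otherwise, reject it and start again.
3 ◮ This is the transition of the HM algorithm.
The transition kernel $\tilde q$ admits $\tilde\pi \propto \pi p$ as a stationary distribution, by detailed balance:
\[ \tilde\pi(x)\,\tilde q(y|x) = \frac{\pi(x)\,p(x)}{\int \pi(u)\,p(u)\,du}\,\frac{\alpha(x,y)\,q(y|x)}{p(x)} = \frac{\pi(y)\,\alpha(y,x)\,q(x|y)}{\int \pi(u)\,p(u)\,du} = \tilde\pi(y)\,\tilde q(x|y) . \]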

9 / 24


Lemma
The sequence $(z_i, n_i)$ satisfies
1 $(z_i, n_i)_i$ is a Markov chain;
2 $z_{i+1}$ and $n_i$ are independent given $z_i$;
3 $n_i$ is distributed as a geometric random variable with probability
parameter
\[ p(z_i) := \int \alpha(z_i,y)\,q(y|z_i)\,dy ; \qquad (1) \]
4 $(z_i)_i$ is a Markov chain with transition kernel
$\tilde Q(z,dy) = \tilde q(y|z)\,dy$ and stationary distribution $\tilde\pi$ such that
\[ \tilde q(\cdot|z) \propto \alpha(z,\cdot)\,q(\cdot|z) \quad\text{and}\quad \tilde\pi(\cdot) \propto \pi(\cdot)\,p(\cdot) . \]
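
Point 3 of the lemma can be checked by simulation. The sketch below is a hedged illustration that borrows the integer random-walk example from the Illustrations section, with a hypothetical value `beta = 0.5` and a fixed state `z = 3`; the helper names (`alpha`, `propose`, `next_accepted`) are assumptions, not the authors' code. The number of proposals needed before acceptance plays the role of $n_i$.

```python
import random

beta = 0.5   # hypothetical parameter value for the toy target

def alpha(z, y):
    # Target pi(x) proportional to (1 - beta)**x with a symmetric +/-1 walk:
    # moves down are always accepted, moves up with probability 1 - beta.
    return 1.0 if y < z else 1.0 - beta

def propose(z, rng):
    return z + rng.choice((-1, 1))

def next_accepted(z, rng):
    """Rejection sampler for q~(.|z): propose from q(.|z) until acceptance.
    Returns the accepted value and the number of proposals used."""
    tries = 0
    while True:
        tries += 1
        y = propose(z, rng)
        if rng.random() < alpha(z, y):
            return y, tries

rng = random.Random(1)
counts = [next_accepted(3, rng)[1] for _ in range(20000)]
mean_n = sum(counts) / len(counts)
frac_one = counts.count(1) / len(counts)
p = 1.0 - beta / 2.0   # p(z) for z > 0 in this toy model
```

Under the lemma, `mean_n` should be close to `1 / p` and `frac_one` close to `p`, up to Monte Carlo error.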
˜

10 / 24


[Diagram: nodes $z_{i-1}$, $z_i$, $z_{i+1}$ in a row, with $n_{i-1}$ and $n_i$ beneath; the links are labelled "indep".]

\[ \delta = \frac{1}{n}\sum_{t=1}^{n} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i) . \]

11 / 24


3 Rao–Blackwellisation

12 / 24


1 A natural idea:
\[ \delta^* = \frac{1}{N}\sum_{i=1}^{M_N} \frac{h(z_i)}{p(z_i)} , \]
\[ \delta^* \simeq \frac{\sum_{i=1}^{M_N} h(z_i)/p(z_i)}{\sum_{i=1}^{M_N} 1/p(z_i)} = \frac{\sum_{i=1}^{M_N} \frac{\pi(z_i)}{\tilde\pi(z_i)}\,h(z_i)}{\sum_{i=1}^{M_N} \frac{\pi(z_i)}{\tilde\pi(z_i)}} . \]
2 But $p$ is not available in closed form.
3 The geometric $n_i$ is the obvious solution that is used in the
original Metropolis–Hastings estimate:
\[ n_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\} . \]

Lemma
If $(y_j)_j$ is an iid sequence with distribution $q(y|z_i)$, the quantity
\[ \hat\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\} \]
is an unbiased estimator of $1/p(z_i)$ whose variance, conditional on $z_i$,
is lower than the conditional variance of $n_i$, $\{1 - p(z_i)\}/p^2(z_i)$.
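
The unbiasedness claimed in the lemma follows from a short computation, spelled out here as a worked step using only the definitions above. Since the $y_\ell$ are iid from $q(\cdot|z_i)$ and $\int \{1 - \alpha(z_i,y)\}\,q(y|z_i)\,dy = 1 - p(z_i)$, each product has a geometric expectation:

```latex
\mathbb{E}\big[\hat\xi_i \,\big|\, z_i\big]
  = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j}
        \mathbb{E}\big[1 - \alpha(z_i, y_\ell) \,\big|\, z_i\big]
  = 1 + \sum_{j=1}^{\infty} \{1 - p(z_i)\}^{j}
  = \frac{1}{p(z_i)} .
```

The variance comparison is the usual Rao–Blackwell argument: integrating the uniforms $u_\ell$ out of $n_i$ given the proposals yields $\hat\xi_i = \mathbb{E}[n_i \mid z_i, (y_\ell)_\ell]$, and a conditional expectation cannot have a larger variance than the variable it conditions.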

13 / 24


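\[ \hat\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\} \]

1 Infinite sum, but sometimes finite: since
\[ \alpha(x^{(t)}, y_t) = \min\left\{ 1,\ \frac{\pi(y_t)}{\pi(x^{(t)})}\,\frac{q(x^{(t)}|y_t)}{q(y_t|x^{(t)})} \right\} , \]
the product vanishes as soon as one $\alpha(z_i, y_\ell)$ equals 1.
For example: take a symmetric random walk as a proposal.
2 What if we wish to be sure that the sum is finite?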

14 / 24


Variance reduction

Proposition
If $(y_j)_j$ is an iid sequence with distribution $q(y|z_i)$ and $(u_j)_j$ is an iid
uniform sequence, for any $k \ge 0$, the quantity
\[ \hat\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\} \qquad (2) \]
is an unbiased estimator of $1/p(z_i)$ with an almost surely finite number
of terms. Moreover, for $k \ge 1$,
\[ \mathbb{V}\big[\hat\xi_i^k \,\big|\, z_i\big] = \frac{1 - p(z_i)}{p^2(z_i)} - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}\,\frac{2 - p(z_i)}{p^2(z_i)}\,\big(p(z_i) - r(z_i)\big) , \]
where $p(z_i) := \int \alpha(z_i,y)\,q(y|z_i)\,dy$ and $r(z_i) := \int \alpha^2(z_i,y)\,q(y|z_i)\,dy$.
Therefore, we have
\[ \mathbb{V}\big[\hat\xi_i \,\big|\, z_i\big] \le \mathbb{V}\big[\hat\xi_i^k \,\big|\, z_i\big] \le \mathbb{V}\big[\hat\xi_i^0 \,\big|\, z_i\big] = \mathbb{V}[n_i \mid z_i] . \]
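
A direct simulation of the estimator (2) helps to see why the sum is finite and to sanity-check the variance formula. The sketch below is illustrative, not the authors' implementation: `xi_hat_k` and `variance_formula` are hypothetical names, and the toy model (integer random walk from the Illustrations section, `beta = 0.5`, state `z = 3`, `k = 1`) is an assumption.

```python
import random

def xi_hat_k(z, k, alpha, propose, rng):
    """One draw of the estimator (2) of 1/p(z)."""
    total = 1.0
    weight = 1.0   # running product over l <= j
    j = 0
    while weight > 0.0:
        j += 1
        y = propose(z, rng)
        a = alpha(z, y)
        if j <= k:
            weight *= 1.0 - a                             # Rao-Blackwellised factor
        else:
            weight *= 1.0 if rng.random() >= a else 0.0   # indicator factor
        total += weight
    return total

def variance_formula(p, r, k):
    """Conditional variance of xi_hat_k as stated in the proposition."""
    reduction = (1 - (1 - 2 * p + r) ** k) / (2 * p - r) * (2 - p) / p ** 2 * (p - r)
    return (1 - p) / p ** 2 - reduction

beta = 0.5                                             # hypothetical value
alpha = lambda z, y: 1.0 if y < z else 1.0 - beta
propose = lambda z, rng: z + rng.choice((-1, 1))
p, r = 1 - beta / 2, 1 - beta + beta ** 2 / 2          # closed forms for z > 0

rng = random.Random(2)
draws = [xi_hat_k(3, 1, alpha, propose, rng) for _ in range(40000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
```

The loop stops as soon as the running product is zero, which for $j > k$ happens at the first accepted proposal; this is what keeps the number of terms almost surely finite. In this toy model `mean` should be near `1 / p` and `var` near `variance_formula(p, r, 1)`, below the geometric variance `variance_formula(p, r, 0)`.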

15 / 24


Variance reduction

[Diagram: nodes $z_{i-1}$, $z_i$, $z_{i+1}$ in a row, with $\hat\xi_{i-1}^k$ and $\hat\xi_i^k$ beneath; the links are labelled "not indep".]

\[ \delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\,h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k} . \]
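
One way to compute $\delta_M^k$ is sketched below. This is a reading of the construction under stated assumptions, not a definitive implementation: the same proposals drive both the move to the next accepted value and the weight $\hat\xi_i^k$ (consistent with the "not indep" labels), and the names `rb_step`, `rb_estimate`, the N(0, 1) target, the scale `tau = 2.0` and the choice $h(x) = x^2$ are all hypothetical.

```python
import math
import random

def rb_step(z, k, alpha, propose, rng):
    """From the accepted value z, return (next accepted value, xi_hat_k for z).
    Proposals are shared between the move and the weight."""
    xi, weight, j, z_next = 1.0, 1.0, 0, None
    while weight > 0.0:
        j += 1
        y = propose(z, rng)
        a = alpha(z, y)
        accepted = rng.random() < a
        if accepted and z_next is None:
            z_next = y                    # first accepted candidate moves the chain
        if j <= k:
            weight *= 1.0 - a             # Rao-Blackwellised factor
        else:
            weight *= 0.0 if accepted else 1.0   # indicator of a rejection
        xi += weight
    return z_next, xi

def rb_estimate(h, z0, M, k, alpha, propose, rng):
    """Self-normalised estimator: sum xi_i h(z_i) / sum xi_i over M accepted values."""
    num = den = 0.0
    z = z0
    for _ in range(M):
        z_next, xi = rb_step(z, k, alpha, propose, rng)
        num += xi * h(z)
        den += xi
        z = z_next
    return num / den

# Toy setting (assumed): N(0, 1) target, symmetric Gaussian random walk.
tau = 2.0
alpha = lambda z, y: math.exp(min(0.0, 0.5 * (z * z - y * y)))
propose = lambda z, rng: z + tau * rng.gauss(0.0, 1.0)

rng = random.Random(3)
delta_rb = rb_estimate(lambda x: x * x, 0.0, 20000, 5, alpha, propose, rng)   # E[X^2] = 1
```

With `k = 0` every factor is an indicator, so the weight reduces to $n_i$ and the function returns the usual HM average over the first `M` accepted values. The loop can only end on an accepted proposal, so `z_next` is always set when it returns.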

16 / 24


Asymptotic results

Let
\[ \delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\,h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k} . \]
For any positive function $\varphi$, we denote $\mathcal{C}_\varphi = \{h;\ |h/\varphi|_\infty < \infty\}$.

Assume that there exists a positive function $\varphi \ge 1$ such that
\[ \forall h \in \mathcal{C}_\varphi , \quad \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} \stackrel{P}{\longrightarrow} \pi(h) \qquad (3) \]
Assume that there exists a positive function $\psi$ such that
\[ \forall h \in \mathcal{C}_\psi , \quad \sqrt{M}\left( \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h) \right) \stackrel{L}{\longrightarrow} \mathcal{N}(0, \Gamma(h)) \]

Theorem
Under the assumption that $\pi(p) > 0$, the following convergence
properties hold:
i) If $h$ is in $\mathcal{C}_\varphi$, then
\[ \delta_M^k \stackrel{P}{\longrightarrow}_{M\to\infty} \pi(h) \qquad (◮ \text{Consistency}) \]
ii) If, in addition, $h^2/p \in \mathcal{C}_\varphi$ and $h \in \mathcal{C}_\psi$, then
\[ \sqrt{M}\,(\delta_M^k - \pi(h)) \stackrel{L}{\longrightarrow}_{M\to\infty} \mathcal{N}(0, V_k[h - \pi(h)]) , \qquad (◮ \text{CLT}) \]
where
\[ V_k(h) := \pi(p) \int \pi(dz)\,\mathbb{V}\big[\hat\xi_i^k \,\big|\, z\big]\,h^2(z)\,p(z) + \Gamma(h) . \]
17 / 24


Asymptotic results

We will need some additional assumptions. Assume a maximal
inequality for the Markov chain $(z_i)_i$: there exists a measurable
function $\zeta$ such that for any starting point $x$,
\[ \forall h \in \mathcal{C}_\zeta , \quad P_x\left( \sup_{0 \le i \le N} \Big| \sum_{j=0}^{i} [h(z_j) - \tilde\pi(h)] \Big| > \epsilon \right) \le \frac{N\,C_h(x)}{\epsilon^2} . \]
Moreover, assume that $\exists \phi \ge 1$ such that for any starting point $x$,
\[ \forall h \in \mathcal{C}_\phi , \quad \tilde Q^n(x, h) \stackrel{P}{\longrightarrow} \tilde\pi(h) = \pi(ph)/\pi(p) . \]

Theorem
Assume that $h$ is such that $h/p \in \mathcal{C}_\zeta$ and $\{C_{h/p},\ h^2/p^2\} \subset \mathcal{C}_\phi$. Assume
moreover that
\[ \sqrt{M}\,\big(\delta_M^0 - \pi(h)\big) \stackrel{L}{\longrightarrow} \mathcal{N}(0, V_0[h - \pi(h)]) . \]
Then, for any starting point $x$,
\[ \sqrt{M_N}\left( \frac{\sum_{t=1}^{N} h(x^{(t)})}{N} - \pi(h) \right) \stackrel{L}{\longrightarrow}_{N\to\infty} \mathcal{N}(0, V_0[h - \pi(h)]) , \]
where $M_N$ is defined by
\[ \sum_{i=1}^{M_N} \hat\xi_i^0 \le N < \sum_{i=1}^{M_N + 1} \hat\xi_i^0 . \qquad (3) \]

18 / 24


4 Illustrations

19 / 24


Figure: Overlay of the variations of 250 iid realisations of the
estimates $\delta$ (gold) and $\delta^\infty$ (grey) of $E[X] = 0$ for 1000 iterations, along
with the 90% interquantile range for the estimates $\delta$ (brown) and $\delta^\infty$
(pink), in the setting of a random walk Gaussian proposal with scale
$\tau = 10$.

20 / 24


Figure: Overlay of the variations of 500 iid realisations of the
estimates $\delta$ (deep grey), $\delta^\infty$ (medium grey) and of the importance
sampling version (light grey) of $E[X] = 10$ when $X \sim \mathrm{Exp}(0.1)$ for 100
iterations, along with the 90% interquantile ranges (same colour
code), in the setting of an independent exponential proposal with
scale $\mu = 0.02$.
21 / 24


\[ \pi(x) = \beta(1-\beta)^x \quad\text{and}\quad 2q(y|x) = \begin{cases} \mathbb{I}_{|x-y|=1} & \text{if } x > 0 , \\ \mathbb{I}_{|y| \le 1} & \text{if } x = 0 . \end{cases} \]
For this problem,
\[ p(x) = 1 - \beta/2 \quad\text{and}\quad r(x) = 1 - \beta + \beta^2/2 . \]
We can therefore compute the gain in variance
\[ \frac{p(x) - r(x)}{2p(x) - r(x)}\,\frac{2 - p(x)}{p^2(x)} = 2\,\frac{\beta(1-\beta)(2+\beta)}{(2-\beta^2)(2-\beta)^2} , \]
which is optimal for β = 0.174, leading to a gain of 0.578, while the
relative gain in variance is
\[ \frac{p(x) - r(x)}{2p(x) - r(x)}\,\frac{2 - p(x)}{1 - p(x)} = \frac{(1-\beta)(2+\beta)}{2-\beta^2} , \]
which is decreasing in β.
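
The closed forms for $p$, $r$ and the two gain expressions can be re-derived numerically from the acceptance probabilities at a state $x > 0$. The sketch below is a hedged check with hypothetical function names; it only verifies that the left and right sides of the displayed identities agree on a grid of β values.

```python
def p_r(beta):
    # From x > 0 the walk proposes x - 1 or x + 1 with probability 1/2 each:
    # alpha(x, x - 1) = 1 and alpha(x, x + 1) = pi(x + 1) / pi(x) = 1 - beta.
    a_down, a_up = 1.0, 1.0 - beta
    p = 0.5 * a_down + 0.5 * a_up
    r = 0.5 * a_down ** 2 + 0.5 * a_up ** 2
    return p, r

def gain(beta):
    p, r = p_r(beta)
    return (p - r) / (2 * p - r) * (2 - p) / p ** 2

def gain_closed_form(beta):
    return 2 * beta * (1 - beta) * (2 + beta) / ((2 - beta ** 2) * (2 - beta) ** 2)

def relative_gain(beta):
    p, r = p_r(beta)
    return (p - r) / (2 * p - r) * (2 - p) / (1 - p)

def relative_gain_closed_form(beta):
    return (1 - beta) * (2 + beta) / (2 - beta ** 2)

betas = [i / 10 for i in range(1, 10)]
max_gap = max(
    max(abs(gain(b) - gain_closed_form(b)),
        abs(relative_gain(b) - relative_gain_closed_form(b)))
    for b in betas
)
```

`max_gap` should be at floating-point rounding level, and `relative_gain` evaluated along `betas` decreases, in line with the last statement of the slide.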
22 / 24


5 Conclusion

23 / 24


a) Rao–Blackwellisation of any HM algorithm with a controlled
amount of additional calculation.
b) Link with the importance sampling of Markov chains.
c) Analysis with asymptotic results on triangular arrays.

24 / 24
