Introduction and State Space Models
Reminder on some Monte Carlo methods
Particle Markov Chain Monte Carlo
SMC2

SMC2: A sequential Monte Carlo algorithm with particle Markov chain Monte Carlo updates

N. Chopin (1), P.E. Jacob (2), & O. Papaspiliopoulos (3)

MCB seminar, March 9th, 2011

(1) ENSAE-CREST
(2) CREST & Université Paris Dauphine, funded by AXA Research
(3) Universitat Pompeu Fabra

N. CHOPIN, P.E. JACOB, & O. PAPASPILIOPOULOS SMC2 1/ 72
Outline

1 Introduction and State Space Models
2 Reminder on some Monte Carlo methods
3 Particle Markov Chain Monte Carlo
4 SMC2
State Space Models

Context
In these models:
- we observe some data Y_{1:T} = (Y_1, ..., Y_T),
- we suppose that they depend on some hidden states X_{1:T}.
State Space Models

A system of equations
Hidden states: p(x_1 | θ) = μ_θ(x_1), and for t ≥ 1,
$$p(x_{t+1} \mid x_{1:t}, \theta) = p(x_{t+1} \mid x_t, \theta) = f_\theta(x_{t+1} \mid x_t).$$
Observations:
$$p(y_t \mid y_{1:t-1}, x_{1:t}, \theta) = p(y_t \mid x_t, \theta) = g_\theta(y_t \mid x_t).$$
Parameter: θ ∈ Θ, with prior p(θ).
State Space Models

Some interesting distributions
- Bayesian inference focuses on p(θ | y_{1:T}).
- Filtering (traditionally) focuses on p_θ(x_t | y_{1:t}), for all t ∈ [1, T].
- Smoothing (traditionally) focuses on p_θ(x_t | y_{1:T}), for all t ∈ [1, T].
State Space Models

Some interesting distributions [spoiler]
- PMCMC methods provide a sample from p(θ, x_{1:T} | y_{1:T}).
- SMC2 provides a sample from p(θ, x_{1:t} | y_{1:t}), for all t ∈ [1, T].
Examples

Local level
$$y_t = x_t + \sigma_V \varepsilon_t, \quad \varepsilon_t \sim \mathcal N(0, 1),$$
$$x_{t+1} = x_t + \sigma_W \eta_t, \quad \eta_t \sim \mathcal N(0, 1),$$
$$x_0 \sim \mathcal N(0, 1).$$
Here θ = (σ_V, σ_W). The model is linear and Gaussian.
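As a toy illustration, the local level model above can be simulated in a few lines. This is a sketch; the parameter values passed below are arbitrary choices, not values from the slides:

```python
import numpy as np

def simulate_local_level(T, sigma_v=0.5, sigma_w=1.0, seed=0):
    """Simulate the linear Gaussian local level model:
    y_t = x_t + sigma_v * eps_t,  x_{t+1} = x_t + sigma_w * eta_t,
    with x_0 ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    y = np.empty(T)
    x_prev = rng.normal()                        # x_0 ~ N(0, 1)
    for t in range(T):
        x[t] = x_prev + sigma_w * rng.normal()   # state transition f_theta
        y[t] = x[t] + sigma_v * rng.normal()     # observation g_theta
        x_prev = x[t]
    return x, y

x, y = simulate_local_level(100)
```

Because the model is linear and Gaussian, the exact filtering and smoothing distributions for this example are available via the Kalman filter, which makes it a convenient benchmark for the Monte Carlo methods discussed later.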
Examples

Stochastic Volatility (simple)
$$y_t \mid x_t \sim \mathcal N(0, e^{x_t}),$$
$$x_t = \mu + \rho(x_{t-1} - \mu) + \sigma \varepsilon_t,$$
$$x_0 = \mu_0.$$
Here θ = (μ, ρ, σ), or can include μ_0.
Examples

Population growth model
$$y_t = n_t + \sigma_W \varepsilon_t,$$
$$\log n_{t+1} = \log n_t + b_0 + b_1 n_t^{b_2} + \sigma \eta_t,$$
$$\log n_0 = \mu_0.$$
Here θ = (b_0, b_1, b_2, σ, σ_W), or can include μ_0.
Examples

Stochastic Volatility (sophisticated)
$$y_t = \mu + \beta v_t + v_t^{1/2} \epsilon_t, \quad t \ge 1,$$
$$k \sim \mathrm{Poi}\!\left(\lambda \xi^2 / \omega^2\right), \quad c_{1:k} \overset{\mathrm{iid}}{\sim} \mathcal U(t, t+1), \quad e_{1:k} \overset{\mathrm{iid}}{\sim} \mathrm{Exp}\!\left(\xi / \omega^2\right),$$
$$z_{t+1} = e^{-\lambda} z_t + \sum_{j=1}^{k} e^{-\lambda(t+1-c_j)} e_j,$$
$$v_{t+1} = \frac{1}{\lambda}\left[z_t - z_{t+1} + \sum_{j=1}^{k} e_j\right],$$
$$x_{t+1} = (v_{t+1}, z_{t+1}).$$
Examples

Figure: The S&P 500 data from 03/01/2005 to 21/12/2007: (a) observations; (b) squared observations.
Examples

Athletics records model
$$g(y_{1:2,t} \mid \mu_t, \xi, \sigma) = \{1 - G(y_{2,t} \mid \mu_t, \xi, \sigma)\} \prod_{i=1}^{2} \frac{g(y_{i,t} \mid \mu_t, \xi, \sigma)}{1 - G(y_{i,t} \mid \mu_t, \xi, \sigma)},$$
$$x_t = (\mu_t, \dot\mu_t)', \qquad x_{t+1} \mid x_t, \nu \sim \mathcal N(F x_t, Q),$$
with
$$F = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad Q = \nu^2 \begin{pmatrix} 1/3 & 1/2 \\ 1/2 & 1 \end{pmatrix},$$
$$G(y \mid \mu, \xi, \sigma) = 1 - \exp\left\{-\left[1 - \xi\left(\frac{y - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\}.$$
Examples

Figure: Best two times of each year, in women's 3000 metres events between 1976 and 2010.
Why are those models challenging?

It's all about dimensions...
$$p_\theta(x_{1:T} \mid y_{1:T}) = \frac{p_\theta(y_{1:T} \mid x_{1:T})\, p_\theta(x_{1:T})}{p_\theta(y_{1:T})} \propto p_\theta(y_{1:T} \mid x_{1:T})\, p_\theta(x_{1:T})$$
...even if it's not obvious:
$$p(\theta \mid y_{1:T}) \propto p(y_{1:T} \mid \theta)\, p(\theta) = \left\{\int_{\mathcal X^T} p(y_{1:T} \mid x_{1:T}, \theta)\, p(x_{1:T} \mid \theta)\, dx_{1:T}\right\} p(\theta)$$
2 Reminder on some Monte Carlo methods
Metropolis-Hastings algorithm

A popular method to sample from a distribution π.

Algorithm 1 Metropolis-Hastings algorithm
1: Set some x^(1)
2: for i = 2 to N do
3:   Propose x* ∼ q(·|x^(i−1))
4:   Compute the ratio:
$$\alpha = \min\left(1, \frac{\pi(x^*)\, q(x^{(i-1)} \mid x^*)}{\pi(x^{(i-1)})\, q(x^* \mid x^{(i-1)})}\right)$$
5:   Set x^(i) = x* with probability α, otherwise set x^(i) = x^(i−1)
6: end for
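The steps above can be sketched in a few lines of Python. Here the target π is a standard normal and q is a Gaussian random walk, so the proposal is symmetric and the q terms cancel in the ratio; the target and step size are illustrative choices, not from the slides:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings.
    log_pi: log-density of the target, up to an additive constant."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    chain[0] = x0
    for i in range(1, n_iter):
        x_star = chain[i - 1] + step * rng.normal()   # propose x* ~ q(.|x^(i-1))
        log_alpha = log_pi(x_star) - log_pi(chain[i - 1])
        if np.log(rng.uniform()) < log_alpha:         # accept with probability alpha
            chain[i] = x_star
        else:
            chain[i] = chain[i - 1]                   # otherwise keep the old value
    return chain

# Target: pi = N(0, 1), known only up to a constant
chain = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_iter=5000)
```

With enough iterations, the sample mean and variance of `chain` approach 0 and 1, the moments of the target.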
Metropolis-Hastings algorithm

Requirements
- π can be evaluated point-wise, up to a multiplicative constant.
- x is low-dimensional, otherwise designing q gets tedious or even impossible.

Back to SSM
- p(θ|y_{1:T}) cannot be evaluated point-wise.
- p_θ(x_{1:T}|y_{1:T}) and p(x_{1:T}, θ|y_{1:T}) are high-dimensional, and cannot necessarily be evaluated point-wise either.
Gibbs sampling

Suppose the target distribution π is defined on X^d.

Algorithm 2 Gibbs sampling
1: Set some x_{1:d}^(1)
2: for i = 2 to N do
3:   for j = 1 to d do
4:     Draw x_j^(i) ∼ π(x_j | x_{1:j−1}^(i), x_{j+1:d}^(i−1))
5:   end for
6: end for

It breaks a high-dimensional sampling problem into many low-dimensional sampling problems!
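A minimal sketch of the algorithm, for a bivariate Gaussian target whose two conditionals are available in closed form; this toy target is an illustrative assumption, not a model from the slides:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for (x1, x2) ~ N(0, [[1, rho], [rho, 1]]).
    Each full conditional is N(rho * other, 1 - rho**2)."""
    rng = np.random.default_rng(seed)
    samples = np.empty((n_iter, 2))
    x1, x2 = 0.0, 0.0
    sd = np.sqrt(1.0 - rho**2)
    for i in range(n_iter):
        x1 = rho * x2 + sd * rng.normal()   # draw x1 | x2
        x2 = rho * x1 + sd * rng.normal()   # draw x2 | x1
        samples[i] = (x1, x2)
    return samples

samples = gibbs_bivariate_normal(rho=0.5, n_iter=5000)
```

As ρ approaches 1 the chain mixes more and more slowly, which is exactly the "components too correlated" failure mode discussed on the next slide.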
Gibbs sampling

Requirements
- The conditional distributions π(x_j | x_{1:j−1}, x_{j+1:d}) can be sampled from; otherwise, use MH within Gibbs.
- The components x_j are not too correlated with one another.

Back to SSM
- The hidden states x_{1:T} are typically very correlated with one another.
- If the target is p(θ, x_{1:T}|y_{1:T}), θ is also very correlated with x_{1:T}.
Sequential Monte Carlo for filtering

Context
- Suppose we are interested in p_θ(x_{1:T}|y_{1:T}), with θ known.
- We want to get a sample x_{1:T}^(i), i ∈ [1, N], from it.

General idea
We introduce the following sequence of distributions:
$$\{p_\theta(x_{1:t} \mid y_{1:t}),\ t \in [1, T]\}$$
and sample recursively from p_θ(x_{1:t}|y_{1:t}) to p_θ(x_{1:t+1}|y_{1:t+1}).
Sequential Monte Carlo for filtering

Definition
A particle filter is just a collection of weighted points, called particles.

Particles
Writing (w^(i), x^(i))_{i=1}^N ∼ π means that the empirical distribution
$$\sum_{i=1}^{N} w^{(i)} \delta_{x^{(i)}}(dx)$$
converges towards π as N → +∞.
Sequential Monte Carlo for filtering

Importance Sampling
Suppose
$$(w_1^{(i)}, x^{(i)})_{i=1}^{N} \sim \pi_1,$$
and define
$$w_2^{(i)} = w_1^{(i)} \times \frac{\pi_2(x^{(i)})}{\pi_1(x^{(i)})}.$$
Then
$$(w_2^{(i)}, x^{(i)})_{i=1}^{N} \sim \pi_2,$$
under some common-sense assumptions on π_1 and π_2.
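The reweighting identity can be checked numerically. In this sketch the particles come from π_1 = N(0, 1) and are reweighted to target π_2 = N(1, 1); this pair of distributions is a hypothetical choice made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Equally weighted particles from pi_1 = N(0, 1)
x = rng.normal(size=N)
w1 = np.full(N, 1.0 / N)

# Reweight towards pi_2 = N(1, 1): w2 proportional to w1 * pi_2(x) / pi_1(x).
# Work with log-ratios for numerical stability; constants cancel.
log_ratio = -0.5 * (x - 1.0) ** 2 + 0.5 * x ** 2
w2 = w1 * np.exp(log_ratio)
w2 /= w2.sum()                      # self-normalise the weights

# The weighted mean estimates E_{pi_2}[X] = 1
mean_pi2 = np.sum(w2 * x)
```

The "common-sense assumptions" show up in practice as the variance of the weights: the further apart π_1 and π_2 are, the fewer particles carry non-negligible weight.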
Sequential Monte Carlo for filtering

From one time-step to the next
Suppose
$$(w_t^{(i)}, x_{1:t}^{(i)})_{i=1}^{N} \sim p_\theta(x_{1:t} \mid y_{1:t}).$$
We want
$$(w_{t+1}^{(i)}, x_{1:t+1}^{(i)})_{i=1}^{N} \sim p_\theta(x_{1:t+1} \mid y_{1:t+1}).$$

Decomposition
$$p_\theta(x_{1:t+1} \mid y_{1:t+1}) \propto p_\theta(y_{t+1} \mid x_{t+1})\, p_\theta(x_{t+1} \mid x_t)\, p_\theta(x_{1:t} \mid y_{1:t}) \propto g_\theta(y_{t+1} \mid x_{t+1})\, f_\theta(x_{t+1} \mid x_t)\, p_\theta(x_{1:t} \mid y_{1:t})$$
Sequential Monte Carlo for filtering

Proposal
Propose x_{t+1}^(i) ∼ q_θ(x_{t+1} | x_{1:t} = x_{1:t}^(i), y_{1:t+1}). Then:
$$\left(w_t^{(i)},\, (x_{1:t}^{(i)}, x_{t+1}^{(i)})\right)_{i=1}^{N} \sim q_\theta(x_{t+1} \mid x_{1:t}, y_{1:t+1})\, p_\theta(x_{1:t} \mid y_{1:t})$$
Sequential Monte Carlo for filtering

Reweighting
$$w_{t+1}^{(i)} = w_t^{(i)} \times \frac{g_\theta(y_{t+1} \mid x_{t+1}^{(i)})\, f_\theta(x_{t+1}^{(i)} \mid x_t^{(i)})}{q_\theta(x_{t+1}^{(i)} \mid x_{1:t}^{(i)}, y_{1:t+1})}$$
and finally we have
$$(w_{t+1}^{(i)}, x_{1:t+1}^{(i)})_{i=1}^{N} \sim p_\theta(x_{1:t+1} \mid y_{1:t+1}).$$
Sequential Monte Carlo for filtering

Resampling
To fight weight degeneracy, we introduce a resampling step.

Notation
A family of probability distributions on {1, ..., N}^N:
$$a \sim r(\cdot \mid w), \quad \text{for } w \in [0, 1]^N \text{ such that } \sum_{i=1}^{N} w^{(i)} = 1.$$
The variables (a_{t−1}^(i))_{i=1}^N are the indices of the parents of (x_{1:t}^(i))_{i=1}^N.
Sequential Monte Carlo for filtering

Algorithm 3 Sequential Monte Carlo algorithm
1: Propose x_1^(i) ∼ μ_θ(·)
2: Compute the weights w_1^(i)
3: for t = 2 to T do
4:   Resample a_{t−1} ∼ r(·|w_{t−1})
5:   Propose x_t^(i) ∼ q_θ(·|x_{1:t−1}^{(a_{t−1}^(i))}, y_{1:t}), let x_{1:t}^(i) = (x_{1:t−1}^{(a_{t−1}^(i))}, x_t^(i))
6:   Update the weights to get w_t^(i)
7: end for
Sequential Monte Carlo for filtering

Figure: Three weighted trajectories x_{1:t} at time t.

Figure: Three proposed trajectories x_{1:t+1} at time t + 1.

Figure: Three reweighted trajectories x_{1:t+1} at time t + 1.
Sequential Monte Carlo for filtering

Output
In the end we get particles
$$(w_T^{(i)}, x_{1:T}^{(i)})_{i=1}^{N} \sim p_\theta(x_{1:T} \mid y_{1:T}).$$

Requirements
- Proposal kernels q_θ(·|x_{1:t−1}, y_{1:t}) from which we can sample.
- Weight functions which we can evaluate point-wise.
- These proposal kernels and weight functions must result in properly weighted samples.
Sequential Monte Carlo for filtering

Marginal likelihood
A side effect of the SMC algorithm is that we can approximate the marginal likelihood Z_T = p(y_{1:T} | θ) with the following unbiased estimate:
$$\hat Z_T^N = \prod_{t=1}^{T} \left(\frac{1}{N} \sum_{i=1}^{N} w_t^{(i)}\right) \xrightarrow[N \to \infty]{P} Z_T$$
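Putting the propose/reweight/resample steps together: a sketch of a bootstrap particle filter for the local level model of the introduction, where the proposal is q_θ = f_θ so the incremental weight reduces to g_θ(y_t|x_t). It returns the estimate Ẑ_T^N on the log scale for numerical stability (Ẑ itself is unbiased, its logarithm is not); the data simulation and parameter values are illustrative assumptions:

```python
import numpy as np

def bootstrap_filter(y, sigma_v, sigma_w, N=500, seed=0):
    """Bootstrap particle filter for the local level model:
    y_t = x_t + sigma_v * eps_t,  x_t = x_{t-1} + sigma_w * eta_t,
    x_0 ~ N(0, 1). Returns log of the estimate of Z_T = p(y_{1:T} | theta)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)                      # x_0 ~ N(0, 1)
    log_Z = 0.0
    for t in range(len(y)):
        x = x + sigma_w * rng.normal(size=N)    # propose from f_theta
        # incremental weights w_t^(i) = g_theta(y_t | x_t^(i)), in log scale
        logw = -0.5 * np.log(2 * np.pi * sigma_v**2) \
               - 0.5 * (y[t] - x) ** 2 / sigma_v**2
        m = logw.max()
        w = np.exp(logw - m)
        log_Z += m + np.log(w.mean())           # log of (1/N) * sum_i w_t^(i)
        a = rng.choice(N, size=N, p=w / w.sum())  # multinomial resampling a ~ r(.|w)
        x = x[a]
    return log_Z

# Illustrative synthetic data with sigma_w = 1, sigma_v = 0.5
rng = np.random.default_rng(1)
x_true = np.cumsum(rng.normal(size=50))
y = x_true + 0.5 * rng.normal(size=50)
log_Z_hat = bootstrap_filter(y, sigma_v=0.5, sigma_w=1.0)
```

Since this model is linear and Gaussian, the estimate can be checked against the exact log-likelihood computed by a Kalman filter.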
3 Particle Markov Chain Monte Carlo
Reference

Particle Markov chain Monte Carlo methods, an article by Andrieu, Doucet and Holenstein, JRSS B, 2010, 72(3):269–342.

Motivation
Bayesian inference in state space models: p(θ, x_{1:T} | y_{1:T}).
Idealized Metropolis-Hastings for SSM

If only... we had p(θ|y_{1:T}) ∝ p(θ) p(y_{1:T}|θ) up to a multiplicative constant, we could run an MH algorithm with acceptance rate:
$$\alpha(\theta^{(i)}, \theta') = \min\left(1, \frac{p(\theta')\, p(y_{1:T} \mid \theta')\, q(\theta^{(i)} \mid \theta')}{p(\theta^{(i)})\, p(y_{1:T} \mid \theta^{(i)})\, q(\theta' \mid \theta^{(i)})}\right)$$
Valid Metropolis-Hastings for SSM??

Plug-in estimates
However, we have Ẑ_T^N(θ) ≈ p(y_{1:T}|θ) by running an SMC algorithm, and we can try to run an MH algorithm with acceptance rate:
$$\alpha(\theta^{(i)}, \theta') = \min\left(1, \frac{p(\theta')\, \hat Z_T^N(\theta')\, q(\theta^{(i)} \mid \theta')}{p(\theta^{(i)})\, \hat Z_T^N(\theta^{(i)})\, q(\theta' \mid \theta^{(i)})}\right)$$
The Beauty of Particle MCMC

"Exact approximation"
It turns out this is a valid MH algorithm that targets exactly p(θ|y_{1:T}), regardless of the number N of particles used in the SMC algorithm that provides the estimates Ẑ_T^N(θ) at each iteration.

State estimation
In fact, PMCMC algorithms provide samples from p(θ, x_{1:T}|y_{1:T}), and not only from the posterior distribution of the parameters.
Particle Metropolis-Hastings

Algorithm 4 Particle Metropolis-Hastings algorithm
1: Set some θ^(1)
2: Run an SMC algorithm, keep Ẑ_T^N(θ^(1)), draw a trajectory x_{1:T}^(1)
3: for i = 2 to I do
4:   Propose θ' ∼ q(·|θ^(i−1))
5:   Run an SMC algorithm, keep Ẑ_T^N(θ'), draw a trajectory x'_{1:T}
6:   Compute the ratio:
$$\alpha(\theta^{(i-1)}, \theta') = \min\left(1, \frac{p(\theta')\, \hat Z_T^N(\theta')\, q(\theta^{(i-1)} \mid \theta')}{p(\theta^{(i-1)})\, \hat Z_T^N(\theta^{(i-1)})\, q(\theta' \mid \theta^{(i-1)})}\right)$$
7:   Set θ^(i) = θ', x_{1:T}^(i) = x'_{1:T} with probability α, otherwise keep the previous values
8: end for
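A compact sketch of the whole loop for the local level model, with a bootstrap particle filter as the Ẑ oracle, targeting the posterior of σ_V with σ_W fixed to 1. The flat prior on (0, 5), the random-walk step size, and the synthetic data are illustrative assumptions, not choices made in the slides:

```python
import numpy as np

def log_Z_hat(y, sigma_v, N=200, rng=None):
    """Bootstrap particle filter estimate of log p(y_{1:T} | sigma_v)
    for the local level model with sigma_w = 1 and x_0 ~ N(0, 1)."""
    x = rng.normal(size=N)
    lZ = 0.0
    for obs in y:
        x = x + rng.normal(size=N)                   # propose from f_theta
        logw = -0.5 * np.log(2 * np.pi * sigma_v**2) \
               - 0.5 * (obs - x) ** 2 / sigma_v**2
        m = logw.max()
        w = np.exp(logw - m)
        lZ += m + np.log(w.mean())
        x = x[rng.choice(N, size=N, p=w / w.sum())]  # resample
    return lZ

def pmmh(y, n_iter=200, seed=0):
    """Particle MH for theta = sigma_v: flat prior on (0, 5),
    symmetric Gaussian random-walk proposal, so the acceptance
    ratio reduces to the ratio of the Z-hat estimates."""
    rng = np.random.default_rng(seed)
    theta = 1.0
    lZ = log_Z_hat(y, theta, rng=rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        theta_prop = theta + 0.2 * rng.normal()
        if 0.0 < theta_prop < 5.0:                   # stay in the prior support
            lZ_prop = log_Z_hat(y, theta_prop, rng=rng)
            if np.log(rng.uniform()) < lZ_prop - lZ:
                theta, lZ = theta_prop, lZ_prop      # accept, keep the new estimate
        chain[i] = theta
    return chain

rng = np.random.default_rng(1)
x_true = np.cumsum(rng.normal(size=30))
y = x_true + 0.5 * rng.normal(size=30)               # true sigma_v = 0.5
chain = pmmh(y)
```

A key implementation detail, visible in the accept step: the estimate Ẑ attached to the current θ is stored and reused, never recomputed, which is what the extended-target argument of the next slides requires.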
Why does it work?

Variables generated by SMC
$$x_t = (x_t^{(1)}, \dots, x_t^{(N)}) \quad \forall t \in [1, T], \qquad a_t = (a_t^{(1)}, \dots, a_t^{(N)}) \quad \forall t \in [1, T-1]$$

Joint distribution
$$\psi(x_1, \dots, x_T, a_1, \dots, a_{T-1}) = \left\{\prod_{i=1}^{N} q_\theta(x_1^{(i)})\right\} \prod_{t=2}^{T} \prod_{i=1}^{N} r(a_{t-1}^{(i)} \mid w_{t-1})\, q_\theta\!\left(x_t^{(i)} \,\middle|\, x_{1:t-1}^{(a_{t-1}^{(i)})}\right)$$
Why does it work?

Extended proposal distribution
The PMH proposes: a new parameter θ', a trajectory x_{1:T}^{k'}, and the rest of the variables generated by the SMC algorithm:
$$q^N(\theta', k', x_1, \dots, x_T, a_1, \dots, a_{T-1}) = q(\theta' \mid \theta^{(i)})\, w_T^{k'}\, \psi_{\theta'}(x_1, \dots, x_T, a_1, \dots, a_{T-1})$$
Why does it work?

Extended target distribution
$$\pi^N(\theta, k, x_1, \dots, x_T, a_1, \dots, a_{T-1}) = \frac{p(\theta, x_{1:T}^{k} \mid y_{1:T})}{N^T} \times \frac{\psi_\theta(x_1, \dots, x_T, a_1, \dots, a_{T-1})}{q_\theta(x_1^{b_1^k}) \prod_{t=2}^{T} r(b_{t-1}^{k} \mid w_{t-1})\, q_\theta\!\left(x_t^{b_t^k} \,\middle|\, x_{1:t-1}^{b_{t-1}^k}\right)}$$
with b_{1:T}^k the index history of particle x_{1:T}^(k).

Valid algorithm
From the explicit form of the extended distributions, showing that PMH is a standard MH algorithm becomes straightforward.
Particle MCMC: conclusion

Remarks
- It is exact regardless of N...
- ...however a sufficient number N of particles is required to get decent acceptance rates.
- SMC methods are considered expensive, but are easy to parallelize.
- Applies to a broad class of models.
- More sophisticated SMC and MCMC methods can be used, resulting in more sophisticated Particle MCMC methods.
4 SMC2
Our idea...

...was to use the same, very powerful "extended distribution" framework to build an SMC sampler instead of an MCMC algorithm.

Foreseen benefits
- to sample more efficiently from the posterior distribution p(θ|y_{1:T}),
- to sample sequentially from p(θ|y_1), p(θ|y_{1:2}), ..., p(θ|y_{1:T});
and, as it turns out, it allows even a bit more.
Idealized SMC sampler for SSM

Algorithm 5 Iterated Batch Importance Sampling
1: Sample θ^(m) ∼ p(·) from the prior, for m ∈ [1, N_θ]
2: Set ω^(m) ← 1
3: for t = 1 to T do
4:   Compute u_t(θ^(m)) = p(y_t | y_{1:t−1}, θ^(m))
5:   Update ω^(m) ← ω^(m) × u_t(θ^(m))
6:   if some degeneracy criterion is met then
7:     Resample the particles, reset the weights ω^(m) ← 1
8:     Move the particles using a Markov kernel leaving the distribution invariant
9:   end if
10: end for
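When the increment p(y_t|y_{1:t−1}, θ) is tractable, IBIS can be run directly, with no inner particle filter. Here is a sketch for i.i.d. observations y_t ∼ N(θ, 1) with a N(0, 1) prior, an illustrative conjugate model chosen so the increment is available in closed form; the ESS degeneracy criterion and the single random-walk MH move step are also illustrative choices:

```python
import numpy as np

def ibis_normal_mean(y, N_theta=1000, ess_threshold=0.5, seed=0):
    """IBIS for y_t ~ N(theta, 1) with prior theta ~ N(0, 1).
    The increment u_t(theta) = N(y_t; theta, 1) is exact here."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=N_theta)               # sample from the prior
    logw = np.zeros(N_theta)

    def log_post(th, t):                           # log p(theta) p(y_{1:t} | theta), unnormalised
        return -0.5 * th**2 - 0.5 * np.sum((y[:t] - th[:, None]) ** 2, axis=1)

    for t in range(len(y)):
        # reweight: omega <- omega * u_t(theta)
        logw += -0.5 * np.log(2 * np.pi) - 0.5 * (y[t] - theta) ** 2
        w = np.exp(logw - logw.max())
        ess = w.sum() ** 2 / np.sum(w ** 2)
        if ess < ess_threshold * N_theta:          # degeneracy criterion
            idx = rng.choice(N_theta, size=N_theta, p=w / w.sum())
            theta = theta[idx]                     # resample, reset weights
            logw[:] = 0.0
            # move: one random-walk MH step leaving p(theta | y_{1:t+1}) invariant
            prop = theta + 0.3 * rng.normal(size=N_theta)
            log_alpha = log_post(prop, t + 1) - log_post(theta, t + 1)
            accept = np.log(rng.uniform(size=N_theta)) < log_alpha
            theta = np.where(accept, prop, theta)
    w = np.exp(logw - logw.max())
    return theta, w / w.sum()

y = np.random.default_rng(1).normal(loc=2.0, size=100)   # true theta = 2
theta, w = ibis_normal_mean(y)
post_mean = np.sum(w * theta)
```

For state space models the increment on line 4 of the algorithm is intractable, which is exactly the gap that the plug-in estimates of the next slides fill.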
Valid SMC sampler for SSM??

Plug-in estimates
Similarly to PMCMC methods, we want to replace p(y_t|y_{1:t−1}, θ^(m)) with an unbiased estimate, and see what happens.

SMC everywhere
We associate N_x x-particles to each of the N_θ θ-particles.
Valid SMC sampler for SSM??

Marginal likelihood
Remember, a side effect of the SMC algorithm is that we can approximate the incremental likelihood:
$$\frac{1}{N_x} \sum_{i=1}^{N_x} w_t^{(i,m)} \approx p(y_t \mid y_{1:t-1}, \theta^{(m)})$$

Move steps
Instead of simple MH kernels, use PMH kernels.
Why does it work?

A simple idea...
...especially after the PMCMC article.

Still...
...some work had to be done to justify the validity of the algorithm. In short, it leads to a standard SMC sampler on a sequence of extended distributions π_t (Proposition 1 of the article).
Why does it work?

Additional notations
- h_t^n denotes the index history of x_t^n, that is, h_t^n(t) = n and, recursively, h_t^n(s) = a_s^{h_t^n(s+1)} for s = t − 1, ..., 1.
- x_{1:t}^n denotes the state trajectory finishing in x_t^n, that is, x_{1:t}^n(s) = x_s^{h_t^n(s)} for s = 1, ..., t.
Why does it work?

Here is what the distribution π_t looks like:
$$\pi_t\big(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}\big) = p(\theta \mid y_{1:t}) \times \frac{1}{N_x^{t}} \sum_{n=1}^{N_x} \left[ p(x_{1:t}^{n} \mid \theta, y_{1:t}) \prod_{\substack{i=1 \\ i \ne h_t^n(1)}}^{N_x} q_{1,\theta}(x_1^{i}) \prod_{s=2}^{t}\; \prod_{\substack{i=1 \\ i \ne h_t^n(s)}}^{N_x} W_{s-1,\theta}^{a_{s-1}^{i}}\, q_{s,\theta}\big(x_s^{i} \mid x_{s-1}^{a_{s-1}^{i}}\big) \right]$$
Why does it work?

PMCMC move steps
These steps are valid because the PMCMC invariant distribution π̃_t, defined on
$$\big(\theta, k, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}\big),$$
is such that π_t is the marginal distribution of
$$\big(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}\big)$$
with respect to π̃_t. (Sections 3.2 and 3.3 of the article.)
Benefits

Explicit form of the distribution
It allows us to prove the validity of the algorithm, but also:
- to get samples from p(θ, x_{1:t}|y_{1:t}),
- to validate an automatic calibration of N_x.
Benefits

Drawing trajectories
If for every θ-particle θ^(m) one draws an index n(m) uniformly on {1, ..., N_x}, then the weighted sample
$$\big(\omega^m, \theta^m, x_{1:t}^{n(m), m}\big)_{m \in 1:N_\theta}$$
follows p(θ, x_{1:t}|y_{1:t}).

Memory cost
One needs to store the x-trajectories if one wants to make inference about x_{1:t} (smoothing). If the interest is only in parameter inference (θ), filtering (x_t) and prediction (y_{t+1}), there is no need to store the trajectories.
Benefits

Estimating functionals of the states
We have a test function h and want to estimate E[h(θ, x_{1:t}) | y_{1:t}].

Estimator:
$$\frac{1}{\sum_{m=1}^{N_\theta} \omega^m} \sum_{m=1}^{N_\theta} \omega^m\, h\big(\theta^m, x_{1:t}^{n(m), m}\big).$$

Rao-Blackwellized estimator:
$$\frac{1}{\sum_{m=1}^{N_\theta} \omega^m} \sum_{m=1}^{N_\theta} \omega^m \left\{ \sum_{n=1}^{N_x} W_{t,\theta^m}^{n}\, h\big(\theta^m, x_{1:t}^{n, m}\big) \right\}.$$

(Section 3.4 of the article.)
Benefits

Evidence
The evidence of the data given the model is defined as
$$p(y_{1:t}) = \prod_{s=1}^{t} p(y_s \mid y_{1:s-1}),$$
and it can be used to compare models. SMC2 provides the following estimate:
$$\hat L_t = \frac{1}{\sum_{m=1}^{N_\theta} \omega^m} \sum_{m=1}^{N_\theta} \omega^m\, \hat p(y_t \mid y_{1:t-1}, \theta^m)$$

(Section 3.5 of the article.)
Benefits

Exchange importance sampling step
Launch a new SMC for each θ-particle, with Ñ_x x-particles. Joint distribution:
$$\pi_t\big(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}\big)\; \psi_{t,\theta}\big(\tilde x_{1:t}^{1:\tilde N_x}, \tilde a_{1:t-1}^{1:\tilde N_x}\big)$$
Retain the new x-particles and drop the old ones, updating the θ-weights with:
$$u_t^{\mathrm{exch}}\big(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}, \tilde x_{1:t}^{1:\tilde N_x}, \tilde a_{1:t-1}^{1:\tilde N_x}\big) = \frac{\hat Z_t\big(\theta, \tilde x_{1:t}^{1:\tilde N_x}, \tilde a_{1:t-1}^{1:\tilde N_x}\big)}{\hat Z_t\big(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}\big)}$$

(Section 3.6 of the article.)
Warning

Plug-in estimates
Not every SMC sampler can be turned into an SMC2 algorithm by replacing the exact weights with estimates: these estimates have to be unbiased!
Warning

Example
For instance, if instead of the sequence of distributions {p(θ|y_{1:t})}_{t=1}^T one wants to use the "tempered" sequence {p(θ|y_{1:T})^{γ_k}}_{k=1}^K, with (γ_k) an increasing sequence from 0 to 1, then one should find unbiased estimates of p(θ|y_{1:T})^{γ_k − γ_{k−1}} to plug into the idealized SMC sampler.
Numerical illustrations

Stochastic Volatility (sophisticated): the model introduced earlier.
Numerical illustrations

Figure: (a) Squared observations (synthetic data set); (b) acceptance rates over iterations; (c) illustration of the automatic increase of N_x over iterations.
Numerical illustrations

Figure: Concentration of the posterior distribution of parameter μ as T grows (T = 250, 500, 750, 1000).
Numerical illustrations

Multifactor model
$$y_t = \mu + \beta v_t + v_t^{1/2} \epsilon_t + \rho_1 \sum_{j=1}^{k_1} e_{1,j} + \rho_2 \sum_{j=1}^{k_2} e_{2,j} - \xi\big(w \rho_1 \lambda_1 + (1 - w) \rho_2 \lambda_2\big)$$
Numerical illustrations

Evidence compared to the one-factor model
Figure: (a) S&P 500 squared observations; (b) log-evidence comparison between the multi-factor models, with and without leverage, relative to the one-factor model.
Numerical illustrations

Athletics records model: the model introduced earlier.
Numerical illustrations

Figure: Best two times of each year, in women's 3000 metres events between 1976 and 2010.
Numerical illustrations

Motivating question
How unlikely is Wang Junxia's record in 1993?

A smoothing problem
We want to estimate the likelihood of Wang Junxia's record in 1993, given that we observe a better time than the previous world record. We want to use all the observations from 1976 to 2010 to answer the question.

Note
We exclude the observations from the year 1993.
Numerical illustrations

Some probabilities of interest
$$p_t^{y} = P(y_t \le y \mid y_{1976:2010}) = \int_\Theta \int_{\mathcal X} G(y \mid \mu_t, \theta)\, p(\mu_t \mid y_{1976:2010}, \theta)\, p(\theta \mid y_{1976:2010})\, d\mu_t\, d\theta$$
The interest lies in p_{1993}^{486.11}, p_{1993}^{502.62} and p_t^{cond} := p_t^{486.11} / p_t^{502.62}.
Numerical illustrations

Figure: Estimates of the probabilities of interest, (top) p_t^{502.62}, (middle) p_t^{cond} and (bottom) p_t^{486.11}, obtained with the SMC2 algorithm. The y-axis is in log scale, and the dotted line indicates the year 1993 which motivated the study.
Conclusion

A powerful framework
- The SMC2 framework allows us to obtain various quantities of interest in a quite generic and "black-box" way.
- It extends the PMCMC framework introduced by Andrieu, Doucet and Holenstein.
- A package is available: http://code.google.com/p/py-smc2/.
Acknowledgments

- N. Chopin is supported by the ANR grant ANR-008-BLAN-0218 "BigMC" of the French Ministry of Research.
- P.E. Jacob is supported by a PhD fellowship from the AXA Research Fund.
- O. Papaspiliopoulos would like to acknowledge financial support from the Spanish government through a "Ramon y Cajal" fellowship and grant MTM2009-09063.

The authors are thankful to Arnaud Doucet (University of British Columbia) and to Gareth W. Peters (University of New South Wales) for useful comments.
Bibliography

SMC2: A sequential Monte Carlo algorithm with particle Markov chain Monte Carlo updates, N. Chopin, P.E. Jacob, O. Papaspiliopoulos, submitted.

Main references:
- Particle Markov chain Monte Carlo methods, C. Andrieu, A. Doucet, R. Holenstein, JRSS B, 2010, 72(3):269–342.
- The pseudo-marginal approach for efficient Monte Carlo computations, C. Andrieu, G.O. Roberts, Ann. Statist., 2009, 37:697–725.
- Random-weight particle filtering of continuous-time processes, P. Fearnhead, O. Papaspiliopoulos, G.O. Roberts, A. Stuart, JRSS B, 2010, 72:497–513.
- Feynman-Kac Formulae, P. Del Moral, Springer, 2004.