Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics Opening Workshop, Introduction to Sequential quasi-Monte Carlo - Mathieu Gerber, Aug 30, 2017
Sequential quasi-Monte Carlo (SQMC) is a quasi-Monte Carlo (QMC) version of sequential Monte Carlo (or particle filtering), a popular class of Monte Carlo techniques used to carry out inference in state space models. In this talk I will first review the SQMC methodology as well as some theoretical results. Although SQMC converges faster than the usual Monte Carlo error rate, its performance deteriorates quickly as the dimension of the hidden variable increases. However, I will show with an example that SQMC may perform well for some "high" dimensional problems. I will conclude this talk with some open problems and potential applications of SQMC in complicated settings.
1. Sequential Quasi-Monte Carlo: From the Curse of Dimensionality to High-Dimensional Filtering Problems?
Mathieu Gerber
University of Bristol, School of Mathematics
Based on joint works with Nicolas Chopin (ENSAE/CREST)
SAMSI Opening Workshop on QMC
August 30, 2017
2. In this talk
I am first going to describe the kind of integration problems SQMC is designed to solve.
I will then show that SQMC suffers from the curse of dimensionality.
I will finally give some evidence that SQMC may be useful in high-dimensional settings, and provide ideas to pursue the research in this direction.
3. Starting point: QMC integration
We consider in this talk the problem of computing
$$I(f) = \int_{[0,1]^s} f(u)\,du.$$
QMC integration approximates $I(f)$ by
$$I_N(f) = \frac{1}{N} \sum_{n=1}^{N} f(u^n), \quad \text{where } u^{1:N} \text{ is a QMC point set.}$$
If $u^{1:N}$ is a scrambled net, and provided that $f$ is sufficiently smooth (Owen 1997),
$$\mathrm{Var}\big(I_N(f)\big) = O\big(N^{-3} (\log N)^{s-1}\big).$$
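As a point of reference, here is a minimal sketch (using SciPy's qmc module; the test function, dimension, and sample sizes are illustrative choices, not from the talk) comparing the variance of plain Monte Carlo with that of RQMC based on scrambled Sobol' points:

```python
# Minimal sketch: plain MC vs. randomized QMC (scrambled Sobol') variance.
# Test function, dimension, and sizes are illustrative, not from the talk.
import numpy as np
from scipy.stats import qmc

s = 4                                            # dimension of the integral
f = lambda u: np.prod(1 + (u - 0.5), axis=1)     # smooth integrand, I(f) = 1

def mc_estimate(N, rng):
    u = rng.random((N, s))                       # i.i.d. uniform points
    return f(u).mean()

def rqmc_estimate(N, seed):
    u = qmc.Sobol(d=s, scramble=True, seed=seed).random(N)  # scrambled Sobol'
    return f(u).mean()

rng = np.random.default_rng(0)
N = 2**10
var_mc = np.var([mc_estimate(N, rng) for _ in range(200)])
var_rqmc = np.var([rqmc_estimate(N, seed) for seed in range(200)])
print(f"MC variance: {var_mc:.2e}, RQMC variance: {var_rqmc:.2e}")
```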
4. Dimension versus effective dimension
RQMC based integration methods may converge at a much faster rate than $O\big(N^{-3}(\log N)^{s-1}\big)$ if the effective dimension of $f$ is small.
Trivial example: $f(u) = \sum_{i=1}^{s} f_i(u_i)$.
In general, "sophisticated" QMC based integration methods are usually needed to take advantage of the low effective dimension of $f$.
5. SQMC: Set-up
SQMC is designed to approximate high dimensional integrals $I(f)$ for $f$ of the form
$$f(u_{0:t}) = \varphi(u_{0:t})\,Q_t(u_{0:t}), \quad u_{0:t} = (u_0, \ldots, u_t) \in [0,1]^{d(t+1)},$$
where $Q_t(u_{0:t})$ is the p.d.f. on $[0,1]^{d(t+1)}$ defined by
$$Q_t(u_{0:t}) = \frac{1}{Z_t}\, m_0(u_0)\, G_0(u_0) \prod_{s=1}^{t} m_s(u_s \mid u_{s-1})\, G_s(u_{s-1}, u_s),$$
with
$m_0(u_0)\,du_0$ a probability measure on $[0,1]^d$,
$m_s(u_s \mid u_{s-1})\,du_s$ a Markov kernel acting from $[0,1]^d$ into itself,
$G_0(u_0) > 0$ and $G_s(u_{s-1}, u_s) > 0$,
$Z_t$ a normalizing constant.
Jargon: $Q_t(u)\,du$ is called a Feynman-Kac measure.
6. Motivation: Inference in state-space models
State-space models consider an unobserved Markov chain $(x_t)_{t \ge 0}$,
$$x_0 \sim \eta_0(x_0)\,dx_0, \qquad x_t \mid x_{t-1} \sim q_t(x_t \mid x_{t-1})\,dx_t,$$
taking values in $\mathcal{X} = [0,1]^d$, and an observed process $(y_t)_{t \ge 0}$,
$$y_t \mid x_t \sim g_t(y_t \mid x_t)\,dy_t.$$
Typically, we are interested in recovering $p(x_t \mid y_{0:t})$ (filtering distribution) or $p(x_{0:t} \mid y_{0:t})$ (smoothing distribution).
Many applications in engineering (tracking), finance (stochastic volatility), epidemiology, ecology, neurosciences, etc.
Remark: There is little loss of generality in assuming that $\mathcal{X} = [0,1]^d$.
7. Feynman-Kac measure and state-space models
Taking e.g. $m_0(x_0) = \eta_0(x_0)$ and
$$m_s(x_s \mid x_{s-1}) = q_s(x_s \mid x_{s-1}), \qquad G_s(x_{s-1}, x_s) := g_s(y_s \mid x_s), \quad s \ge 1,$$
we see that $Q_t(x_{0:t})\,dx_{0:t} = p(x_{0:t} \mid y_{0:t})\,dx_{0:t}$.
Computing $I(f)$ with $f = \varphi(u)\,Q_t(u)$ amounts to computing the smoothing expectation
$$Q_t(\varphi) := \mathbb{E}[\varphi(x_{0:t}) \mid y_{0:t}].$$
Important particular case: when $\varphi(u) = \varphi(u_t)$, computing $I(f)$ amounts to computing the filtering expectation
$$Q_t(\varphi) := \mathbb{E}[\varphi(x_t) \mid y_{0:t}].$$
Remark: This is an integration problem of dimension $s = d(t+1)$.
8. The Monte Carlo solution: Sequential Monte Carlo (or particle filtering)
Recall that our goal is to compute
$$Q_t(\varphi) = \int_{[0,1]^{d(t+1)}} \varphi(u_{0:t})\, Q_t(u_{0:t})\, du_{0:t},$$
which is a high dimensional problem.
However, this integral can be "efficiently" computed thanks to the following recursive property of $Q_t$:
$$Q_t(u_t) = \frac{1}{\ell_t} \int_{[0,1]^d} m_t(u_t \mid u_{t-1})\, G_t(u_{t-1}, u_t)\, Q_{t-1}(u_{t-1})\, du_{t-1}.$$
Monte Carlo algorithms used to approximate integrals of this kind are known as sequential Monte Carlo samplers (or particle filters).
9. Sequential Monte Carlo
Operations must be performed for all $n \in 1:N$.
At time 0:
(a) Generate $x_0^n \sim m_0(dx_0)$.
(b) Compute $W_0^n = G_0(x_0^n) \big/ \sum_{m=1}^{N} G_0(x_0^m)$.
Recursively, for time $t = 1:T$:
(a) Generate $a_{t-1}^n \sim \mathcal{M}(W_{t-1}^{1:N})$, the multinomial distribution that produces outcome $m$ with probability $W_{t-1}^m$ [resampling step].
(b) Generate $x_t^n \sim m_t(x_{t-1}^{a_{t-1}^n}, dx_t)$ [mutation step].
(c) Compute $W_t^n = G_t(x_{t-1}^{a_{t-1}^n}, x_t^n) \big/ \sum_{m=1}^{N} G_t(x_{t-1}^{a_{t-1}^m}, x_t^m)$.
Output at time $t \ge 0$:
$$Q_t^N(dx_t) := \sum_{n=1}^{N} W_t^n\, \delta_{x_t^n}(dx_t) \approx Q_t(dx_t).$$
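To make the steps above concrete, here is a minimal sketch of this algorithm in Python; the model callbacks (sample_m0, sample_mt, G) are hypothetical placeholders the user must supply, they are not part of the talk:

```python
# Minimal sketch of the SMC algorithm above; the model callbacks are
# hypothetical placeholders that the user must supply.
import numpy as np

def particle_filter(T, N, sample_m0, sample_mt, G, rng):
    """Basic particle filter following steps (a)-(c) above.

    sample_m0(N, rng)        -> (N, d) initial particles from m_0
    sample_mt(t, xprev, rng) -> (N, d) particles from m_t(. | xprev)
    G(t, xprev, x)           -> (N,) potentials G_t(x_{t-1}, x_t); xprev is None at t=0
    Returns the weighted particle systems (x_t^{1:N}, W_t^{1:N}) for t = 0..T.
    """
    out = []
    x = sample_m0(N, rng)                    # time 0, step (a): initial particles
    w = G(0, None, x)
    W = w / w.sum()                          # time 0, step (b): normalized weights
    out.append((x, W))
    for t in range(1, T + 1):
        a = rng.choice(N, size=N, p=W)       # (a) multinomial resampling
        xprev = x[a]
        x = sample_mt(t, xprev, rng)         # (b) mutation
        w = G(t, xprev, x)
        W = w / w.sum()                      # (c) reweighting
        out.append((x, W))
    return out
```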
10. What do I mean by "efficient"?
Time uniform bound:
$$\sup_{t \ge 0}\, \sup_{\varphi \in \mathcal{F}}\, \mathbb{E}\big[|Q_t^N(\varphi) - Q_t(\varphi)|^p\big]^{1/p} \le \frac{C}{N^{1/2}}.$$
Central limit theorem:
$$\sqrt{N}\,\big(Q_t^N(\varphi) - Q_t(\varphi)\big) \Rightarrow \mathcal{N}_1(0, \sigma_{t,\varphi}^2).$$
Law of large numbers, etc.
See the book by Del Moral (2004).
11. Sequential quasi-Monte Carlo
SQMC is a QMC version of SMC.
Each iteration is based on a QMC point set of dimension $d + 1$, where the first component is used for the resampling step and the remaining ones for the mutation.
The resampling step of SQMC requires sorting the particles using the Hilbert space-filling curve $H : [0,1] \to [0,1]^d$.
The cost of SQMC is $O(N \log N)$.
We notably show that SQMC based on scrambled nets is such that
$$\mathrm{MSE}\big[Q_t^N(\varphi)\big] = o(N^{-1}), \qquad \forall \varphi \in \mathcal{C}_b\big((0,1)^d\big).$$
Related approach: Array-RQMC of L'Ecuyer et al. (2006)
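To illustrate, here is my own minimal sketch of SQMC-style resampling in the d = 1 case, where the Hilbert sort reduces to an ordinary sort (for d > 1 one would sort by Hilbert indices instead); this is an illustration of the idea, not the authors' implementation:

```python
# Sketch of SQMC-style resampling for d = 1, where the Hilbert sort is an
# ordinary sort. Illustrative only, not the authors' implementation.
import numpy as np
from scipy.stats import qmc

def qmc_resample_1d(x, W, seed=0):
    """Resample N weighted particles by inverting the weighted empirical CDF
    at (sorted) scrambled Sobol' points."""
    N = len(x)
    order = np.argsort(x)                 # Hilbert sort (identity in 1D)
    x_sorted, W_sorted = x[order], W[order]
    cdf = np.cumsum(W_sorted)
    u = np.sort(qmc.Sobol(d=1, scramble=True, seed=seed).random(N).ravel())
    idx = np.searchsorted(cdf, u)         # inverse-CDF resampling
    return x_sorted[np.minimum(idx, N - 1)]
```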
12. An example: Stochastic volatility model
The model is
$$y_t = S_t^{1/2}\,\epsilon_t, \qquad x_t = \mu + \Phi(x_{t-1} - \mu) + \Psi^{1/2}\,\nu_t,$$
with correlated noise terms $(\epsilon_t, \nu_t) \sim \mathcal{N}_{2d}(0, C)$, and where $S_t = \mathrm{diag}\big(\exp(x_{t,1}), \ldots, \exp(x_{t,d})\big)$.
Parameters are set to their true value and we compare SQMC with SMC for the estimation of the log-likelihood function (i.e. $\log Z_T$).
SQMC is implemented using nested scrambled Sobol' sequences as input.
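For reference, here is a minimal sketch simulating data from this model; all parameter values (and the initialization of $x_0$ at $\mu$) are illustrative placeholders, not the values used in the talk:

```python
# Minimal data simulation for the multivariate SV model above.
# Parameter values and the x_0 initialization are illustrative.
import numpy as np

def simulate_sv(T, d, mu, Phi, Psi_half, C, rng):
    """Simulate (x_t, y_t) for t = 0..T, with (eps_t, nu_t) ~ N_{2d}(0, C)."""
    L = np.linalg.cholesky(C)
    x = np.empty((T + 1, d)); y = np.empty((T + 1, d))
    x[0] = mu                                    # illustrative initialization
    for t in range(T + 1):
        z = L @ rng.standard_normal(2 * d)       # correlated (eps_t, nu_t)
        eps, nu = z[:d], z[d:]
        if t > 0:
            x[t] = mu + Phi @ (x[t - 1] - mu) + Psi_half @ nu
        y[t] = np.exp(0.5 * x[t]) * eps          # S_t^{1/2} eps_t, S_t diagonal
    return x, y

d, T = 2, 100
rng = np.random.default_rng(1)
x, y = simulate_sv(T, d, mu=np.zeros(d), Phi=0.9 * np.eye(d),
                   Psi_half=0.3 * np.eye(d), C=np.eye(2 * d), rng=rng)
```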
13. Simulation results for d ∈ {1, 2, 4, 10}
[Figure: gain factor of SQMC over SMC (log10 scale) as a function of the number of particles (log10 scale), for d = 1, 2, 4, 10.]
Log-likelihood evaluation (based on T = 400 data points and 200 independent SMC and SQMC runs). Remark: Integration is on a space of dimension dT.
14. SQMC and the curse of dimensionality: regularity of the Hilbert curve
The Hilbert curve is Hölder continuous with Hölder exponent 1/d. Hence, as d increases the smoothness of the curve deteriorates very quickly.
We have recently established that
$$\mathrm{Var}\left(\frac{1}{N} \sum_{n=1}^{N} \varphi\big(x_{t-1}^{a_{t-1}^n}\big) \,\Big|\, x_{t-1}^{1:N}\right) \le \frac{C_d}{N^{1 + 1/d}}.$$
Key message: the dimension of the resampling step has a major impact on the performance of SQMC.
Remark: 1/d is the best possible exponent for a continuous measure preserving mapping $f : [0,1] \to [0,1]^d$ (Jaffard and Nicolay, 2006)
15.-18. Toward a new implementation of SQMC
The key limitation of SQMC when d increases is its resampling step, which introduces a noise of size $N^{-1-1/d}$.
Natural question: Can we come up with another implementation to bypass this problem?
Good news: Yes, it is possible to implement SQMC such that only univariate resampling steps are needed.
Bad news: This increases the running time from $O(N \log N)$ to $O(N^2)$.
Good news: As explained below, this quadratic cost is not an issue when dealing with high-dimensional state-space models.
19. Filtering in high-dimensional spaces
Particle filters suffer from the curse of dimensionality because they rely on importance sampling.
To perform particle filtering in high-dimensional spaces we need:
that the model has some special structure (low "effective dimension");
algorithms able to exploit this special structure.
We focus below on the algorithm proposed by Beskos et al. (2014) / Naesseth et al. (2016), which is one of the two known particle filter algorithms whose error is stable w.r.t. d (for some models).
20. PF of Beskos et al. (2014), Naesseth et al. (2016)
The basic idea is to incorporate information coming from the observations $y_t$ progressively as we sample the components of $x_t$.
To this end, at each time step and for each particle we run an internal particle filter based on $M \ge 1$ particles.
Each internal particle filter aims at approximating the "optimal" proposal distribution $m_t^{\mathrm{opt}}(x_{t-1}, dx_t)$, where $M$ controls the quality of the approximation (a simplified sketch of this idea follows below).
However, for any $M \ge 1$ the algorithm is valid, in the sense that at any time $t$ it converges to the filtering distribution as $N \to +\infty$.
Each step of the algorithm costs $O(NM^2 d^2)$ operations; that is, the cost of the internal filters is quadratic in $M$.
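As a heavily simplified illustration of the underlying idea (my own sketch, not the actual Beskos et al. / Naesseth et al. algorithm): since $m_t^{\mathrm{opt}}(x_{t-1}, dx_t) \propto m_t(x_t \mid x_{t-1})\, G_t(x_{t-1}, x_t)$, one can draw M candidates from $m_t$ and select one in proportion to its $G_t$ weight:

```python
# Heavily simplified sketch (not the actual Beskos/Naesseth algorithm):
# approximate m_t^opt(x_{t-1}, dx_t), which is proportional to
# m_t(x_t | x_{t-1}) G_t(x_{t-1}, x_t), using M internal draws.
# sample_mt and G are hypothetical model callbacks.
import numpy as np

def sample_mt_opt_approx(xprev, sample_mt, G, M, rng):
    """For one particle xprev: draw M candidates from m_t, weight each by
    G_t, and select one candidate in proportion to its weight."""
    candidates = sample_mt(np.repeat(xprev[None, :], M, axis=0), rng)  # (M, d)
    w = G(xprev, candidates)                                           # (M,)
    k = rng.choice(M, p=w / w.sum())
    # Return the selected point and the IS estimate of the normalizing
    # constant, which serves as the particle's weight.
    return candidates[k], w.mean()
```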
21. PF in high-dimension: A naive SQMC version
To get some insight into why SQMC may be useful for solving high-dimensional filtering problems, we compare below the original algorithm with a version where SQMC is used inside the internal filters.
The external filter is a plain Monte Carlo filter, and thus we cannot hope that the variance converges faster than $N^{-1}$.
The idea is that with QMC a smaller value of $M$ is needed to get a "good" approximation of the "optimal" proposal distribution $m_t^{\mathrm{opt}}(x_{t-1}, dx_t)$.
22. Toy example: Linear Gaussian model
We consider the following model:
$$x_t = \frac{1}{2}\, x_{t-2} + \epsilon_t, \quad \epsilon_t \sim f(\epsilon_t)\,d\epsilon_t, \qquad y_t = x_t + \nu_t, \quad \nu_t \sim \mathcal{N}_d(0, \sigma^2 I_d),$$
where
$$f(\epsilon) \propto \exp\left(-\frac{\tau}{2} \sum_{i=1}^{d} \epsilon_{t,i}^2 - \frac{\lambda}{2} \sum_{i=2}^{d} (\epsilon_{t,i} - \epsilon_{t,i-1})^2\right).$$
In this case, each internal particle filter amounts to running a particle filter in dimension 1.
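Note that $f$ is a Gaussian density with tridiagonal precision matrix $\tau I + \lambda D^{\top} D$, where $D$ is the first-difference operator; here is a minimal sampling sketch (the values of $\tau$ and $\lambda$ are illustrative):

```python
# The noise density f above is Gaussian with precision matrix tau*I + lam*D'D,
# where D is the (d-1) x d first-difference operator. Minimal sampling sketch;
# tau and lam values are illustrative.
import numpy as np

def sample_eps(d, tau, lam, n, rng):
    D = np.diff(np.eye(d), axis=0)            # first-difference operator
    P = tau * np.eye(d) + lam * D.T @ D       # precision matrix of f
    L = np.linalg.cholesky(P)
    # If eps solves L' eps = z with z ~ N(0, I), then Cov(eps) = P^{-1}.
    z = rng.standard_normal((d, n))
    return np.linalg.solve(L.T, z).T          # (n, d) draws from f

eps = sample_eps(d=64, tau=1.0, lam=1.0, n=5, rng=np.random.default_rng(0))
```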
23. Linear Gaussian model: Simulation results (d = 256)
[Figure: variance of the estimate as a function of time, with and without QMC.]
Estimation of $\mathbb{E}[x_{t,1} \mid y_{0:t}]$. We take N = 100 and M = 32 in the simulations.
24. A more interesting example: Spatio-temporal model
[Figure: the components $X_{t,1}, \ldots, X_{t,16}$ arranged on a 4x4 spatial lattice, illustrating the neighbourhood structure of the spatio-temporal model.]
25. Spatio-temporal model: Some remarks
Because of the complex dependence structure among the components of $x_t$, the internal filters are not classical particle filters.
For the internal filters we use the algorithm proposed by Lindsten et al. (2017), as in Naesseth et al. (2016).
In the QMC version, SQMC is used only in the first step of the internal filters.
26. Spatio-temporal model: Simulation results (d = 64)
[Figure: variance of the estimate as a function of time, with and without QMC.]
Estimation of $\mathbb{E}[x_{t,1} \mid y_{0:t}]$: We take N = 50 and M = 32 in the simulations.
27. Variance reduction vs. running time reduction
The use of SQMC reduces the variance in these high-dimensional filtering problems.
The variance reduction brought by SQMC is not impressive. However, each iteration of this algorithm costs $O(NM^2 d)$.
Roughly speaking, reducing the variance by a factor of 2 with Monte Carlo would require doubling $N$, and thus increasing the cost by $O(NM^2 d)$ operations.
For the Gaussian model $M^2 d = 262\,144$ while, for the spatio-temporal model, $M^2 d = 65\,536$.
In both cases, reducing the variance by a factor of 2 significantly reduces the running time.
28.-30. SMC in high-dimension and the modified SQMC algorithm
To break the d-dimensional resampling step, the aforementioned modified implementation of SQMC moves from $x_{t-1}$ to $x_t$ component by component.
To break the curse of dimensionality due to importance resampling, particle filtering in high dimension requires incorporating information coming from $y_t$ progressively as we sample the components of $x_t$.
Hence, what SQMC is ideal for is exactly what is needed to perform particle filtering in high dimension!
31. S(Q)MC in high-dimension: A new idea
One can use the aforementioned implementation of SMC/SQMC for the internal filters.
The resulting algorithm:
1. Seems to have some clear advantages over the Beskos et al. (2014) / Naesseth et al. (2016) algorithm.
2. Seems SQMC friendly, in the sense that it could be implemented so that only 1-dimensional resampling steps are needed.
3. Is such that QMC can be efficiently used for both the internal and external filters (plain QMC algorithm).
4. Costs $O(NM^2 d)$, like the Beskos et al. (2014) / Naesseth et al. (2016) algorithm.
32. Some questions
Some practical questions:
How does this algorithm perform in practice (MC and QMC)?
For the QMC version there is a trade-off between the dimension of the resampling steps and the dimension of the QMC point sets used as input. What is a good choice?
For the spatio-temporal model different implementations are possible. What is a good choice?
Some theoretical questions:
Theoretical validity for any fixed $M \ge 1$?
Stability as d increases? Can we borrow the results of Beskos et al. (2014) to say something about this?
Convergence rate?
33. Conclusion
SQMC is a QMC version of particle filtering that converges faster than $N^{-1/2}$.
In practice, we observe that SQMC:
1. Converges faster than SMC when d is small (say d = 1, 2, 3).
2. Yields important gains in terms of running time when d is so large that "high-dimensional" particle filters have to be used.
3. Is, in general, not so useful for moderate values of d.
Point 2 is probably the most interesting application of SQMC, and I have proposed in this talk an idea to pursue the research in this direction.
Beyond the QMC motivation, the development of Monte Carlo particle filters to solve high-dimensional filtering problems is of great interest.
34. QMC and computational statistics
Most people who work on computational statistics do not believe in QMC.
I think that the main reasons are:
People who work in statistics care about variance reductions, but only up to a certain level.
Existing successful applications of QMC to statistical problems are all for problems that are considered:
1. Easy (and thus not very exciting...)
2. Solved (see my first point)
High-dimensional particle filtering is both a complicated and an unsolved problem, and could potentially be a good problem to convince statisticians that QMC is actually useful...