Stratified Monte Carlo for fast ABC
using resampling
Umberto Picchini
@uPicchini
Chalmers University of Technology
and University of Gothenburg
Sweden
Joint work1 with Richard Everitt (Reading, UK).
I am going to talk about ABC-MCMC, and specifically pseudomarginal
ABC-MCMC.
The goal is to accelerate this (typically) expensive procedure using
resampling techniques.
Resampling induces a bias and the resulting posterior has (too)
large variance.
We reduce bias using stratified Monte Carlo.
1 P and Everitt (2019). Stratified sampling and resampling for approximate Bayesian computation, arXiv:1905.07976.
Umberto Picchini, @uPicchini 2/26
• We are interested in Bayesian inference for parameters θ of a
model having an intractable likelihood function;
• that is, the likelihood p(xobs|θ) for data xobs is analytically unavailable;
• however we assume the ability to simulate pseudo-data x∗
from a simulator of a stochastic model.
• this is the same as writing x∗ ∼ p(x|θ);
• we use an ABC approach (approximate Bayesian computation);
• the key idea in ABC is to accept parameters θ∗ generating
x∗ ≈ xobs, i.e. ||x∗ − xobs|| < δ;
• the above is very inefficient (super-small acceptance
probability);
• typically much better to introduce low-dimensional summary
statistics S(·) so that sobs ≡ S(xobs), s∗ ≡ S(x∗);
Umberto Picchini, @uPicchini 3/26
ABC rejection sampler
This is the most basic (inefficient) ABC sampler.
1 Sample from the prior θ∗ ∼ p(θ)
2 Plug θ∗ into the simulator, simulate x∗ ∼ p(x|θ∗)
3 compute summary stats S(x∗)
4 accept θ∗ if ||S(x∗) − S(xobs)|| < δ
5 go back to 1 and repeat
The collection of accepted θ∗ is an ensemble of draws from the
approximate posterior πδ(θ|S(xobs)).
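A minimal Python sketch of steps 1-5, assuming hypothetical user-supplied functions prior_sample(), simulator(theta) and summary(x):

```python
import numpy as np

def abc_rejection(prior_sample, simulator, summary, s_obs, delta, n_accept):
    """Basic ABC rejection sampler: returns draws from pi_delta(theta | S(x_obs))."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()                       # 1. sample from the prior
        x_star = simulator(theta)                    # 2. simulate pseudo-data
        s_star = summary(x_star)                     # 3. compute summary stats
        if np.linalg.norm(s_star - s_obs) < delta:   # 4. accept if close enough
            accepted.append(theta)
    return np.array(accepted)                        # 5. repeat until n_accept draws
```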
This is super inefficient (due to proposing from the prior).
Better samplers rely on SMC or MCMC, where θ∗ ∼ q(θ|θ′).
A comprehensive monograph (many chapters on arXiv):
Sisson, Fan, Beaumont. (2018). Handbook of approximate Bayesian
computation. Chapman and Hall/CRC.
Umberto Picchini, @uPicchini 4/26
Constructing appropriate summary statistics opens a Pandora's box
of additional issues I am not going to talk about.2
Let’s just assume we have some “informative” summaries for θ.
This way we can sample from the approximate posterior
πδ(θ|sobs) ∝ π(θ) ∫ I{||s∗−sobs||<δ} p(s∗|θ) ds∗
We have that
πδ(θ|sobs) → π(θ|sobs) (δ → 0)
Rather unrealistic to assume S(·) is a sufficient statistic, but if that
happens to be the case:
πδ(θ|sobs) ≡ πδ(θ|xobs)
2 A review is Prangle, D. (2015). Summary statistics in approximate Bayesian computation. arXiv:1512.05633.
Umberto Picchini, @uPicchini 5/26
More generally, in place of the indicator function we can consider a
kernel function Kδ(s∗, sobs) and write

πδ(θ|sobs) ∝ π(θ) ∫ Kδ(s∗, sobs) p(s∗|θ) ds∗

where the integral is the ABC likelihood.
For example we can use a Gaussian kernel

Kδ(s∗, sobs) ∝ exp{−(1/(2δ²)) (s∗ − sobs)⊤Σ⁻¹(s∗ − sobs)}
Other kernels are possible, e.g. Epanechnikov’s kernel.
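As a sketch, the (unnormalised) Gaussian kernel above can be coded as follows, with Sigma_inv an assumed precision matrix for the summaries:

```python
import numpy as np

def gaussian_kernel(s_star, s_obs, delta, Sigma_inv):
    """Unnormalised Gaussian ABC kernel K_delta(s*, s_obs)."""
    d = s_star - s_obs
    return np.exp(-0.5 * (d @ Sigma_inv @ d) / delta**2)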
Umberto Picchini, @uPicchini 6/26
The ABC likelihood: ∫ Kδ(s∗, sobs) p(s∗|θ) ds∗.
This can trivially be approximated unbiasedly via Monte Carlo as3

∫ Kδ(s∗, sobs) p(s∗|θ) ds∗ ≈ (1/M) Σ_{r=1}^M Kδ(s∗r, sobs),   s∗r ∼iid p(s∗|θ),
and plugged into a Metropolis-Hastings ABC-MCMC algorithm,
proposing a move θ∗ ∼ q(θ|θ#) and accepting with probability

1 ∧ [ (1/M) Σ_{r=1}^M Kδ(s∗r, sobs) / ((1/M) Σ_{r=1}^M Kδ(s#r, sobs)) ] · [π(θ∗)/π(θ#)] · [q(θ#|θ∗)/q(θ∗|θ#)]
Problem: if the model simulator is computationally expensive,
having M large is unfeasible.
3 Lee, Andrieu, Doucet (2012): Discussion of Prangle 2012 JRSS-B.
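A sketch of one pseudo-marginal MH step, reusing the hypothetical gaussian_kernel, simulator and summary from above, and assuming a symmetric random-walk proposal (so the q-ratio cancels):

```python
import numpy as np

def abc_lik_hat(theta, M, simulator, summary, s_obs, delta, Sigma_inv):
    """Unbiased MC estimate of the ABC likelihood from M independent simulations."""
    s_stars = [summary(simulator(theta)) for _ in range(M)]
    return np.mean([gaussian_kernel(s, s_obs, delta, Sigma_inv) for s in s_stars])

def pm_abc_mh_step(theta, lik, M, log_prior, prop_std, rng, *args):
    """One pseudo-marginal MH step; *args = (simulator, summary, s_obs, delta, Sigma_inv)."""
    theta_star = theta + prop_std * rng.standard_normal(theta.shape)
    lik_star = abc_lik_hat(theta_star, M, *args)
    log_alpha = (np.log(lik_star + 1e-300) - np.log(lik + 1e-300)
                 + log_prior(theta_star) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        return theta_star, lik_star   # accept and cache the new estimate
    return theta, lik                 # reject: keep the current estimate
```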
Umberto Picchini, @uPicchini 7/26
So we have the approximate ABC posterior (up to a constant)

πδ(θ|sobs) ≈ π(θ) · (1/M) Σ_{r=1}^M Kδ(s∗r, sobs),   s∗r ∼iid p(s∗|θ) (M times),

where the Monte Carlo average is an unbiased estimator of the ABC likelihood.
• No matter the value of M, the ABC-MCMC will sample exactly
from πδ(θ|sobs), because of the unbiased likelihood estimator;
• this makes the algorithm an instance of pseudomarginal MCMC
[Andrieu, Roberts 2009]4
• Typically ABC-MCMC is computationally intensive and a small M
is chosen, say M = 1;
• the lower the M, the higher the variance of the estimate of the ABC likelihood ∫ Kδ(s∗, sobs) p(s∗|θ) ds∗;
• the larger the variance, the worse the mixing of the chain (due to occasional overestimation of the likelihood).
4 Andrieu, C., and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2), 697-725.
Umberto Picchini, @uPicchini 8/26
Dilemma:
a small M will decrease the runtime considerably; however, it will
increase the chance of overestimating the likelihood → possibly
high rejection rates.
Question: is it worthwhile to have M > 1 to reduce the variance of the
ABC likelihood, given the higher computational cost?
Bornn et al 20175 found that no, it is not: M = 1 is just fine
(when using a uniform kernel).
Basically, using M = 1 is so much faster to run that the variance
reduction obtained with M > 1 is not worth the extra computational cost.
5 L. Bornn, N. S. Pillai, A. Smith, and D. Woodard. The use of a single pseudo-sample in approximate Bayesian computation. Statistics and Computing, 27(3):583–590, 2017.
Umberto Picchini, @uPicchini 9/26
Data resampling
In a similar context (based on synthetic likelihood approaches), Everitt 20176 used the following approach.
At any proposed θ:
• simulate, say, M = 1 dataset x∗ ∼ p(x|θ);
• sample with replacement from the elements of x∗ to obtain a resampled dataset (with dimension dim(xobs));
• repeat the resampling to obtain x∗1, ..., x∗R resampled datasets from x∗;
• compute the summaries s∗1, ..., s∗R, one for each resampled dataset.
Cheap compared to producing M independent summaries from the
model, when the simulator is computationally intensive.
This reduces the variance of the ABC likelihood compared to using
M = 1 without resampling.
6 Everitt (2017). Bootstrapped synthetic likelihood. arXiv:1711.05825.
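A minimal sketch of this resampling step for iid data, reusing the hypothetical summary function; rng is a numpy Generator, e.g. np.random.default_rng():

```python
import numpy as np

def resampled_summaries(x_star, R, summary, rng):
    """Bootstrap R datasets from a single simulated dataset and summarise each."""
    n = len(x_star)                           # resampled data keep dim(x_obs)
    s_list = []
    for _ in range(R):
        idx = rng.integers(0, n, size=n)      # draw n indices with replacement
        s_list.append(summary(x_star[idx]))
    return np.array(s_list)                   # array of shape (R, n_s)
```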
Umberto Picchini, @uPicchini 10/26
The problem with using the bootstrapped procedure within ABC is
that it biases the estimation of the ABC likelihood considerably.
Example: data is 1000 iid observations from N(θ = 0, 1).
Set a Gaussian prior on θ → known analytic posterior.
• Left: pseudo-marginal ABC with M = 100 independent datasets and the sufficient statistic S(xobs) = x̄obs;
• Right: M = 1 and R = 100 resampled datasets
Umberto Picchini, @uPicchini 11/26
Stratified Monte Carlo
Stratified Monte Carlo is a variance reduction technique.
In full generality: want to approximate

µ = ∫_D f(x) p(x) dx

over some space D, for some function f and density (or probability
mass) function p.
Now partition D into J “strata” D1, ..., DJ:
• ∪_{j=1}^J Dj = D
• Dj ∩ Dj′ = ∅ for j ≠ j′
Umberto Picchini, @uPicchini 12/26
Samples from a bivariate N2(0, I2): 6 concentric rings define 7 equally probable strata.
Each stratum has exactly 3 points sampled from within it.
Better to oversample from the most important slices, where the
integrand has higher mass.
Umberto Picchini, @uPicchini 13/26
Ideally the statistician should decide how many Monte Carlo draws to
sample from each stratum Dj.
• Call this number ñj;
• define ωj := P(X ∈ Dj)
Probabilities ωj should be known.
Then we approximate µ = ∫_D f(x) p(x) dx with

µ̂strat = Σ_{j=1}^J (ωj/ñj) Σ_{x∗∈Dj} f(x∗),   x∗ ∼ p(x | x ∈ Dj)
This is the (unbiased) stratified MC estimator.
Variance reduction compared to the vanilla MC estimator can be obtained if
we know how many ñj to sample from each stratum (e.g. the “proportional
allocation method”7).
7 Art Owen (2013), Monte Carlo theory, methods and examples.
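For concreteness, a sketch of µ̂strat under the assumption (dropped on the next slide) that we can draw from within each stratum; sample_in_stratum(j, n) is hypothetical:

```python
import numpy as np

def stratified_estimate(f, omegas, n_tilde, sample_in_stratum):
    """Unbiased stratified MC estimate of E[f(X)] with known strata probabilities."""
    mu_hat = 0.0
    for j, (w, n) in enumerate(zip(omegas, n_tilde)):
        x = sample_in_stratum(j, n)                  # n draws from p(x | x in D_j)
        mu_hat += w * np.mean([f(xi) for xi in x])   # = (omega_j / n_j) * sum of f
    return mu_hat
```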
Umberto Picchini, @uPicchini 14/26
However, in our setting we can't assume the ability to simulate
from within a given stratum, so we can't decide the ñj.
And we can't assume to know anything about ωj := P(X ∈ Dj).
We use a “post stratification” approach (e.g. Owen 2013)8
• first generate many x∗ ∼ p(x) (i.e. from the model simulator);
• count the number of x∗ ending up in each stratum Dj;
• call these frequencies nj;
So these frequencies are known after the simulation is done, not
before.
However we still do not know anything about the ωj = P(X ∈ Dj).
We are going to address this soon within an ABC framework.
8 Art Owen: “Monte Carlo theory, methods and examples”, 2013.
Umberto Picchini, @uPicchini 15/26
Define strata for ABC
Suppose we have an ns-dimensional summary, i.e. ns = dim(sobs)
and consider the Gaussian kernel

Kδ(s∗, sobs) = (1/δ^ns) exp{−(1/(2δ²)) (s∗ − sobs)⊤Σ⁻¹(s∗ − sobs)}.
In ABC, the µ to approximate via stratified MC is the likelihood

∫_D Kδ(s∗, sobs) p(s∗|θ) ds∗

So let's partition D...
Umberto Picchini, @uPicchini 16/26
Define strata for ABC
Example to define three strata:
• D1 = {s∗ s.t. ||s∗ − sobs|| < δ/2}
• D2 = {s∗ s.t. ||s∗ − sobs|| < δ} \ D1
• D3 = D \ (D1 ∪ D2)
And more explicitly:
• D1 = {s∗ s.t. (s∗ − sobs)⊤Σ⁻¹(s∗ − sobs) ∈ (0, δ/2]}
• D2 = {s∗ s.t. (s∗ − sobs)⊤Σ⁻¹(s∗ − sobs) ∈ (δ/2, δ]}
• D3 = {s∗ s.t. (s∗ − sobs)⊤Σ⁻¹(s∗ − sobs) ∈ (δ, ∞)}.
Because of our resampling approach, for every θ we have R ≫ 1
simulated summaries, say R = 100.
We just need to count how many summaries fall into D1
versus D2 versus D3.
This gives us n1, n2 and n3 = R − (n1 + n2).
Umberto Picchini, @uPicchini 17/26
How about the strata probabilities?
We still need to estimate the strata probabilities ωj = P(s∗ ∈ Dj).
This is easy because ωj = ∫_{Dj} p(s∗|θ) ds∗, which we estimate by
another MC simulation.
So
1 simulate once from the model x∗ ∼ p(x|θ)
2 resample R times from x∗ to obtain x∗1, ..., x∗R
3 compute summaries s∗1, ..., s∗R
4 obtain distances dr := (s∗r − sobs)⊤Σ⁻¹(s∗r − sobs)

ω̂1 := (1/R) Σ_{r=1}^R I{dr ≤ δ/2},   ω̂2 := (1/R) Σ_{r=1}^R I{δ/2 < dr ≤ δ},   ω̂3 := 1 − Σ_{j=1}^2 ω̂j.
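A sketch of steps 1-4 and the weight estimates; s_set holds the R resampled summaries and Sigma_inv is the assumed precision matrix:

```python
import numpy as np

def distances(s_set, s_obs, Sigma_inv):
    """Step 4: distances d_r = (s*r - s_obs)' Sigma^{-1} (s*r - s_obs)."""
    return np.array([(s - s_obs) @ Sigma_inv @ (s - s_obs) for s in s_set])

def strata_weights(d, delta):
    """Estimated strata probabilities (the omega-hats) from the R distances."""
    w1 = np.mean(d <= delta / 2)
    w2 = np.mean((d > delta / 2) & (d <= delta))
    return np.array([w1, w2, 1.0 - w1 - w2])
```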
Umberto Picchini, @uPicchini 18/26
We finally have a (biased) estimator of the ABC likelihood using J
strata:

ˆˆµstrat = Σ_{j=1}^J (ω̂j/nj) Σ_{r: s∗r∈Dj} Kδ(s∗r, sobs).

The bias is due both to resampling and to stratification with estimated ωj.
Notice the above is not quite OK: what if some nj = 0?
(a neglected stratum)
In our ABC-MCMC we reject the proposal θ∗ as soon as some nj = 0, so
we actually use

ˆˆµstrat = [ Σ_{j=1}^J (ω̂j/nj) Σ_{r: s∗r∈Dj} Kδ(s∗r, sobs) ] · I{nj>0, ∀j}
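A sketch of ˆˆµstrat with J = 3 distance strata; kernel_vals are Kδ evaluated at the R resampled summaries, and omega_hat is assumed to come from an independent resampling round (the “another MC simulation” step above), since taking both counts and weights from the same draws would collapse the estimator to a plain average:

```python
import numpy as np

def mu_strat_hat(d, kernel_vals, omega_hat, delta):
    """Stratified (biased) estimator of the ABC likelihood; returns 0 if any
    stratum is neglected, which forces rejection in the ABC-MCMC below."""
    masks = [d <= delta / 2, (d > delta / 2) & (d <= delta), d > delta]
    mu = 0.0
    for w, in_j in zip(omega_hat, masks):
        n_j = in_j.sum()
        if n_j == 0:
            return 0.0                               # neglected stratum
        mu += w * kernel_vals[in_j].sum() / n_j      # omega_hat_j / n_j * sum of K
    return mu
```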
Umberto Picchini, @uPicchini 19/26
Stratified MC within ABC-MCMC
As usual, we accept a proposal using an MH step:
propose θ∗ ∼ q(θ|θ#) and accept with probability

1 ∧ [ˆˆµstrat(θ∗)/ˆˆµstrat(θ#)] · [π(θ∗)/π(θ#)] · [q(θ#|θ∗)/q(θ∗|θ#)]

If we accept, set θ# := θ∗ and ˆˆµstrat(θ#) := ˆˆµstrat(θ∗).
Repeat a few thousand times (a driver sketch follows below).
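A minimal driver sketch for the whole chain; estimate_mu_strat(theta) is a hypothetical wrapper that simulates one dataset, resamples, stratifies and returns ˆˆµstrat(θ), and the proposal is again symmetric:

```python
import numpy as np

def stratified_abc_mcmc(theta0, n_iter, estimate_mu_strat, log_prior, prop_std, rng):
    """Run the ABC-MCMC chain with the stratified likelihood estimator."""
    theta = np.asarray(theta0, dtype=float)
    mu = estimate_mu_strat(theta)
    chain = [theta.copy()]
    for _ in range(n_iter):
        theta_star = theta + prop_std * rng.standard_normal(theta.shape)
        mu_star = estimate_mu_strat(theta_star)
        log_alpha = (np.log(mu_star + 1e-300) - np.log(mu + 1e-300)
                     + log_prior(theta_star) - log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            theta, mu = theta_star, mu_star   # accept: update cached estimate
        chain.append(theta.copy())
    return np.array(chain)
```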
Umberto Picchini, @uPicchini 20/26
Reprising the Gaussian example
This is a super-trivial study, but it is still instructive.
Data: 1000 iid observations ∼ N(θ = 0, 1). Gaussian prior →
exact posterior
Red: exact posterior.
Blue: different types of ABC-MCMC posteriors.
[Figure: three posterior panels. (a) pseudomarginal ABC, M = 100; (b) M = 1 and R = 100 resamples; (c) M = 1, R = 100 and stratification.]
With stratification and only M = 1 we get results as good as with
M = 100 (compare left and right).
Umberto Picchini, @uPicchini 21/26
Time series
Our methodology is not restricted to iid data.
The next example uses the “block bootstrap”9, where we resample blocks
of observations for stationary time series.
• blocks are chosen to be sufficiently large such that they retain
the short range dependence structure of the data;
• so that a resampled time series, constructed by concatenating
resampled blocks, has similar statistical properties to real data
B = { (1 : B), (B + 1 : 2B), ..., (nobs − B + 1 : nobs) },   i.e. blocks 1, 2, ..., nobs/B.
At each θ we resample blocks of indices of simulated data.
9 Kunsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, 1217-1241.
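A sketch of block-bootstrap index resampling, assuming nobs is a multiple of the block size B (as in the Lotka-Volterra example below, where nobs = 32 and B = 8):

```python
import numpy as np

def block_bootstrap_indices(n_obs, B, rng):
    """Concatenate n_obs/B blocks of length B drawn with replacement."""
    starts = np.arange(0, n_obs, B)                # block start indices
    chosen = rng.choice(starts, size=len(starts))  # sample blocks with replacement
    return np.concatenate([np.arange(s, s + B) for s in chosen])

# usage sketch: x_resampled = x_star[block_bootstrap_indices(32, 8, rng)]
```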
Umberto Picchini, @uPicchini 22/26
2D Lotka-Volterra time series
A predator-prey model with an intractable likelihood (Markov
jump process).
Two interacting species: X1 (# predators) and X2 (# prey).
Populations evolve according to three interactions:
• A prey may be born, with rate θ1X2, increasing X2 by one.
• The predator-prey interaction in which X1 increases by one
and X2 decreases by one, with rate θ2X1X2.
• A predator may die, with rate θ3X1, decreasing X1 by one.
Its solution may be simulated exactly using the “Gillespie
algorithm”10.
10 Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), 2340-2361.
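For concreteness, a sketch of Gillespie's algorithm for this predator-prey model, with the three reaction rates as stated above (variable names are my own):

```python
import numpy as np

def gillespie_lv(theta, x1_0, x2_0, t_end, rng):
    """Exact simulation of the Lotka-Volterra Markov jump process."""
    th1, th2, th3 = theta
    t, x1, x2 = 0.0, x1_0, x2_0
    path = [(t, x1, x2)]
    while t < t_end:
        rates = np.array([th1 * x2, th2 * x1 * x2, th3 * x1])
        total = rates.sum()
        if total == 0.0:                           # no reaction can fire
            break
        t += rng.exponential(1.0 / total)          # waiting time to next event
        event = rng.choice(3, p=rates / total)     # which reaction fires
        if event == 0:
            x2 += 1                                # prey born (rate th1 * X2)
        elif event == 1:
            x1, x2 = x1 + 1, x2 - 1                # predation (rate th2 * X1 * X2)
        else:
            x1 -= 1                                # predator dies (rate th3 * X1)
        path.append((t, x1, x2))
    return np.array(path)
```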
Umberto Picchini, @uPicchini 23/26
We have 32 observations for each species simulated via Gillespie’s
algorithm.
At each θ we simulate and resample 4 blocks each having size
B = 8.
We want inference for reaction rates (θ1, θ2, θ3).
Set vague priors log θj ∼ U(−6, 2)
Umberto Picchini, @uPicchini 24/26
We run several experiments:
• standard ABC-MCMC with M = 1 indep. datasets for each θ;
• ABC-MCMC with M = 1 and R = 100 resampled datasets, allocated across three strata:
• D1 = {s∗ s.t. distance ∈ (0, δ/2)}
• D2 = {s∗ s.t. distance ∈ (δ/2, δ]}
• D3 = {s∗ s.t. distance ∈ (δ, ∞)}
                          θ1                  θ2                       θ3                     IAT    accept. rate (%)
true parameters           1                   0.005                    0.6
standard ABC (M = 1)      1.011 [0.93, 1.13]  0.005 [0.0046, 0.0055]   0.575 [0.504, 0.627]   145    2.5
stratified ABC (R = 100)  0.989 [0.88, 1.11]  0.005 [0.0044, 0.0056]   0.577 [0.479, 0.668]   114    6.5

Table: Mean and 95% posterior intervals for θ.
For stratified ABC we used a δ 5 times larger than for standard ABC!
With stratification: similar inference, but better acceptance rate and
lower IAT.
Umberto Picchini, @uPicchini 25/26
Conclusions
• stratified Monte Carlo is straightforward to implement and
effective in reducing resampling bias;
• Allows for precise ABC while using a larger δ;
• smaller variance ABC likelihood → better mixing MCMC;
• Downside: neglected strata may increase rejection rate;
• more research needed for constructing optimal strata;
• Ongoing work: comments most welcome!
• more examples at:
P and Everitt (2019). Stratified sampling and resampling for
approximate Bayesian computation,
https://arxiv.org/abs/1905.07976
picchini@chalmers.se
@uPicchini
Umberto Picchini, @uPicchini 26/26
ABC loglikelihoods for Gaussian example
Figure: 1D Gauss model: loglikelihood estimated via standard ABC (red),
resampling ABC (blue), resampling + stratification (magenta). Solid lines are
mean values over 500 estimations. Dashed lines are 2.5 and 97.5 percentiles.
Umberto Picchini, @uPicchini 1/1