Fast generic MCMC for targets with expensive
likelihoods
C. Sherlock (Lancaster), A. Golightly (Ncl) & D. Henderson
(Ncl); this talk by: Jere Koskela (Warwick)
Monte Carlo Techniques, Paris, July 2016
Motivation
Metropolis-Hastings (MH) algorithms create a realisation:
θ(1), θ(2), . . . from a Markov chain with stationary density π(θ).
For each θ(i) we evaluate π(θ(i)) - and then discard it.
Pseudo-marginal MH algorithms create a realisation of ˆπ(θ(i), u(i))
- and then discard it.
In many applications evaluating π(θ) or ˆπ(θ, u) is computationally
expensive.
We would like to reuse these values to create a more efficient MH
algorithm that still targets the correct stationary distribution.
We will focus on the [pseudo-marginal] random walk Metropolis
(RWM) with a Gaussian proposal: θ′ | θ ∼ N(θ, λ²V).
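As a concrete reference point, here is a minimal sketch of one [pseudo-marginal] RWM step with this Gaussian proposal. It is not the talk's code: the function `log_pi_hat` (an exact or unbiased-estimate log posterior) and the tuning inputs `lam` and `L` (Cholesky factor of V) are assumptions for illustration.

```python
import numpy as np

def rwm_step(theta, log_pi_hat_curr, log_pi_hat, lam, L, rng):
    """One [pseudo-marginal] RWM step with proposal theta' ~ N(theta, lam^2 V).

    log_pi_hat(theta) returns the (possibly estimated) log posterior density;
    L is the Cholesky factor of V, so lam * L @ z has covariance lam^2 V.
    """
    theta_prop = theta + lam * (L @ rng.standard_normal(theta.size))
    log_pi_hat_prop = log_pi_hat(theta_prop)         # the expensive evaluation
    log_alpha = log_pi_hat_prop - log_pi_hat_curr    # symmetric proposal: q terms cancel
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, log_pi_hat_prop           # accept
    return theta, log_pi_hat_curr                    # reject: keep the stored value
```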
This talk
Creating an approximation to π(θ ): k-NN.
Using an approximation: Delayed acceptance [PsM]MH.
Storing the values: KD-trees.
Algorithm: adaptive da-[PsM]MH; ergodicity.
Choice of P(fixed kernel).
Simulation study.
Creating an approximation
At iteration n of the MH we have π(θ(1)), π(θ(2)), . . . , π(θ(n)), and
we would like to create ˆπa(θ′) ≈ π(θ′).
Fit a Gaussian process to log π?
[problems with cost of fitting and evaluating as n ↑]
Weighted average of k-nearest-neighbour π values:
(i) Fitting cost: 0 (actually O(n⁰), i.e. constant).
(ii) Per-iteration cost: O(n).
(iii) Accuracy ↑ with n.
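A minimal sketch of the k-nearest-neighbour approximation just described, using a brute-force O(n) neighbour search. The inverse-distance weighting shown is one natural reading of "weighted average", so treat the details (and the function name) as illustrative.

```python
import numpy as np

def knn_pi_hat(theta_new, thetas, pis, k=5, eps=1e-12):
    """Approximate pi(theta_new) by an inverse-distance-weighted average of the
    pi values at the k nearest stored points. Brute force: O(n) per call."""
    dist = np.linalg.norm(thetas - theta_new, axis=1)  # distances to all n stored points
    idx = np.argpartition(dist, k)[:k]                 # indices of the k nearest
    w = 1.0 / (dist[idx] + eps)                        # inverse-distance weights
    return np.sum(w * pis[idx]) / np.sum(w)
```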
Delayed acceptance MH (1)
(Christen and Fox, 2005). Suppose we have a
computationally-cheap approximation to the posterior: ˆπa(θ).
Define αda(θ; θ′) := α1(θ; θ′) α2(θ; θ′), where
α1 := 1 ∧ [ˆπa(θ′) q(θ|θ′)] / [ˆπa(θ) q(θ′|θ)]  and  α2 := 1 ∧ [π(θ′)/ˆπa(θ′)] / [π(θ)/ˆπa(θ)].
Detailed balance (with respect to π(θ)) is still preserved with αda
because
π(θ) q(θ′|θ) αda(θ; θ′) = [ˆπa(θ) q(θ′|θ) α1] × [π(θ)/ˆπa(θ)] α2,
and each bracketed factor is symmetric in (θ, θ′).
But this algorithm mixes worse than the equivalent MH algorithm
(Peskun, 1973; Tierney, 1998).
Delayed-acceptance [PsM]MH (2)
Using αda = α1(θ; θ′) α2(θ; θ′) mixes worse, but CPU time/iteration
can be much reduced.
Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2).
α1 is quick to calculate.
There is no need to calculate α2 if we reject at Stage 1.
If ˆπa is accurate then α2 ≈ 1.
If ˆπa is also cheap then the (RWM) proposal can use large jumps:
bold proposals that would be rejected are usually screened out at
Stage 1 at almost no cost, so only the occasional promising jump
incurs an expensive evaluation.
Delayed-acceptance PMMH:
α2 := 1 ∧ [ˆπ(θ′, u′)/ˆπa(θ′)] / [ˆπ(θ, u)/ˆπa(θ)].
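The two-stage accept/reject described on this slide, as a minimal sketch for the RWM case (so the q ratio cancels in α1). Here `log_pi_hat_a` is the cheap approximation and `log_pi` the expensive target; both, and the tuning inputs, are assumed supplied by the user.

```python
import numpy as np

def da_rwm_step(theta, log_pi_curr, log_pi, log_pi_hat_a, lam, L, rng):
    """Delayed-acceptance RWM: Stage 1 uses only the cheap approximation;
    the expensive log_pi is evaluated only if Stage 1 accepts."""
    theta_prop = theta + lam * (L @ rng.standard_normal(theta.size))

    # Stage 1: alpha_1 = 1 ^ pi_hat_a(theta') / pi_hat_a(theta)   (symmetric proposal)
    log_a1 = log_pi_hat_a(theta_prop) - log_pi_hat_a(theta)
    if np.log(rng.uniform()) >= log_a1:
        return theta, log_pi_curr                    # rejected cheaply: no expensive call

    # Stage 2: alpha_2 = 1 ^ [pi(theta')/pi_hat_a(theta')] / [pi(theta)/pi_hat_a(theta)]
    log_pi_prop = log_pi(theta_prop)                 # the expensive evaluation
    log_a2 = (log_pi_prop - log_pi_hat_a(theta_prop)) - (log_pi_curr - log_pi_hat_a(theta))
    if np.log(rng.uniform()) < log_a2:
        return theta_prop, log_pi_prop
    return theta, log_pi_curr
```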
Cheap and accurate approximation?
We: use an inverse-distance-weighted average of the π values from
the k nearest neighbours.
But the cost is still O(n)/iter.
k-nn and the binary tree
Imagine a table with n values:
θ(1)1  θ(1)2  · · ·  θ(1)d  π(θ(1))
θ(2)1  θ(2)2  · · ·  θ(2)d  π(θ(2))
· · ·
θ(n)1  θ(n)2  · · ·  θ(n)d  π(θ(n))
Look-up of the k nearest neighbours to some θ′ is O(n).
If d = 1 we could sort the list or create a standard binary tree
(root = median; next level = quartiles; next level = octiles)
for O(log n) look-up. For d > 1 a solution is the KD-tree.
KD-tree (d=2)
[Diagram: a KD-tree for d = 2; the split dimension (d-split) alternates 1, 2, 1, 2 down the levels. The root splits on m1 := m{θ1}; its children split on m2− := m{θ2 : θ1 < m1} and m2+ := m{θ2 : θ1 > m1}; the next level splits on θ1 again, e.g. m1−− := m{θ1 : θ1 < m1, θ2 < m2−} and m1+− := m{θ1 : θ1 > m1, θ2 < m2−}, with leaf nodes [L] beneath.]
m{S} = branch splitting according to θd-split on the median of S.
[L] = leaf node holding a maximum of 2b − 1 leaves.
Our KD-tree is useful if (roughly) n/(3b/2) > 2^d.
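The talk's own implementation is bespoke, generic C, but the O(log n) neighbour look-up can be illustrated with SciPy's KD-tree (a sketch under the assumption that SciPy is acceptable; SciPy's tree is static, so it stands in only for the build-and-query step).

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
d = 2
thetas = rng.standard_normal((10_000, d))        # stored parameter values
pis = np.exp(-0.5 * np.sum(thetas**2, axis=1))   # stand-in for stored pi values

tree = cKDTree(thetas)                           # build: O(n log n)
theta_new = rng.standard_normal(d)
dist, idx = tree.query(theta_new, k=5)           # k-NN look-up: roughly O(log n)

w = 1.0 / (dist + 1e-12)                         # inverse-distance weights
pi_hat_a = np.sum(w * pis[idx]) / np.sum(w)
```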
Adaptive k-nn using a KD-tree
Initial, training run of n0 iterations. Build initial KD-tree.
Main run: ‘every time’ π(θ′) is evaluated, add (θ′, π(θ′)) to the
KD-tree.
Set-up is O(n0 (log n0)²); updating is O(log n); evaluation is
O(log n); and accuracy ↑ as the MCMC progresses.
Provided the tree is balanced. [Skip, for lack of time]
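SciPy's KD-tree does not support insertion, so a faithful port of the "add every new (θ′, π(θ′))" step needs either the talk's C implementation or a rebuild strategy. A minimal sketch assuming the latter: buffer new evaluations and rebuild occasionally. This keeps the amortised cost low, though it is cruder than true O(log n) insertion into a balanced tree.

```python
import numpy as np
from scipy.spatial import cKDTree

class KnnStore:
    """Store (theta, pi) pairs; rebuild the KD-tree once the buffer grows."""

    def __init__(self, thetas, pis, rebuild_every=500):
        self.thetas, self.pis = list(thetas), list(pis)
        self.rebuild_every = rebuild_every
        self._rebuild()

    def _rebuild(self):
        self.tree = cKDTree(np.asarray(self.thetas))
        self._pending = 0                         # points added since the last rebuild

    def add(self, theta, pi):
        self.thetas.append(theta)
        self.pis.append(pi)
        self._pending += 1
        if self._pending >= self.rebuild_every:   # new points enter the tree only here
            self._rebuild()

    def pi_hat_a(self, theta, k=5):
        dist, idx = self.tree.query(theta, k=k)
        w = 1.0 / (dist + 1e-12)
        return np.sum(w * np.asarray(self.pis)[idx]) / np.sum(w)
```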
Refinements
Training dataset ⇒ better distance metric. Transform θ to
approximately isotropic before creating the tree, or before adding a new node.
Minimum distance ε. If ∃ θ s.t. ||θ′ − θ|| < ε then
(i) MH: ignore the new value.
(ii) PMMH: combine ˆπ(y|θ′, u′) with ˆπ(y|θ, u) (running average).
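A sketch of the first refinement, assuming the training run supplies sample parameter draws: map θ to approximately isotropic coordinates via the Cholesky factor of the training-run covariance before computing distances. The function name is illustrative, not the talk's.

```python
import numpy as np

def make_whitener(training_thetas):
    """Return a map theta -> L^{-1}(theta - mean), with L the Cholesky factor of
    the training-run covariance, so transformed points are roughly isotropic."""
    mean = training_thetas.mean(axis=0)
    L = np.linalg.cholesky(np.cov(training_thetas, rowvar=False))
    return lambda theta: np.linalg.solve(L, theta - mean)

# Euclidean distance in the whitened space equals the Mahalanobis
# distance under the estimated covariance, a more sensible metric for k-NN.
```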
Adaptive Algorithm
Components
A fixed [pseudo-marginal] kernel P([θ, u]; [dθ′, du′]).
An adaptive [pseudo-marginal] DA kernel Pγ([θ, u]; [dθ′, du′]).
Both P and Pγ generate ˆπ(θ′, u′) in the same way.
A fixed probability β ∈ (0, 1].
A set of probabilities: pn → 0.
Algorithm At the start of iteration n, the chain is at [θ, u] and the
DA kernel would be Pγ.
1. Sample [θ′, u′] from P w.p. β, or from Pγ w.p. 1 − β.
2. W.p. pn ‘choose a new γ’: update the kernel by including all
relevant information since the kernel was last updated.
Set: pn = 1/(1 + c·in), where in = # expensive evaluations so far.
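A minimal sketch of the loop structure on this slide; `fixed_kernel_step`, `da_kernel_step`, and `update_gamma` are placeholders for the fixed [pseudo-marginal] kernel, the adaptive DA kernel, and the "choose a new γ" update, and `state.theta` is an assumed attribute of the chain state.

```python
import numpy as np

def adaptive_da_mcmc(state, n_iters, fixed_kernel_step, da_kernel_step,
                     update_gamma, beta=0.05, c=0.001, rng=None):
    rng = rng or np.random.default_rng()
    n_expensive = 0                                  # i_n: expensive evaluations so far
    samples = []
    for _ in range(n_iters):
        if rng.uniform() < beta:
            state, used_expensive = fixed_kernel_step(state)   # always expensive
        else:
            state, used_expensive = da_kernel_step(state)      # expensive only past Stage 1
        n_expensive += used_expensive
        samples.append(state.theta)
        p_n = 1.0 / (1.0 + c * n_expensive)          # diminishing adaptation probability
        if rng.uniform() < p_n:
            update_gamma()                           # fold new (theta, pi) pairs into the tree
    return samples
```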
Ergodicity
Assumptions on the fixed kernel.
1. Minorisation: there is a density ν(θ) and δ > 0 such that
q(θ′|θ) α(θ; θ′) > δ ν(θ′), where α is the acceptance probability from
the idealised version of the fixed MH algorithm.
2. Bounded weights: the support of W := ˆπ(θ, U)/π(θ) is
uniformly (in θ) bounded above by some w < ∞.
Theorem Subject to Assumptions 1 and 2, the adaptive
pseudo-marginal algorithm is ergodic.
NB. For DAMH, as opposed to DAPMMH, only the minorisation
assumption is required.
Choice of β
DARWM is more efficient when λ > ˆλRWM.
The resulting lower α1 does not matter in itself.
But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ].
P(expensive evaluation) = β + (1 − β)α1.
If α1 ≪ 1, most of the expensive evaluations come from the fixed
kernel and much of the benefit from the adaptive kernel will be
lost.
Fixing β ∝ α1 (obtained from the training run) avoids this, but the
guaranteed worst-case TVD from π after n iterations gets larger.
However, for a fixed computational budget (≈ a fixed number of
expensive evaluations) the provable worst-case TVD from π is preserved.
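An illustrative calculation (numbers invented, not from the talk): with α1 = 0.01, taking β = 0.1 gives P(expensive) = 0.1 + 0.9 × 0.01 ≈ 0.109, so roughly 92% of the expensive evaluations come from the fixed kernel; taking β = 0.01 (i.e. β ∝ α1) gives P(expensive) ≈ 0.02, with the expensive work split about evenly between the two kernels.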
Examples
Lotka-Volterra MJP: daPMRWM with d = 5.
LNA approximation to an autoregulatory system: daRWM with
d = 10.
RWM: θ′ ∼ N(θ, λ² ˆΣ), where ˆΣ is obtained from the training run
(which also gives the pre-map).
Scaling λ [and number of particles m] chosen to be optimal for the
RWM.
n0 = 10000 (from training run), b = 10, c = 0.001.
Efficiency = (min over j = 1, . . . , d of ESSj) / CPU time.
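A sketch of this efficiency measure, with ESS estimated by simple batch means; the talk does not specify its ESS estimator, so both functions below are illustrative assumptions.

```python
import numpy as np

def ess_batch_means(x, n_batches=50):
    """Crude effective sample size for one chain component via batch means."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // n_batches) * n_batches
    batches = x[:n].reshape(n_batches, -1)
    var_asym = batches.shape[1] * batches.mean(axis=1).var(ddof=1)  # asymptotic variance
    return n * x[:n].var(ddof=1) / var_asym

def efficiency(samples, cpu_seconds):
    """min_j ESS_j / CPU time, for samples of shape (n_draws, d)."""
    ess = np.array([ess_batch_means(samples[:, j]) for j in range(samples.shape[1])])
    return ess.min() / cpu_seconds
```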
Results: LV
RelESS = efficiency of DA[PM]RWM / efficiency of the optimal RWM.
[Plots: RelESS against xi (left) and α1 against xi (right), for xi between 1.0 and 4.0.]
xi = 1 corresponds to the DA using the scaling that is optimal for
the standard RWM algorithm, i.e. DA scaling = xi × ˆλRWM.
Results: Autoreg.
Solid = shorter dataset; dashed = longer dataset.
[Plots: ESS against xi (left) and RelESS against xi (right), for xi between 0.5 and 3.0.]
LV: further experiments
c Tree Size ˆα1 ˆα2 Rel. mESS
0.0001 41078 0.00772 0.339 7.28
0.001 40256 0.00915 0.276 6.80
0.01 43248 0.0121 0.204 4.67
∞ 10000 0.0175 0.136 3.46
Using a list rather than the KD-tree reduced the efficiency by a
factor of ≈ 2.
Summary
Store the expensive evaluations of π or ˆπ in a KD-tree.
Use a delayed-acceptance [PsM]RWM algorithm.
Adaptive algorithm converges subject to conditions.
Need β ∝ α1 for the adaptive portion to play a part.
Improvement in efficiency by a factor of between 4 and 7.
Easy-to-use, generic C code for the KD-tree is available.
Limitations: Need n ≫ 2^d for the KD-tree to give a worthwhile speedup
and for adequate coverage.
Other: could use the k nearest neighbours to estimate gradient and
curvature; even to fit a local GP?
Thank you!
  • 15. Delayed acceptance MH (1) (Christen and Fox, 2005). Suppose we have a computationally-cheap approximation to the posterior: ˆπa(θ). Define αda(θ; θ ) := α1(θ; θ ) α2(θ; θ ), where α1 := 1 ∧ ˆπa(θ )q(θ|θ ) ˆπa(θ)q(θ |θ) and α2 := 1 ∧ π(θ )/ˆπa(θ ) π(θ)/ˆπa(θ) . Detailed balance (with respect π(θ)) is still preserved with αda because π(θ) q(θ |θ) αda(θ; θ ) = ˆπa(θ) q(θ |θ) α1 × π(θ) ˆπa(θ) α2. But this algorithm mixes worse than the equivalent MH algorithm (Peskun, 1973; Tierney, 1998).
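A brief gloss (not on the slides) on why the factorisation above gives detailed balance: using a (1 ∧ b/a) = a ∧ b = b (1 ∧ a/b), each factor is symmetric in (θ, θ'):
ˆπa(θ) q(θ'|θ) α1(θ; θ') = ˆπa(θ) q(θ'|θ) ∧ ˆπa(θ') q(θ|θ'), and
[π(θ)/ˆπa(θ)] α2(θ; θ') = [π(θ)/ˆπa(θ)] ∧ [π(θ')/ˆπa(θ')].
Their product, π(θ) q(θ'|θ) αda(θ; θ'), is therefore symmetric in (θ, θ'), which is exactly detailed balance with respect to π.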
  • 16. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced.
  • 17. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced. Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2).
  • 18. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced. Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2). α1 is quick to calculate.
  • 19. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced. Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2). α1 is quick to calculate. There is no need to calculate α2 if we reject at Stage One.
  • 20. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced. Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2). α1 is quick to calculate. There is no need to calculate α2 if we reject at Stage One. If ˆπa is accurate then α2 ≈ 1.
  • 21. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced. Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2). α1 is quick to calculate. There is no need to calculate α2 if we reject at Stage One. If ˆπa is accurate then α2 ≈ 1. If ˆπa is also cheap then (RWM) can use large jump proposals [EXPLAIN].
  • 22. Delayed-acceptance [PsM]MH (2) Using αda = α1(θ; θ )α2(θ; θ ) mixes worse but CPU time/iteration can be much reduced. Accept ⇔ accept at Stage 1 (w.p. α1) and accept at Stage 2 (w.p. α2). α1 is quick to calculate. There is no need to calculate α2 if we reject at Stage One. If ˆπa is accurate then α2 ≈ 1. If ˆπa is also cheap then (RWM) can use large jump proposals [EXPLAIN]. Delayed-acceptance PMMH: α2 := 1 ∧ ˆπ(θ , u )/ˆπa(θ ) ˆπ(θ, u)/ˆπa(θ) .
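To make the two-stage mechanics concrete, here is a minimal Python sketch of one delayed-acceptance step for a symmetric (e.g. Gaussian RWM) proposal, so the q-ratio cancels. The function names (da_mh_step, propose) and the caching convention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def da_mh_step(theta, lp_cur, lpa_cur, log_pi, log_pi_hat_a, propose, rng):
    """One delayed-acceptance MH step with a symmetric proposal (q-ratio = 1).

    lp_cur = log pi(theta) and lpa_cur = log pi_hat_a(theta) are cached from
    the previous iteration; the expensive log_pi is called only if Stage 1
    accepts.
    """
    theta_p = propose(theta, rng)          # e.g. Gaussian RWM proposal
    lpa_p = log_pi_hat_a(theta_p)          # cheap approximation at the proposal

    # Stage 1: screen the proposal using only the cheap approximation.
    if np.log(rng.uniform()) >= min(0.0, lpa_p - lpa_cur):
        return theta, lp_cur, lpa_cur      # rejected without an expensive call

    # Stage 2: correct with the expensive target (one expensive evaluation).
    lp_p = log_pi(theta_p)
    log_alpha2 = (lp_p - lpa_p) - (lp_cur - lpa_cur)
    if np.log(rng.uniform()) < min(0.0, log_alpha2):
        return theta_p, lp_p, lpa_p        # accepted
    return theta, lp_cur, lpa_cur          # rejected after the expensive call
```

As a usage sketch, propose could be something like lambda th, r: th + 0.5 * r.standard_normal(th.shape) with rng = np.random.default_rng(); the point of the structure is that the expensive log_pi is only evaluated when Stage 1 accepts, which is what makes larger jump proposals affordable when ˆπa is cheap and reasonably accurate.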
  • 23. Cheap and accurate approximation? We: use an inverse-distance-weighted average of the π values from the k nearest neighbours.
  • 24. Cheap and accurate approximation? We: use an inverse-distance-weighted average of the π values from the k nearest neighbours. But the cost is still O(n)/iter.
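A minimal sketch of the inverse-distance-weighted k-NN estimate, written here as a brute-force O(n) scan (the cost being complained about on this slide); the exact weighting, the eps guard against a zero distance, and the function name are illustrative assumptions.

```python
import numpy as np

def knn_idw_pi_hat(theta, thetas, pis, k=5, eps=1e-12):
    """Inverse-distance-weighted average of stored pi values over the k
    nearest neighbours of theta.

    thetas : (n, d) array of stored parameter values
    pis    : (n,)  array of the corresponding stored pi values
    """
    d = np.linalg.norm(thetas - theta, axis=1)   # distances to all n stored points
    idx = np.argsort(d)[:k]                      # indices of the k nearest points
    w = 1.0 / (d[idx] + eps)                     # inverse-distance weights
    return float(np.sum(w * pis[idx]) / np.sum(w))
```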
  • 25. k-nn and the binary tree Imagine a table with n values: θ^(1)_1, θ^(1)_2, · · · , θ^(1)_d, π(θ^(1)); θ^(2)_1, θ^(2)_2, · · · , θ^(2)_d, π(θ^(2)); · · · ; θ^(n)_1, θ^(n)_2, · · · , θ^(n)_d, π(θ^(n)). Look-up of the k nearest neighbours to some θ is O(n).
  • 26. k-nn and the binary tree Imagine a table with n values (as on the previous slide). Look-up of the k nearest neighbours to some θ is O(n). If d = 1 one could sort the list or create a standard binary tree, splitting on the median, then the quartiles, then the octiles, . . . , for O(log n) look-up.
  • 27. k-nn and the binary tree Imagine a table with n values (as on the previous slide). Look-up of the k nearest neighbours to some θ is O(n). If d = 1 one could sort the list or create a standard binary tree, splitting on the median, then the quartiles, then the octiles, . . . , for O(log n) look-up. For d > 1 a solution is the KD-tree.
  • 28.–32. KD-tree (d = 2) [Tree diagram, built up over five slides: the root splits on the median m1 := m{θ1}; the next level splits each half on the θ2-medians m2− := m{θ2 : θ1 < m1} and m2+ := m{θ2 : θ1 > m1}; the level below splits on θ1 again, and so on, alternating the split dimension down to leaf nodes [L].] m{S} = branch splitting, according to the current split dimension, on the median of S. [L] = leaf node holding a maximum of 2b − 1 points. Our KD-tree is useful if (roughly) n/(3b/2) > 2^d.
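For intuition about how a KD-tree replaces the linear scan, here is a minimal static KD-tree sketch in Python with a single point per node and a pruned nearest-neighbour query. The slides' tree differs (leaf buckets of up to 2b − 1 points, on-line insertion, and a k-NN rather than 1-NN query, which would keep a bounded priority queue instead of a single best point), so treat this only as an illustration of the O(log n) look-up idea.

```python
import numpy as np

class KDNode:
    __slots__ = ("point", "value", "axis", "left", "right")
    def __init__(self, point, value, axis, left, right):
        self.point, self.value = point, value
        self.axis, self.left, self.right = axis, left, right

def build_kdtree(points, values, depth=0):
    """Build a KD-tree by splitting on the median of one coordinate per level."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]                 # cycle through the d axes
    order = np.argsort(points[:, axis])
    points, values = points[order], values[order]
    m = len(points) // 2                           # median index
    return KDNode(points[m], values[m], axis,
                  build_kdtree(points[:m], values[:m], depth + 1),
                  build_kdtree(points[m + 1:], values[m + 1:], depth + 1))

def nearest(node, target, best=None):
    """Recursive 1-NN query; prune a subtree when it cannot beat the current best."""
    if node is None:
        return best
    dist = np.linalg.norm(node.point - target)
    if best is None or dist < best[0]:
        best = (dist, node.point, node.value)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    if abs(diff) < best[0]:                        # hypersphere crosses the split plane
        best = nearest(far, target, best)
    return best
```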
  • 33. Adaptive k-nn using a KD-tree Initial, training run of n0 iterations. Build initial KD-tree.
  • 34. Adaptive k-nn using a KD-tree Initial, training run of n0 iterations. Build initial KD-tree. Main run: ‘every time’ π(θ ) is evaluated, add (θ , π(θ )) to the KD-tree.
  • 35. Adaptive k-nn using a KD-tree Initial, training run of n0 iterations. Build initial KD-tree. Main run: ‘every time’ π(θ ) is evaluated, add (θ , π(θ )) to the KD-tree. Set-up is O(n0 (log n0)^2); updating is O(log n); evaluation is O(log n); and accuracy ↑ as the MCMC progresses.
  • 36. Adaptive k-nn using a KD-tree Initial, training run of n0 iterations. Build initial KD-tree. Main run: ‘every time’ π(θ ) is evaluated, add (θ , π(θ )) to the KD-tree. Set-up is O(n0 (log n0)^2); updating is O(log n); evaluation is O(log n); and accuracy ↑ as the MCMC progresses. Provided the tree is balanced. [Skip, for lack of time]
  • 37. Refinements Training dataset ⇒ better distance metric. Transform θ to approximately isotropic before creating tree, or adding new node.
  • 38. Refinements Training dataset ⇒ better distance metric. Transform θ to approximately isotropic before creating the tree, or before adding a new node. Minimum distance ε: if ∃ θ s.t. ||θ' − θ|| < ε then (i) MH: ignore the new value. (ii) PMMH: combine ˆπ(y|θ', u') with ˆπ(y|θ, u) (running average).
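A sketch of the minimum-distance refinement for the pseudo-marginal case, assuming the running average is taken on the likelihood scale (an assumption of this sketch); the brute-force nearest look-up, the array-based storage, and the function name are placeholders for the KD-tree machinery.

```python
import numpy as np

def add_or_merge(points, values, counts, theta_new, pi_hat_new, eps):
    """Insert a new (theta, pi_hat) pair, or fold it into a running average if a
    stored theta lies within eps of theta_new. Assumes at least one stored point
    (e.g. from the training run)."""
    d = np.linalg.norm(points - theta_new, axis=1)
    j = int(np.argmin(d))
    if d[j] < eps:
        # running average of the unbiased estimates at (essentially) one theta
        values[j] = (counts[j] * values[j] + pi_hat_new) / (counts[j] + 1)
        counts[j] += 1
    else:
        points = np.vstack([points, theta_new])
        values = np.append(values, pi_hat_new)
        counts = np.append(counts, 1)
    return points, values, counts
```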
  • 39. Adaptive Algorithm Components A fixed [pseudo-marginal] kernel P([θ, u]; [dθ , du ]). An adaptive [pseudo-marginal] DA kernel Pγ([θ, u]; [dθ , du ]).
  • 40. Adaptive Algorithm Components A fixed [pseudo-marginal] kernel P([θ, u]; [dθ , du ]). An adaptive [pseudo-marginal] DA kernel Pγ([θ, u]; [dθ , du ]). Both P and Pγ generate ˆπ(θ , u) in the same way. A fixed probability β ∈ (0, 1]. A set of probabilities: pn → 0. Algorithm At the start of iteration n, the chain is at [θ, u] and the DA kernel would be Pγ.
  • 41. Adaptive Algorithm Components A fixed [pseudo-marginal] kernel P([θ, u]; [dθ , du ]). An adaptive [pseudo-marginal] DA kernel Pγ([θ, u]; [dθ , du ]). Both P and Pγ generate ˆπ(θ , u) in the same way. A fixed probability β ∈ (0, 1]. A set of probabilities: pn → 0. Algorithm At the start of iteration n, the chain is at [θ, u] and the DA kernel would be Pγ. 1. Sample [θ , u ] from P w.p. β Pγ w.p. 1 − β.
  • 42. Adaptive Algorithm Components A fixed [pseudo-marginal] kernel P([θ, u]; [dθ , du ]). An adaptive [pseudo-marginal] DA kernel Pγ([θ, u]; [dθ , du ]). Both P and Pγ generate ˆπ(θ , u) in the same way. A fixed probability β ∈ (0, 1]. A set of probabilities: pn → 0. Algorithm At the start of iteration n, the chain is at [θ, u] and the DA kernel would be Pγ. 1. Sample [θ , u ] from P w.p. β Pγ w.p. 1 − β. 2. W.p. pn ‘choose a new γ’: update the kernel by including all relevant information since the kernel was last updated.
  • 43. Adaptive Algorithm Components A fixed [pseudo-marginal] kernel P([θ, u]; [dθ , du ]). An adaptive [pseudo-marginal] DA kernel Pγ([θ, u]; [dθ , du ]). Both P and Pγ generate ˆπ(θ , u) in the same way. A fixed probability β ∈ (0, 1]. A set of probabilities: pn → 0. Algorithm At the start of iteration n, the chain is at [θ, u] and the DA kernel would be Pγ. 1. Sample [θ , u ] from P w.p. β Pγ w.p. 1 − β. 2. W.p. pn ‘choose a new γ’: update the kernel by including all relevant information since the kernel was last updated. Set: pn = 1/(1 + c · in), where in = # expensive evaluations so far.
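A minimal Python sketch of the iteration loop just described; the kernel interfaces (fixed_step, make_da_step returning callables, a shared history list) are illustrative assumptions, and real code would also carry the auxiliary variables u for the pseudo-marginal case.

```python
def adaptive_da_chain(theta0, fixed_step, make_da_step, n_iters, beta, c, rng):
    """Mix the fixed kernel (w.p. beta) with the current DA kernel P_gamma, and
    with probability p_n = 1/(1 + c * i_n) refresh gamma using all expensive
    evaluations stored so far."""
    history = []                          # stored (theta, pi(theta)) pairs
    da_step = make_da_step(history)       # build P_gamma from the current history
    theta, n_expensive = theta0, 0        # i_n = number of expensive evaluations
    chain = [theta0]
    for _ in range(n_iters):
        step = fixed_step if rng.uniform() < beta else da_step
        theta, new_evals = step(theta, rng)   # new_evals: expensive (theta, pi) pairs
        history.extend(new_evals)
        n_expensive += len(new_evals)
        chain.append(theta)
        if rng.uniform() < 1.0 / (1.0 + c * n_expensive):
            da_step = make_da_step(history)   # 'choose a new gamma'
    return chain
```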
  • 44. Ergodicity Assumptions on the fixed kernel. 1. Minorisation: there is a density ν(θ) and δ > 0 such that q(θ |θ)α(θ; θ ) > δν(θ ), where α is the acceptance rate from the idealised version of the fixed MH algorithm.
  • 45. Ergodicity Assumptions on the fixed kernel. 1. Minorisation: there is a density ν(θ) and δ > 0 such that q(θ |θ)α(θ; θ ) > δν(θ ), where α is the acceptance rate from the idealised version of the fixed MH algorithm. 2. Bounded weights: the support of W := ˆπ(θ, U)/π(θ) is uniformly (in θ) bounded above by some w < ∞.
  • 46. Ergodicity Assumptions on the fixed kernel. 1. Minorisation: there is a density ν(θ) and δ > 0 such that q(θ |θ)α(θ; θ ) > δν(θ ), where α is the acceptance rate from the idealised version of the fixed MH algorithm. 2. Bounded weights: the support of W := ˆπ(θ, U)/π(θ) is uniformly (in θ) bounded above by some w < ∞. Theorem Subject to Assumptions 1 and 2, the adaptive pseudo-marginal algorithm is ergodic.
  • 47. Ergodicity Assumptions on the fixed kernel. 1. Minorisation: there is a density ν(θ) and δ > 0 such that q(θ |θ)α(θ; θ ) > δν(θ ), where α is the acceptance rate from the idealised version of the fixed MH algorithm. 2. Bounded weights: the support of W := ˆπ(θ, U)/π(θ) is uniformly (in θ) bounded above by some w < ∞. Theorem Subject to Assumptions 1 and 2, the adaptive pseudo-marginal algorithm is ergodic. NB. For DAMH, as opposed to DAPMMH, only the minorisation assumption is required.
  • 48. Choice of β DARWM is more efficient when λ > ˆλRWM.
  • 49. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter.
  • 50. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter. But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ].
  • 51. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter. But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ]. P(expensive) = β + (1 − β)α1.
  • 52. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter. But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ]. P(expensive) = β + (1 − β)α1. If α1 << 1 most of the expensive evaluations come from the fixed kernel and much of the benefit from the adaptive kernel will be lost.
  • 53. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter. But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ]. P(expensive) = β + (1 − β)α1. If α1 << 1 most of the expensive evaluations come from the fixed kernel and much of the benefit from the adaptive kernel will be lost. Fixing β ∝ α1 (obtained from the training run) avoids this,
  • 54. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter. But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ]. P(expensive) = β + (1 − β)α1. If α1 << 1 most of the expensive evaluations come from the fixed kernel and much of the benefit from the adaptive kernel will be lost. Fixing β ∝ α1 (obtained from the training run) avoids this, but the guaranteed worst-case TVD from π after n iterations gets larger.
  • 55. Choice of β DARWM is more efficient when λ > ˆλRWM. The lower α1 does not matter. But low α1 ⇒ fewer expensive evaluations of π [or of ˆπ]. P(expensive) = β + (1 − β)α1. If α1 << 1 most of the expensive evaluations come from the fixed kernel and much of the benefit from the adaptive kernel will be lost. Fixing β ∝ α1 (obtained from the training run) avoids this, but the guaranteed worst-case TVD from π after n iterations gets larger. Consider a fixed computational budget ≈ fixed number of expensive evaluations. This preserves the provable worst-case TVD from π.
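To put rough, made-up numbers on this (not from the slides): if the training run gives α1 ≈ 0.01, then β = 0.5 gives P(expensive) = 0.5 + 0.5 × 0.01 ≈ 0.505, so about 99% of the expensive evaluations are triggered by the fixed kernel and the benefit of the adaptive DA kernel is largely lost; taking β ∝ α1, say β = 0.01, gives P(expensive) ≈ 0.02, so only about half of the expensive evaluations come from the fixed kernel and most of the saving is retained.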
  • 56. Examples Lotka-Volterra MJP: daPMRWM with d = 5. LNA approximation to an autoregulatory system: daRWM with d = 10.
  • 57. Examples Lotka-Volterra MJP: daPMRWM with d = 5. LNA approximation to an autoregulatory system: daRWM with d = 10. RWM: θ' ∼ N(θ, λ^2 ˆΣ), where ˆΣ is obtained from the training run (which also gives the pre-map).
  • 58. Examples Lotka-Volterra MJP: daPMRWM with d = 5. LNA approximation to an autoregulatory system: daRWM with d = 10. RWM: θ' ∼ N(θ, λ^2 ˆΣ), where ˆΣ is obtained from the training run (which also gives the pre-map). Scaling λ [and number of particles m] chosen to be optimal for the RWM.
  • 59. Examples Lotka-Volterra MJP: daPMRWM with d = 5. LNA approximation to an autoregulatory system: daRWM with d = 10. RWM: θ' ∼ N(θ, λ^2 ˆΣ), where ˆΣ is obtained from the training run (which also gives the pre-map). Scaling λ [and number of particles m] chosen to be optimal for the RWM. n0 = 10000 (from the training run), b = 10, c = 0.001. Efficiency = min_{j=1,...,d} ESS_j / CPU time.
  • 60. Results: LV RelESS = efficiency of DA[PM]RWM / efficiency of optimal RWM. [Two plots: RelESS against xi, and α1 against xi.] xi = 1 corresponds to the DA using the scaling that is optimal for the standard RWM algorithm, i.e. DA scaling = xi × ˆλRWM.
  • 61. Results: Autoreg. Solid = shorter dataset; dashed = longer dataset. [Two plots: ESS against xi, and RelESS against xi.]
  • 62. LV: further experiments
      c        Tree Size   ˆα1       ˆα2     Rel. mESS
      0.0001   41078       0.00772   0.339   7.28
      0.001    40256       0.00915   0.276   6.80
      0.01     43248       0.0121    0.204   4.67
      ∞        10000       0.0175    0.136   3.46
  • 63. LV: further experiments (same table as above). Using a list rather than the KD-tree reduced the efficiency by a factor of ≈ 2.
  • 64. Summary Store the expensive evaluations of π or ˆπ in a KD-tree.
  • 65. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm.
  • 66. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions.
  • 67. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part.
  • 68. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part. Improvement in efficiency by a factor of between 4 and 7.
  • 69. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part. Improvement in efficiency by a factor of between 4 and 7. Code: easy-to-use, generic C code for the KD-tree is available.
  • 70. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part. Improvement in efficiency by a factor of between 4 and 7. Code: easy-to-use, generic C code for the KD-tree is available. Limitations: Need n ≫ 2^d for the KD-tree to give a worthwhile speedup and for adequate coverage.
  • 71. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part. Improvement in efficiency by a factor of between 4 and 7. Code: easy-to-use, generic C code for the KD-tree is available. Limitations: Need n ≫ 2^d for the KD-tree to give a worthwhile speedup and for adequate coverage. Other: could use k nearest neighbours to estimate gradient and curvature;
  • 72. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part. Improvement in efficiency by a factor of between 4 and 7. Code: easy-to-use, generic C code for the KD-tree is available. Limitations: Need n ≫ 2^d for the KD-tree to give a worthwhile speedup and for adequate coverage. Other: could use k nearest neighbours to estimate gradient and curvature; even to fit a local GP?
  • 73. Summary Store the expensive evaluations of π or ˆπ in a KD-tree. Use a delayed-acceptance [PsM]RWM algorithm. Adaptive algorithm converges subject to conditions. Need β ∝ α1 for adaptive portion to play a part. Improvement in efficiency by a factor of between 4 and 7. Code: easy-to-use, generic C code for the KD-tree is available. Limitations: Need n ≫ 2^d for the KD-tree to give a worthwhile speedup and for adequate coverage. Other: could use k nearest neighbours to estimate gradient and curvature; even to fit a local GP? Thank you!