Monte Carlo methods for some
not-quite-but-almost Bayesian problems
Pierre E. Jacob
Department of Statistics, Harvard University
joint work with
Ruobin Gong, Paul T. Edlefsen, Arthur P. Dempster
John O’Leary, Yves F. Atchadé, Niloy Biswas, Paul Vanetti
and others
Introduction
A lot of questions in statistics give rise to non-trivial
computational problems.
Among these, some are numerical integration problems, equivalently (⇔) problems of sampling from probability distributions.
Besag, Markov chain Monte Carlo for statistical inference, 2001.
Computational challenges arise in deviations from standard
Bayesian inference, motivated by three questions,
quantifying ignorance,
model misspecification,
robustness to some perturbation of the data.
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Inference with count data
Notation: [N] := {1, . . . , N}. Simplex ∆.
Observations: xn ∈ [K] := {1, . . . , K}, x = (x1, . . . , xN).
Index sets: Ik = {n ∈ [N] : xn = k}.
Counts: Nk = |Ik|.
Model: xn ∼ Categorical(θ), i.i.d., with θ = (θk)k∈[K],
i.e. P(xn = k) = θk for all n, k.
Goal: estimate θ, predict, etc.
Maximum likelihood estimator: θ̂k = Nk/N.
Bayesian inference combines likelihood with prior on θ into a
posterior distribution, assigning a probability ∈ [0, 1] to any
measurable subset of the simplex ∆.
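As a minimal concrete sketch (the counts, seed and true θ below are arbitrary illustrations, not from the talk), simulating categorical data and computing the MLE takes a few lines:

import numpy as np

rng = np.random.default_rng(1)
K, N = 3, 600
theta = np.array([0.5, 0.3, 0.2])      # true parameter in the simplex
x = rng.choice(K, size=N, p=theta)     # x_n ~ Categorical(theta), i.i.d.
counts = np.bincount(x, minlength=K)   # N_k = |I_k|
theta_mle = counts / N                 # MLE: theta_hat_k = N_k / N
print(counts, theta_mle)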
Sampling from a Categorical distribution
[Figure: simplex with vertices 1, 2, 3, partitioned into subsimplices ∆1(θ), ∆2(θ), ∆3(θ) meeting at θ.]
Subsimplex ∆k(θ), for θ ∈ ∆:
{z ∈ ∆ : ∀ℓ ∈ [K], zℓ/zk ≥ θℓ/θk}.
Sampling mechanism, for θ ∈ ∆:
- draw un uniform on ∆,
- define xn such that un ∈ ∆xn (θ),
denoted also xn = m(un, θ).
Then P(xn = k) = θk,
because Vol(∆k(θ)) = θk.
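A numerical check of this mechanism (a sketch, not from the talk: the rule m(u, θ) = argminℓ uℓ/θℓ is equivalent to the subsimplex membership above, and a uniform draw on ∆ is obtained by normalizing i.i.d. exponentials):

import numpy as np

rng = np.random.default_rng(2)

def runif_simplex(rng, K):
    # Uniform draw on the simplex: normalized i.i.d. Exp(1) variables.
    e = rng.exponential(size=K)
    return e / e.sum()

def m(u, theta):
    # u lies in Delta_k(theta) iff u_l/u_k >= theta_l/theta_k for all l,
    # i.e. iff k minimizes l -> u_l / theta_l.
    return int(np.argmin(u / theta))

theta = np.array([0.5, 0.3, 0.2])
draws = [m(runif_simplex(rng, 3), theta) for _ in range(10**5)]
print(np.bincount(draws) / len(draws))  # close to theta, since Vol(Delta_k(theta)) = theta_k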
Arthur Dempster’s approach to inference
Observations x = (xn)n∈[N] are fixed.
If we draw u1, . . . , uN uniformly on ∆, there might exist θ ∈ ∆ such that
∀n ∈ [N] xn = m(un, θ),
or such a θ might not exist.
Arthur P. Dempster. New methods for reasoning towards posterior
distributions based on sample data. Annals of Mathematical Statistics, 1966.
Arthur P. Dempster. Statistical inference from a Dempster–Shafer
perspective. Past, Present, and Future of Statistical Science, 2014.
Draws in the simplex
Counts: (2, 3, 1). Let’s draw N = 6 uniform samples on ∆.
Draws in the simplex
Each un is associated to an observed xn ∈ {1, 2, 3}.
Draws in the simplex
If there exists a feasible θ, it cannot be just anywhere.
Draws in the simplex
The uns of each category add constraints on θ.
Draws in the simplex
Overall the constraints define a polytope for θ, or an empty set.
Draws in the simplex
Here, there is a polytope of θ such that ∀n ∈ [N] xn = m(un, θ).
Draws in the simplex
Any θ in the polytope separates the uns appropriately.
Draws in the simplex
Let’s try again with fresh uniform samples on ∆.
Draws in the simplex
Here there is no θ ∈ ∆ such that ∀n ∈ [N] xn = m(un, θ).
Lower and upper probabilities
Consider the set
Rx = {(u1, . . . , uN) ∈ ∆^N : ∃θ ∈ ∆, ∀n ∈ [N], xn = m(un, θ)},
and denote by νx the uniform distribution on Rx.
For u ∈ Rx, there is a set F(u) = {θ ∈ ∆ : ∀n xn = m(un, θ)}.
For a set Σ ⊂ ∆ of interest, define
(lower probability) P(Σ) = ∫ 1(F(u) ⊂ Σ) νx(du),
(upper probability) P̄(Σ) = ∫ 1(F(u) ∩ Σ ≠ ∅) νx(du).
Summary and Monte Carlo problem
Arthur Dempster’s approach, later called Dempster–Shafer
theory of belief functions, is based on a distribution of
feasible sets,
F(u) = {θ ∈ ∆ : ∀n ∈ [N] xn = m(un, θ)},
where u ∼ νx, the uniform distribution on Rx.
How do we obtain samples from this distribution?
Rejection sampling? The rejection rate is 99% for data (2, 3, 1).
Hit-and-run algorithm?
Our proposed strategy is a Gibbs sampler. Starting from
some u ∈ Rx, we will iteratively refresh some components
un of u given others.
Gibbs sampler: initialization
We can obtain some u in Rx as follows.
Choose an arbitrary θ ∈ ∆.
For all n ∈ [N] sample un uniformly in ∆k(θ) where xn = k.
[Figure: simplex partitioned into ∆1(θ), ∆2(θ), ∆3(θ) around θ, with each un initialized in the subsimplex of its category.]
To sample components un given
others, we will express Rx,
{u : ∃θ ∀n xn = m(un, θ)}
in terms of relations that the
components un must satisfy with
respect to one another.
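A sketch of this initialization (assuming, consistently with the figure, that ∆k(θ) is itself a simplex with vertices θ and the unit vectors eℓ for ℓ ≠ k; Dirichlet(1, . . . , 1) weights on the vertices of a simplex yield a uniform point in it):

import numpy as np

rng = np.random.default_rng(3)

def runif_subsimplex(rng, theta, k):
    # Uniform draw on Delta_k(theta), taken as the simplex with vertices
    # theta and e_l for l != k; an affine image of Dirichlet(1,...,1)
    # weights is uniform on the spanned simplex.
    K = len(theta)
    vertices = [theta] + [np.eye(K)[l] for l in range(K) if l != k]
    w = rng.dirichlet(np.ones(K))
    return sum(wi * v for wi, v in zip(w, vertices))

# Pick any theta in the simplex, then draw u_n uniformly in Delta_{x_n}(theta):
# the resulting u belongs to R_x by construction.
theta0 = np.ones(3) / 3
x = np.array([0, 0, 1, 1, 1, 2])   # observations with counts (2, 3, 1)
u = np.array([runif_subsimplex(rng, theta0, k) for k in x])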
Equivalent representation
For any θ ∈ ∆,
∀n ∈ [N] xn = m(un, θ)
⇔ ∀k ∈ [K] ∀n ∈ Ik, un ∈ ∆k(θ)
⇔ ∀k ∈ [K] ∀n ∈ Ik ∀ℓ ∈ [K], un,ℓ/un,k ≥ θℓ/θk,
because ∆k(θ) = {z ∈ ∆ : ∀ℓ ∈ [K] zℓ/zk ≥ θℓ/θk}.
This is equivalent to
∀k ∈ [K] ∀ℓ ∈ [K], min_{n∈Ik} un,ℓ/un,k ≥ θℓ/θk.
Therefore, denoting ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, we can write
Rx = {u ∈ ∆^N : ∃θ ∈ ∆, ∀k, ℓ ∈ [K], θℓ/θk ≤ ηk→ℓ}.
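For a given u ∈ Rx, the feasible set F(u) is therefore a polytope with linear constraints θℓ ≤ ηk→ℓ θk. For an assertion such as Σ = {θ : θ1 ≥ c}, the indicators 1(F(u) ⊂ Σ) and 1(F(u) ∩ Σ ≠ ∅) in the lower and upper probabilities reduce to minimizing and maximizing θ1 over F(u), two small linear programs. A sketch (scipy is an assumed dependency here; the half-space Σ is an arbitrary example):

import numpy as np
from scipy.optimize import linprog

def theta1_range(eta):
    # Min and max of theta_1 over F(u) = {theta in the simplex :
    # theta_l <= eta[k, l] * theta_k for all k, l}.
    K = eta.shape[0]
    A_ub, b_ub = [], []
    for k in range(K):
        for l in range(K):
            if k != l:
                row = np.zeros(K)
                row[l], row[k] = 1.0, -eta[k, l]   # theta_l - eta_{k->l} theta_k <= 0
                A_ub.append(row)
                b_ub.append(0.0)
    A_eq, b_eq = [np.ones(K)], [1.0]               # sum_k theta_k = 1
    bounds = [(0, 1)] * K
    c = np.eye(K)[0]
    lo = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    hi = -linprog(-c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    return lo, hi

# For Sigma = {theta : theta_1 >= c}: F(u) is inside Sigma iff lo >= c (lower
# probability), and F(u) meets Sigma iff hi >= c (upper probability); averaging
# these indicators over draws u ~ nu_x estimates P(Sigma) and P-bar(Sigma).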
Linear constraints
Counts: (9, 8, 3), u in Rx.
Values ηk→ℓ = min_{n∈Ik} un,ℓ/un,k define linear constraints on θ.
[Figure: draws in the simplex and two of the constraint lines, θ3/θ1 = η1→3 and θ2/θ1 = η1→2.]
Some inequalities
Next, assume u ∈ Rx, write ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, and
consider some implications.
There exists θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K].
Then, for all k, ℓ,
θℓ/θk ≤ ηk→ℓ and θk/θℓ ≤ ηℓ→k, thus ηk→ℓ ηℓ→k ≥ 1.
More inequalities
We can continue, if K ≥ 3: for all k, ℓ, j,
ηℓ→k⁻¹ ≤ θℓ/θk = (θℓ/θj)(θj/θk) ≤ ηj→ℓ ηk→j,
thus ηk→j ηj→ℓ ηℓ→k ≥ 1.
And if K ≥ 4, for all k, ℓ, j, m,
ηk→j ηj→ℓ ηℓ→m ηm→k ≥ 1.
Generally,
∀L ∈ [K] ∀j1, . . . , jL ∈ [K], ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
Main result
So far, if ∃θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for k, ℓ ∈ [K] then
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
The reverse implication holds too.
This would mean
Rx = {u : ∃θ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}
= {u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1},
i.e. Rx is represented by relations between components (un).
This helps computing conditional distributions under νx,
leading to a Gibbs sampler.
Some remarks on these inequalities
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 . . . ηjL→j1 ≥ 1.
We can consider only unique indices in j1, . . . , jL,
since the other cases can be deduced from those.
Example: η1→2η2→4η4→3η3→2η2→1 ≥ 1,
follows from η1→2η2→1 ≥ 1 and η2→4η4→3η3→2 ≥ 1.
The indices j1 → j2 → · · · → jL → j1 form a cycle.
Graphs
Fully connected graph with weight log ηk→ℓ on edge (k, ℓ).
[Figure: complete graph on vertices 1, 2, 3, with edge weights log(η1→2), log(η2→1), and so on.]
Value of a path = sum of the weights along the path.
Negative cycle = path from a vertex to itself with negative value.
Graphs
∀L ∀j1, . . . , jL ηj1→j2 . . . ηjL→j1 ≥ 1
⇔ ∀L ∀j1, . . . , jL log(ηj1→j2 ) + . . . + log(ηjL→j1 ) ≥ 0
⇔ there are no negative cycles in the graph.
Summary (wake up)
We want to sample uniformly on the set Rx,
Rx = {u : ∃θ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}.
We have claimed that this set can also be written
{u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 . . . ηjL→j1 ≥ 1}.
The inequalities hold if and only if,
there are no negative cycles
in a fully connected graph with K vertices
and weight log ηk→ℓ on edge (k, ℓ), for all k, ℓ ∈ [K].
Proof
Proof of claim: “inequalities” ⇒ “∃θ : θℓ/θk ≤ ηk→ℓ ∀k, ℓ”.
min(k → ℓ) := minimum value of a path from k to ℓ in the graph.
Finite ∀k, ℓ because of the absence of negative cycles in the graph.
Define θ via θk ∝ exp(min(K → k)).
Then θ ∈ ∆. Furthermore, for all k, ℓ,
min(K → ℓ) ≤ min(K → k) + log(ηk→ℓ),
therefore θℓ/θk ≤ ηk→ℓ.
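The proof is constructive and easy to turn into code: compute all-pairs shortest paths on the complete graph with weights log ηk→ℓ, check for negative cycles, and normalize exp(min(K → k)). A sketch using Floyd–Warshall (the slides mention Bellman–Ford via igraph; either works here):

import numpy as np

def shortest_paths(log_eta):
    # Floyd-Warshall all-pairs shortest paths on the complete graph with
    # weight log_eta[k, l] on edge (k, l). Also detects negative cycles:
    # after the sweep, some diagonal entry is negative iff one exists.
    dist = log_eta.copy()
    np.fill_diagonal(dist, 0.0)
    K = dist.shape[0]
    for j in range(K):
        dist = np.minimum(dist, dist[:, j:j+1] + dist[j:j+1, :])
    feasible = bool(np.all(np.diag(dist) >= 0.0))
    return dist, feasible

def feasible_theta(log_eta):
    # theta_k proportional to exp(min(K -> k)), as in the proof above.
    dist, feasible = shortest_paths(log_eta)
    assert feasible, "negative cycle: u is not in R_x"
    w = np.exp(dist[-1, :])   # shortest-path values from the last vertex K
    return w / w.sum()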
Conditional distributions
We can obtain conditional distributions of un for n ∈ Ik given
(un)n∉Ik with respect to νx:
un given (un)n∉Ik are i.i.d. uniform in ∆k(θ′),
where θ′ℓ ∝ exp(−min(ℓ → k)) for all ℓ,
with min(ℓ → k) := minimum value of a path from ℓ to k.
Shortest paths can be computed in polynomial time.
Conditional distributions
Counts: (9, 8, 3). What is the conditional distribution of
(un)n∈Ik given (un)n∉Ik under νx?
Gibbs sampler
Initial u(0) ∈ Rx.
At each iteration t ≥ 1, for each category k ∈ [K],
1. compute θ′ such that, for n ∈ Ik, un given the other components is uniform on ∆k(θ′);
2. draw un^(t) uniformly on ∆k(θ′) for n ∈ Ik;
3. update ηk→ℓ^(t) for ℓ ∈ [K].
In step 1, θ′ is obtained by computing shortest paths in the graph
with weights log ηk→ℓ^(t) on edges (k, ℓ).
Computed e.g. with the Bellman–Ford algorithm, implemented in
Csárdi & Nepusz, igraph package, 2006.
Alternatively, we can compute θ′ by solving a linear program,
Berkelaar, Eikland & Notebaert, lpsolve package, 2004.
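Putting the pieces together, here is a compact sketch of one full sweep (reusing runif_subsimplex and shortest_paths from the earlier sketches; an illustration of the scheme, not the dempsterpolytope implementation):

import numpy as np

def log_eta_matrix(u, x, K):
    # log eta_{k -> l} = log min_{n in I_k} u_{n,l} / u_{n,k};
    # assumes every category is observed at least once.
    log_eta = np.full((K, K), np.inf)
    for k in range(K):
        uk = u[x == k]                  # rows u_n with n in I_k
        ratios = uk / uk[:, k:k+1]      # u_{n,l} / u_{n,k}
        log_eta[k, :] = np.log(ratios.min(axis=0))
    return log_eta

def gibbs_sweep(rng, u, x, K):
    # For each category k, refresh (u_n) for n in I_k given the others:
    # i.i.d. uniform on Delta_k(theta') with theta'_l prop. to exp(-min(l -> k)).
    for k in range(K):
        log_eta = log_eta_matrix(u, x, K)
        log_eta[k, :] = np.inf          # drop constraints coming from category k
        dist, _ = shortest_paths(log_eta)
        w = np.exp(-dist[:, k])
        theta_prime = w / w.sum()
        for n in np.flatnonzero(x == k):
            u[n] = runif_subsimplex(rng, theta_prime, k)
    return u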
Gibbs sampler
Counts: (9, 8, 3), 100 polytopes generated by the sampler.
Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time in seconds versus K, for N ∈ {256, 512, 1024, 2048}.]
https://github.com/pierrejacob/dempsterpolytope
Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time in seconds versus N, for K ∈ {4, 8, 12, 16}.]
https://github.com/pierrejacob/dempsterpolytope
How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = supA |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds versus iteration, for K ∈ {5, 10, 20}.]
How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = supA |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds versus iteration, for N ∈ {50, 100, 150, 200}.]
Summary
A Gibbs sampler can be used to approximate lower and upper
probabilities in the Dempster–Shafer framework.
Is perfect sampling possible here?
Extensions for hierarchical counts, hidden Markov models?
Jacob, Gong, Edlefsen & Dempster, A Gibbs sampler for a class of
random convex polytopes. On arXiv and researchers.one.
https://github.com/pierrejacob/dempsterpolytope
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Coupled chains
Glynn & Rhee, Exact estimation for MC equilibrium expectations, 2014.
Generate two chains (Xt) and (Yt), both converging to π, as follows:
sample X0 and Y0 from π0 (independently, or not),
sample Xt|Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , L,
for t ≥ L + 1, sample
(Xt, Yt−L)|(Xt−1, Yt−L−1) ∼ P̄((Xt−1, Yt−L−1), ·).
P̄ must be such that
Xt+1|Xt ∼ P(Xt, ·) and Yt|Yt−1 ∼ P(Yt−1, ·)
(thus Xt and Yt have the same distribution for all t ≥ 0),
there exists a random time τ such that Xt = Yt−L for t ≥ τ
(the chains meet and remain “faithful”).
Coupled chains
[Figure: traces of two coupled chains meeting. π = N(0, 1), RWMH with Normal proposal std 0.5, π0 = N(10, 3²).]
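A sketch of one standard choice of P̄ for random-walk MH, matching the figure's setting: maximally couple the two Normal proposals, then use a common uniform for the accept/reject step (an illustration; tuning and bookkeeping are kept minimal):

import numpy as np

rng = np.random.default_rng(4)
log_pi = lambda z: -0.5 * z**2          # target pi = N(0, 1), up to a constant
sigma = 0.5                             # random-walk proposal std

def max_coupling_normals(rng, mu1, mu2):
    # Maximal coupling of N(mu1, sigma^2) and N(mu2, sigma^2):
    # correct marginals, and {X = Y} has maximal probability.
    logq = lambda z, mu: -0.5 * ((z - mu) / sigma) ** 2
    x = rng.normal(mu1, sigma)
    if np.log(rng.uniform()) <= logq(x, mu2) - logq(x, mu1):
        return x, x
    while True:
        y = rng.normal(mu2, sigma)
        if np.log(rng.uniform()) > logq(y, mu1) - logq(y, mu2):
            return x, y

def coupled_mh_step(rng, x, y):
    # One step of P-bar: coupled proposals, common accept uniform.
    xp, yp = max_coupling_normals(rng, x, y)
    logu = np.log(rng.uniform())
    if logu < log_pi(xp) - log_pi(x): x = xp
    if logu < log_pi(yp) - log_pi(y): y = yp
    return x, y

# Lag-L coupling: advance X alone for L steps, then run the pair until it meets.
L = 1
x, y = rng.normal(10, 3), rng.normal(10, 3)   # X_0, Y_0 ~ pi_0 = N(10, 3^2)
for _ in range(L):
    xp = rng.normal(x, sigma)
    if np.log(rng.uniform()) < log_pi(xp) - log_pi(x): x = xp
t = L
while x != y:
    x, y = coupled_mh_step(rng, x, y)
    t += 1
tau = t   # meeting time; X_t = Y_{t-L} for all t >= tau (faithfulness)
print(tau)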
Unbiased estimators
Under some conditions, the estimator
(m − k + 1)⁻¹ Σ_{t=k}^{m} h(Xt)
+ (m − k + 1)⁻¹ Σ_{t=k+L}^{τ−1} min(m − k + 1, ⌈(t − k)/L⌉) (h(Xt) − h(Yt−L)),
has expectation ∫ h(x)π(dx), finite cost and finite variance.
“MCMC estimator + bias correction terms”
Its efficiency can be close to that of MCMC estimators,
if k, m chosen appropriately (and L also).
Jacob, O’Leary & Atchadé, Unbiased MCMC with couplings, 2019.
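Given stored coupled trajectories and the meeting time, the estimator above is a few lines (a sketch; X[t] stores Xt for t up to max(m, τ − 1), and Y[t] stores Yt up to τ − 1 − L):

from math import ceil

def unbiased_estimator(h, X, Y, k, m, L, tau):
    # (m-k+1)^{-1} [ sum_{t=k}^{m} h(X_t)
    #   + sum_{t=k+L}^{tau-1} min(m-k+1, ceil((t-k)/L)) (h(X_t) - h(Y_{t-L})) ]
    mcmc = sum(h(X[t]) for t in range(k, m + 1))
    bias = sum(min(m - k + 1, ceil((t - k) / L)) * (h(X[t]) - h(Y[t - L]))
               for t in range(k + L, tau))
    return (mcmc + bias) / (m - k + 1)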
Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: histogram of τ − lag (lag = 1) and the resulting TV upper bounds versus iteration.]
Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: histogram of τ − lag (lag = 50) and the resulting TV upper bounds versus iteration.]
Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: histogram of τ − lag (lag = 100) and the resulting TV upper bounds versus iteration.]
Finite-time bias of MCMC
Upper bounds can also be obtained for e.g. 1-Wasserstein.
And perhaps lower bounds?
Applicable in e.g. high-dimensional and/or discrete spaces.
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains
with L-Lag Couplings, 2019.
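Estimating the bound in practice amounts to simulating many independent lag-L meeting times and averaging (a sketch, using the reconstructed formula above):

import numpy as np
from math import ceil

def tv_upper_bounds(taus, L, t_max):
    # Estimate t -> E[max(0, ceil((tau - L - t) / L))] from i.i.d. meeting times.
    return [float(np.mean([max(0, ceil((tau - L - t) / L)) for tau in taus]))
            for t in range(t_max + 1)]

# e.g. bounds = tv_upper_bounds(taus, L=50, t_max=200), with meeting times
# collected from repeated independent runs of the lag-L coupling.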
Finite-time bias of MCMC
Example: Gibbs sampler for Dempster’s analysis of counts.
[Figure: TV upper bounds versus iteration, for N ∈ {50, 100, 150, 200}.]
This quantifies bias of MCMC estimators, not variance.
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Models made of modules
First module:
parameter θ1, data Y1
prior: p1(θ1)
likelihood: p1(Y1|θ1)
Second module:
parameter θ2, data Y2
prior: p2(θ2|θ1)
likelihood: p2 (Y2|θ1, θ2)
We are interested in the estimation of θ1, θ2 or both.
Joint model approach
Parameter (θ1, θ2), with prior
p(θ1, θ2) = p1(θ1)p2(θ2|θ1).
Data (Y1, Y2), likelihood
p(Y1, Y2|θ1, θ2) = p1(Y1|θ1)p2(Y2|θ1, θ2).
Posterior distribution
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
Joint model approach
In the joint model approach, all data are used to
simultaneously infer all parameters. . .
. . . so that uncertainty about θ1 is propagated to the
estimation of θ2. . .
. . . but misspecification of the 2nd module can damage the
estimation of θ1.
What about allowing uncertainty propagation, but
preventing feedback of some modules on others?
Cut distribution
One might want to propagate uncertainty without allowing
“feedback” of second module on first module.
Cut distribution:
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2).
Different from the posterior distribution under joint model,
under which the first marginal is π(θ1|Y1, Y2).
Example: epidemiological study
Model of virus prevalence
∀i = 1, . . . , I Zi ∼ Binomial(Ni, ϕi),
Zi is number of women infected with high-risk HPV in a
sample of size Ni in country i.
Beta(1,1) prior on each ϕi, independently.
Impact of prevalence onto cervical cancer occurrence
∀i = 1, . . . , I Yi ∼ Poisson(λiTi), log(λi) = θ2,1 + θ2,2ϕi,
Yi is number of cancer cases arising from Ti woman-years of
follow-up in country i.
N(0, 10³) on θ2,1, θ2,2, independently.
Plummer, Cuts in Bayesian graphical models, 2014.
Monte Carlo with joint model approach
Joint model posterior has density
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
The computational complexity typically grows
super-linearly with the number of modules.
Difficulties stack up. . .
intractability, multimodality, ridges, etc.
Monte Carlo with cut distribution
The cut distribution is defined as
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2) ∝ π(θ1, θ2|Y1, Y2) / p2(Y2|θ1).
The denominator is the feedback of the 2nd module on θ1:
p2(Y2|θ1) = ∫ p2(Y2|θ1, θ2) p2(dθ2|θ1).
The feedback term is typically intractable.
Monte Carlo with cut distribution
WinBUGS’ approach via the cut function: alternate between
sampling θ1 from K1(θ1 → dθ1), targeting p1(dθ1|Y1);
sampling θ2 from K2,θ1(θ2 → dθ2), targeting p2(dθ2|θ1, Y2).
This does not leave the cut distribution invariant!
Iterating the kernel K2,θ1 enough times mitigates the issue.
Plummer, Cuts in Bayesian graphical models, 2014.
Monte Carlo with cut distribution
In a perfect world, we could sample i.i.d.
θ1^i from p1(θ1|Y1),
θ2^i given θ1^i from p2(θ2|θ1^i, Y2),
then (θ1^i, θ2^i) would be i.i.d. from the cut distribution.
Monte Carlo with cut distribution
In an MCMC world, we can sample
θ1^i approximately from p1(θ1|Y1) using MCMC,
θ2^i given θ1^i approximately from p2(θ2|θ1^i, Y2) using MCMC,
then resulting samples approximate the cut distribution,
in the limit of the numbers of iterations, at both stages.
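A sketch of this two-stage scheme on Plummer's model above, with synthetic stand-in data (the real study's numbers are not reproduced here). In this particular model p1(θ1|Y1) is available exactly (independent Beta posteriors), so only the second stage needs MCMC:

import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-ins for the HPV / cancer data (illustrative, not the study's data).
I = 13
Ni = rng.integers(500, 2000, size=I); Zi = rng.binomial(Ni, 0.15)
Ti = rng.uniform(1e3, 1e5, size=I);   Yi = rng.poisson(0.01 * Ti)

def log_p2(theta2, phi):
    # Poisson regression log-likelihood plus N(0, 10^3) priors on theta_2.
    lam = np.exp(theta2[0] + theta2[1] * phi)
    return np.sum(Yi * np.log(lam * Ti) - lam * Ti) - theta2 @ theta2 / (2 * 1e3)

def stage2_mh(rng, phi, n_iter=2000, step=0.05):
    theta2 = np.zeros(2)
    for _ in range(n_iter):
        prop = theta2 + step * rng.normal(size=2)
        if np.log(rng.uniform()) < log_p2(prop, phi) - log_p2(theta2, phi):
            theta2 = prop
    return theta2

# Cut sampler: phi drawn exactly from p1(phi|Y1) = prod_i Beta(1+Z_i, 1+N_i-Z_i),
# then theta_2 given phi approximately from p2(theta_2|phi, Y2) by MH.
samples = []
for _ in range(100):
    phi = rng.beta(1 + Zi, 1 + Ni - Zi)
    samples.append(stage2_mh(rng, phi))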
Monte Carlo with cut distribution
In an unbiased MCMC world, we can approximate expectations
∫ h(x)π(dx) without bias, in finite compute time.
We can obtain an unbiased approximation of p1(θ1|Y1), and for
each θ1, an unbiased approximation of p2(θ2|θ1, Y2).
Thus, by the tower property, we can unbiasedly estimate
∫ h(θ1, θ2) p2(dθ2|θ1, Y2) p1(dθ1|Y1).
Jacob, O’Leary & Atchadé, Unbiased MCMC with couplings, 2019.
Example: epidemiological study
[Figure: marginal densities of θ2,1 and θ2,2.]
Approximation of the marginals of the cut distribution of
(θ2,1, θ2,2), the parameters of the Poisson regression module in
the epidemiological model of Plummer (2014).
Jacob, Holmes, Murray, Robert & Nicholson, Better together?
Statistical learning in models made of modules.
Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
Bagging posterior distributions
We can stabilize the posterior distribution by using a bootstrap and aggregation scheme, in the spirit of bagging (Breiman, 1996b). In a nutshell, denote by D∗ a bootstrap or subsample of the data D. The posterior of the random parameters θ given the data D has c.d.f. F(·|D), and we can stabilize this using
FBayesBag(·|D) = E∗[F(·|D∗)],
where E∗ is with respect to the bootstrap- or subsampling scheme. We call it the BayesBag estimator. It can be approximated by averaging over B posterior computations for bootstrap- or subsamples, which might be a rather demanding task (although say B = 10 would already stabilize to a certain extent).
Bühlmann, Discussion of Big Bayes Stories and BayesBag, 2014.
Bagging posterior distributions
For b = 1, . . . , B
Sample data set D(b) by bootstrapping from D.
Obtain MCMC approximation π̂(b) of the posterior given D(b).
Finally obtain B⁻¹ Σ_{b=1}^{B} π̂(b).
Converges to “BayesBag” distribution as both B and number of
MCMC samples go to infinity.
If we can obtain unbiased approximation of posterior given any
D, the resulting approximation of “BayesBag” would be
consistent as B → ∞ only.
Exactly the same reasoning as for the cut distribution.
Example at https://statisfaction.wordpress.com/2019/10/02/bayesbag-and-how-to-approximate-it/
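A minimal sketch of the bagging loop on a conjugate Normal-mean model, so that each per-bootstrap posterior is exact rather than an MCMC approximation (the model and numbers are illustrative, in the spirit of the blog post rather than taken from it):

import numpy as np

rng = np.random.default_rng(6)
D = rng.normal(1.0, 2.0, size=200)          # observed data; illustrative only

def posterior_draws(data, n_draws, rng):
    # Exact posterior for a Normal mean: prior mu ~ N(0, 1),
    # likelihood N(mu, s2) with known s2 = 4; conjugate Normal posterior.
    n, s2 = len(data), 4.0
    var = 1.0 / (1.0 + n / s2)
    mean = var * data.sum() / s2
    return rng.normal(mean, np.sqrt(var), size=n_draws)

B = 50
bagged = np.concatenate([
    posterior_draws(rng.choice(D, size=len(D), replace=True), 500, rng)
    for _ in range(B)
])  # pooled draws approximate the BayesBag distribution B^{-1} sum_b pi_hat^(b)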
Discussion
Some existing alternatives to standard Bayesian inference
are well motivated, but raise computational questions.
There are on-going efforts toward scalable Monte Carlo
methods, e.g. using coupled Markov chains or regeneration
techniques, in addition to sustained search for new MCMC
algorithms.
Quantification of variance is commonly done, quantification
of bias is also possible.
What makes a computational method convenient? It does not
seem to be entirely about asymptotic efficiency when the
method is optimally tuned.
Thank you for listening!
Funding provided by the National Science Foundation,
grants DMS-1712872 and DMS-1844695.
References
Practical couplings in the literature. . .
Propp & Wilson, Exact sampling with coupled Markov chains
and applications to statistical mechanics, Random Structures &
Algorithms, 1996.
Johnson, Studying convergence of Markov chain Monte Carlo
algorithms using coupled sample paths, JASA, 1996.
Neal, Circularly-coupled Markov chain sampling, UoT tech
report, 1999.
Glynn & Rhee, Exact estimation for Markov chain equilibrium
expectations, Journal of Applied Probability, 2014.
Agapiou, Roberts & Vollmer, Unbiased Monte Carlo: posterior
estimation for intractable/infinite-dimensional models, Bernoulli,
2018.
References
Finite-time bias of MCMC. . .
Brooks & Roberts, Assessing convergence of Markov chain
Monte Carlo algorithms, STCO, 1998.
Cowles & Rosenthal, A simulation approach to convergence rates
for Markov chain Monte Carlo algorithms, STCO, 1998.
Johnson, Studying convergence of Markov chain Monte Carlo
algorithms using coupled sample paths, JASA, 1996.
Gorham, Duncan, Vollmer & Mackey, Measuring Sample Quality
with Diffusions, AAP, 2019.
References
Own work. . .
with John O’Leary, Yves F. Atchadé
Unbiased Markov chain Monte Carlo with couplings, 2019.
with Fredrik Lindsten, Thomas Schön
Smoothing with Couplings of Conditional Particle Filters, 2019.
with Jeremy Heng
Unbiased Hamiltonian Monte Carlo with couplings, 2019.
with Lawrence Middleton, George Deligiannidis, Arnaud
Doucet
Unbiased Markov chain Monte Carlo for intractable target
distributions, 2019.
Unbiased Smoothing using Particle Independent
Metropolis-Hastings, 2019.
References
with Maxime Rischard, Natesh Pillai
Unbiased estimation of log normalizing constants with
applications to Bayesian cross-validation.
with Niloy Biswas, Paul Vanetti
Estimating Convergence of Markov chains with L-Lag Couplings,
2019.
with Chris Holmes, Lawrence Murray, Christian Robert,
George Nicholson
Better together? Statistical learning in models made of modules.

Weitere ähnliche Inhalte

Was ist angesagt?

Generating Chebychev Chaotic Sequence
Generating Chebychev Chaotic SequenceGenerating Chebychev Chaotic Sequence
Generating Chebychev Chaotic SequenceCheng-An Yang
 
Non-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixturesNon-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixturesChristian Robert
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
 
Injective hulls of simple modules over Noetherian rings
Injective hulls of simple modules over Noetherian ringsInjective hulls of simple modules over Noetherian rings
Injective hulls of simple modules over Noetherian ringsMatematica Portuguesa
 
Non-sampling functional approximation of linear and non-linear Bayesian Update
Non-sampling functional approximation of linear and non-linear Bayesian UpdateNon-sampling functional approximation of linear and non-linear Bayesian Update
Non-sampling functional approximation of linear and non-linear Bayesian UpdateAlexander Litvinenko
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMCPierre Jacob
 
How to find a cheap surrogate to approximate Bayesian Update Formula and to a...
How to find a cheap surrogate to approximate Bayesian Update Formula and to a...How to find a cheap surrogate to approximate Bayesian Update Formula and to a...
How to find a cheap surrogate to approximate Bayesian Update Formula and to a...Alexander Litvinenko
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
Machine learning (12)
Machine learning (12)Machine learning (12)
Machine learning (12)NYversity
 
Rank awarealgs small11
Rank awarealgs small11Rank awarealgs small11
Rank awarealgs small11Jules Esp
 

Was ist angesagt? (19)

The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
 
Mathematical Statistics Homework Help
Mathematical Statistics Homework HelpMathematical Statistics Homework Help
Mathematical Statistics Homework Help
 
talk MCMC & SMC 2004
talk MCMC & SMC 2004talk MCMC & SMC 2004
talk MCMC & SMC 2004
 
Generating Chebychev Chaotic Sequence
Generating Chebychev Chaotic SequenceGenerating Chebychev Chaotic Sequence
Generating Chebychev Chaotic Sequence
 
Non-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixturesNon-informative reparametrisation for location-scale mixtures
Non-informative reparametrisation for location-scale mixtures
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear models
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Imc2016 day2-solutions
Imc2016 day2-solutionsImc2016 day2-solutions
Imc2016 day2-solutions
 
Chemistry Assignment Help
Chemistry Assignment Help Chemistry Assignment Help
Chemistry Assignment Help
 
Injective hulls of simple modules over Noetherian rings
Injective hulls of simple modules over Noetherian ringsInjective hulls of simple modules over Noetherian rings
Injective hulls of simple modules over Noetherian rings
 
compressed-sensing
compressed-sensingcompressed-sensing
compressed-sensing
 
Non-sampling functional approximation of linear and non-linear Bayesian Update
Non-sampling functional approximation of linear and non-linear Bayesian UpdateNon-sampling functional approximation of linear and non-linear Bayesian Update
Non-sampling functional approximation of linear and non-linear Bayesian Update
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
How to find a cheap surrogate to approximate Bayesian Update Formula and to a...
How to find a cheap surrogate to approximate Bayesian Update Formula and to a...How to find a cheap surrogate to approximate Bayesian Update Formula and to a...
How to find a cheap surrogate to approximate Bayesian Update Formula and to a...
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
Machine learning (12)
Machine learning (12)Machine learning (12)
Machine learning (12)
 
Rank awarealgs small11
Rank awarealgs small11Rank awarealgs small11
Rank awarealgs small11
 
Lecture6.handout
Lecture6.handoutLecture6.handout
Lecture6.handout
 

Ähnlich wie Monte Carlo Methods for Not Quite Bayesian Inference

Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsPierre Jacob
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplingsPierre Jacob
 
Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Pierre Jacob
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themPierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixturesChristian Robert
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsUmberto Picchini
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBOYoonho Lee
 
P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2S.Shayan Daneshvar
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesPierre Jacob
 
SMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last versionSMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last versionLilyana Vankova
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionFlavio Morelli
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaAlexander Litvinenko
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methodsChristian Robert
 
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...mathsjournal
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Fabian Pedregosa
 

Ähnlich wie Monte Carlo Methods for Not Quite Bayesian Inference (20)

Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problems
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 
Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing them
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space models
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
 
P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2P, NP and NP-Complete, Theory of NP-Completeness V2
P, NP and NP-Complete, Theory of NP-Completeness V2
 
Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
SMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last versionSMB_2012_HR_VAN_ST-last version
SMB_2012_HR_VAN_ST-last version
 
Variational Bayes: A Gentle Introduction
Variational Bayes: A Gentle IntroductionVariational Bayes: A Gentle Introduction
Variational Bayes: A Gentle Introduction
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
A Probabilistic Algorithm for Computation of Polynomial Greatest Common with ...
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
 

Mehr von Pierre Jacob

ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecturePierre Jacob
 
Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsPierre Jacob
 
On non-negative unbiased estimators
On non-negative unbiased estimatorsOn non-negative unbiased estimators
On non-negative unbiased estimatorsPierre Jacob
 
Path storage in the particle filter
Path storage in the particle filterPath storage in the particle filter
Path storage in the particle filterPierre Jacob
 
Density exploration methods
Density exploration methodsDensity exploration methods
Density exploration methodsPierre Jacob
 
SMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsSMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsPierre Jacob
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPierre Jacob
 
Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Pierre Jacob
 
Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Pierre Jacob
 

Mehr von Pierre Jacob (12)

ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
 
Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov models
 
On non-negative unbiased estimators
On non-negative unbiased estimatorsOn non-negative unbiased estimators
On non-negative unbiased estimators
 
Path storage in the particle filter
Path storage in the particle filterPath storage in the particle filter
Path storage in the particle filter
 
Density exploration methods
Density exploration methodsDensity exploration methods
Density exploration methods
 
SMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsSMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space models
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ Warwick
 
Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7
 
Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Presentation MCB seminar 09032011
Presentation MCB seminar 09032011
 

Kürzlich hochgeladen

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Kürzlich hochgeladen (20)

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 

Monte Carlo Methods for Not Quite Bayesian Inference

  • 1. Monte Carlo methods for some not-quite-but-almost Bayesian problems Pierre E. Jacob Department of Statistics, Harvard University joint work with Ruobin Gong, Paul T. Edlefsen, Arthur P. Dempster John O’Leary, Yves F. Atchad´e, Niloy Biswas, Paul Vanetti and others Pierre E. Jacob Monte Carlo for not quite Bayes
  • 2. Introduction A lot of questions in statistics give rise to non-trivial computational problems. Among these, some are numerical integration problems, ⇔ about sampling from probability distributions. Besag, Markov chain Monte Carlo for statistical inference, 2001. Computational challenges arise in deviations from standard Bayesian inference, motivated by three questions, quantifying ignorance, model misspecification, robustness to some perturbation of the data. Pierre E. Jacob Monte Carlo for not quite Bayes
  • 3. Outline 1 Dempster–Shafer analysis of count data 2 Unbiased MCMC and diagnostics of convergence 3 Modular Bayesian inference 4 Bagging posterior distributions Pierre E. Jacob Monte Carlo for not quite Bayes
  • 4. Outline 1 Dempster–Shafer analysis of count data 2 Unbiased MCMC and diagnostics of convergence 3 Modular Bayesian inference 4 Bagging posterior distributions Pierre E. Jacob Monte Carlo for not quite Bayes
  • 5. Inference with count data Notation : [N] := {1, . . . , N}. Simplex ∆. Observations : xn ∈ [K] := {1, . . . , K}, x = (x1, . . . , xN ). Index sets : Ik = {n ∈ [N] : xn = k}. Counts : Nk = |Ik|. Model: xn iid ∼ Categorical(θ) with θ = (θk)k∈[K], i.e. P(xn = k) = θk for all n, k. Goal: estimate θ, predict, etc. Maximum likelihood estimator: ˆθk = Nk/N. Bayesian inference combines likelihood with prior on θ into a posterior distribution, assigning a probability ∈ [0, 1] to any measurable subset of the simplex ∆. Pierre E. Jacob Monte Carlo for not quite Bayes
  • 6. Sampling from a Categorical distribution 2 3 1 ∆1(θ) ∆2(θ)∆3(θ) θ Subsimplex ∆k(θ), for θ ∈ ∆: {z ∈ ∆ : ∀ ∈ [K] z /zk ≥ θ /θk}. Sampling mechanism, for θ ∈ ∆: - draw un uniform on ∆, - define xn such that un ∈ ∆xn (θ), denoted also xn = m(un, θ). Then P(xn = k) = θk, because Vol(∆k(θ)) = θk. Pierre E. Jacob Monte Carlo for not quite Bayes
  • 7. Arthur Dempster’s approach to inference Observations x = (xn)n∈[N] are fixed. If we draw u1, . . . , un ∼ ∆, there might exist θ ∈ ∆ such that ∀n ∈ [N] xn = m(un, θ), or such a θ might not exist. Arthur P. Dempster. New methods for reasoning towards posterior distributions based on sample data. Annals of Mathematical Statistics, 1966. Arthur P. Dempster. Statistical inference from a Dempster—Shafer perspective. Past, Present, and Future of Statistical Science, 2014. Pierre E. Jacob Monte Carlo for not quite Bayes
  • 8. Draws in the simplex Counts: (2, 3, 1). Let’s draw N = 6 uniform samples on ∆. 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 9. Draws in the simplex Each un is associated to an observed xn ∈ {11, 22, 33}. 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 10. Draws in the simplex If there exists a feasible θ, it cannot be just anywhere. 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 11. Draws in the simplex The uns of each category add constraints on θ. 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 12. Draws in the simplex Overall the constraints define a polytope for θ, or an empty set. 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 13. Draws in the simplex Here, there is a polytope of θ such that ∀n ∈ [N] xn = m(un, θ). 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 14. Draws in the simplex Any θ in the polytope separates the uns appropriately. 2 3 1 qqq q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 15. Draws in the simplex Let’s try again with fresh uniform samples on ∆. 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 16. Draws in the simplex Here there is no θ ∈ ∆ such that ∀n ∈ [N] xn = m(un, θ). 2 3 1 q q q q q q Pierre E. Jacob Monte Carlo for not quite Bayes
  • 17. Lower and upper probabilities Consider the set Rx = (u1, . . . , uN ) ∈ ∆N : ∃θ ∈ ∆ ∀n ∈ [N] xn = m(un, θ) . and denote by νx the uniform distribution on Rx. For u ∈ Rx, there is a set F(u) = {θ ∈ ∆ : ∀n xn = m(un, θ)}. For a set Σ ⊂ ∆ of interest, define (lower probability) P(Σ) = 1(F(u) ⊂ Σ)νx(du), (upper probability) ¯P(Σ) = 1(F(u) ∩ Σ = ∅)νx(du). Pierre E. Jacob Monte Carlo for not quite Bayes
  • 18. Summary and Monte Carlo problem Arthur Dempster’s approach, later called Dempster–Shafer theory of belief functions, is based on a distribution of feasible sets, F(u) = {θ ∈ ∆ ∀n ∈ [N] xn = m(un, θ)}, where u ∼ νx, the uniform distribution on Rx. How do we obtain samples from this distribution? Rejection rate 99%, for data (2, 3, 1). Hit-and-run algorithm? Our proposed strategy is a Gibbs sampler. Starting from some u ∈ Rx, we will iteratively refresh some components un of u given others. Pierre E. Jacob Monte Carlo for not quite Bayes
  • 19. Gibbs sampler: initialization We can obtain some u in Rx as follows. Choose an arbitrary θ ∈ ∆. For all n ∈ [N] sample un uniformly in ∆k(θ) where xn = k. 2 3 1 ∆1(θ) ∆2(θ)∆3(θ) θ q q q q q q To sample components un given others, we will express Rx, {u : ∃θ ∀n xn = m(un, θ)} in terms of relations that the components un must satisfy with respect to one another. Pierre E. Jacob Monte Carlo for not quite Bayes
• 20. Equivalent representation
For any θ ∈ ∆,
∀n ∈ [N] xn = m(un, θ)
⇔ ∀k ∈ [K] ∀n ∈ Ik un ∈ ∆k(θ)
⇔ ∀k ∈ [K] ∀n ∈ Ik ∀ℓ ∈ [K] un,ℓ/un,k ≥ θℓ/θk,
because ∆k(θ) = {z ∈ ∆ : ∀ℓ ∈ [K] zℓ/zk ≥ θℓ/θk}.
This is equivalent to
∀k ∈ [K] ∀ℓ ∈ [K] min_{n∈Ik} un,ℓ/un,k ≥ θℓ/θk.
Therefore, denoting ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, we can write
Rx = {u ∈ ∆^N : ∃θ ∈ ∆ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}.
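The quantities ηk→ℓ are direct functions of u; a small sketch in R, with u an N × K matrix of points in ∆ and x the vector of observed categories:

    # eta[k, l] = min over n in I_k of u[n, l] / u[n, k]; Inf if I_k is empty
    compute_eta <- function(u, x, K) {
      eta <- matrix(Inf, K, K)
      for (k in 1:K) {
        idx <- which(x == k)
        if (length(idx) > 0) {
          # column l of the ratio matrix holds u[n, l] / u[n, k] for n in I_k
          eta[k, ] <- apply(u[idx, , drop = FALSE] / u[idx, k], 2, min)
        }
      }
      eta
    }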
• 21. Linear constraints
Counts: (9, 8, 3), u in Rx. The values ηk→ℓ = min_{n∈Ik} un,ℓ/un,k define linear constraints on θ.
[Figure: the simplex with the 20 draws and the constraint lines θ3/θ1 = η1→3 and θ2/θ1 = η1→2.]
• 22. Some inequalities
Next, assume u ∈ Rx, write ηk→ℓ = min_{n∈Ik} un,ℓ/un,k, and consider some implications.
There exists θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K].
Then, for all k, ℓ:
θℓ/θk ≤ ηk→ℓ and θk/θℓ ≤ ηℓ→k, thus ηk→ℓ ηℓ→k ≥ 1.
• 23. More inequalities
We can continue. If K ≥ 3: for all k, ℓ, j,
ηℓ→k⁻¹ ≤ θℓ/θk = (θℓ/θj)(θj/θk) ≤ ηj→ℓ ηk→j,
thus ηk→j ηj→ℓ ηℓ→k ≥ 1.
And if K ≥ 4, for all k, ℓ, j, m: ηk→j ηj→ℓ ηℓ→m ηm→k ≥ 1.
Generally,
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
• 24. Main result
So far: if ∃θ ∈ ∆ such that θℓ/θk ≤ ηk→ℓ for all k, ℓ ∈ [K], then
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
The reverse implication holds too. This means
Rx = {u : ∃θ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}
= {u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1},
i.e. Rx is represented by relations between the components (un).
This helps computing conditional distributions under νx, leading to a Gibbs sampler.
• 25. Some remarks on these inequalities
∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1.
We can consider only distinct indices j1, . . . , jL, since the other cases can be deduced from those.
Example: η1→2 η2→4 η4→3 η3→2 η2→1 ≥ 1 follows from η1→2 η2→1 ≥ 1 and η2→4 η4→3 η3→2 ≥ 1.
The indices j1 → j2 → · · · → jL → j1 form a cycle.
• 26. Graphs
Fully connected graph with weight log ηk→ℓ on edge (k, ℓ).
[Figure: a directed graph on vertices 1, 2, 3, with e.g. weights log(η1→2) and log(η2→1) on the edges between vertices 1 and 2.]
Value of a path = sum of the weights along the path.
Negative cycle = path from a vertex to itself with negative value.
• 27. Graphs
∀L ∀j1, . . . , jL ηj1→j2 · · · ηjL→j1 ≥ 1
⇔ ∀L ∀j1, . . . , jL log(ηj1→j2) + · · · + log(ηjL→j1) ≥ 0
⇔ there are no negative cycles in the graph.
[Figure: the same directed graph.]
• 28. Summary (wake up)
We want to sample uniformly on the set Rx,
Rx = {u : ∃θ ∀k, ℓ ∈ [K] θℓ/θk ≤ ηk→ℓ}.
We have claimed that this set can also be written
{u : ∀L ∈ [K] ∀j1, . . . , jL ∈ [K] ηj1→j2 ηj2→j3 · · · ηjL→j1 ≥ 1}.
The inequalities hold if and only if there are no negative cycles in a fully connected graph with K vertices and weight log ηk→ℓ on edge (k, ℓ), for all k, ℓ ∈ [K].
• 29. Proof
Proof of the claim: “inequalities” ⇒ “∃θ : θℓ/θk ≤ ηk→ℓ ∀k, ℓ”.
Let min(k → ℓ) := minimum value of a path from k to ℓ in the graph.
This is finite for all k, ℓ because of the absence of negative cycles in the graph.
Define θ via θk ∝ exp(min(K → k)). Then θ ∈ ∆.
Furthermore, for all k, ℓ: min(K → ℓ) ≤ min(K → k) + log(ηk→ℓ), therefore θℓ/θk ≤ ηk→ℓ.
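This construction is directly implementable. A minimal sketch in base R, using Floyd–Warshall for all-pairs shortest paths on the weights log ηk→ℓ (any negative-cycle-detecting algorithm such as Bellman–Ford would do); a negative diagonal entry after the recursion signals a negative cycle, i.e. u ∉ Rx:

    # All-pairs shortest path values d[k, l] = min(k -> l), weights log(eta)
    shortest_paths <- function(eta) {
      K <- nrow(eta)
      d <- log(eta); diag(d) <- 0
      for (j in 1:K) for (k in 1:K) for (l in 1:K)
        d[k, l] <- min(d[k, l], d[k, j] + d[j, l])
      d
    }
    # u is in R_x iff the graph has no negative cycle
    in_Rx <- function(eta) all(diag(shortest_paths(eta)) >= 0)
    # theta_k proportional to exp(min(K -> k)), as in the proof above
    theta_from_eta <- function(eta) {
      d <- shortest_paths(eta)
      theta <- exp(d[nrow(d), ])
      theta / sum(theta)
    }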
• 30. Conditional distributions
We can obtain conditional distributions of un for n ∈ Ik given (un)n∉Ik with respect to νx:
un given (un)n∉Ik are i.i.d. uniform in ∆k(θ′), where θ′ℓ ∝ exp(−min(ℓ → k)) for all ℓ,
with min(ℓ → k) := minimum value of a path from ℓ to k.
Shortest paths can be computed in polynomial time.
• 31. Conditional distributions
Counts: (9, 8, 3). What is the conditional distribution of (un)n∈Ik given (un)n∉Ik under νx?
[Figure: the simplex with all 20 draws.]
• 32.–34. Conditional distributions
[Figure sequence, three animation steps: the nine draws of one category are removed, leaving the 11 other points.]
• 35. Gibbs sampler
Initial u(0) ∈ Rx. At each iteration t ≥ 1, for each category k ∈ [K]:
1 compute θ′ such that, for n ∈ Ik, un given the other components is uniform on ∆k(θ′),
2 draw u(t)n uniformly on ∆k(θ′) for n ∈ Ik,
3 update η(t)k→ℓ for ℓ ∈ [K].
In step 1, θ′ is obtained by computing shortest paths in the graph with weights log η(t)k→ℓ on edges (k, ℓ), e.g. with the Bellman–Ford algorithm as implemented in Csárdi & Nepusz, igraph package, 2006.
Alternatively, we can compute θ′ by solving a linear program, Berkelaar, Eikland & Notebaert, lpsolve package, 2004.
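Putting the pieces together, a rough sketch of one full sweep in base R, reusing compute_eta, shortest_paths and runif_subsimplex from the sketches above; this only illustrates the strategy under the stated assumptions, and is not the reference implementation from the repository linked below. Since row k of η depends precisely on the components being refreshed, the corresponding edges are removed before computing θ′ℓ ∝ exp(−min(ℓ → k)):

    # One full sweep over the K categories; u is an N x K matrix, x in [K]^N
    gibbs_sweep <- function(u, x, K) {
      for (k in 1:K) {
        eta <- compute_eta(u, x, K)
        eta[k, ] <- Inf                 # drop edges out of k: they involve I_k
        d <- shortest_paths(eta)
        thetap <- exp(-d[, k])          # theta'_l proportional to exp(-min(l -> k))
        thetap <- thetap / sum(thetap)
        for (n in which(x == k)) u[n, ] <- runif_subsimplex(thetap, k)
      }
      u
    }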
• 36. Gibbs sampler
Counts: (9, 8, 3); 100 polytopes generated by the sampler.
[Figure: the simplex with 100 overlaid feasible polytopes.]
• 37. Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: elapsed time against K ∈ {4, 8, 12, 16}, one curve per N ∈ {256, 512, 1024, 2048}.]
https://github.com/pierrejacob/dempsterpolytope
• 38. Cost per iteration
Cost in seconds for 100 full sweeps.
[Figure: the same timings plotted against N, one curve per K.]
https://github.com/pierrejacob/dempsterpolytope
• 39. How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = sup_A |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration, for K ∈ {5, 10, 20}.]
• 40. How many iterations for convergence?
Let ν(t) be the distribution of u(t) after t iterations.
TV(ν(t), νx) = sup_A |ν(t)(A) − νx(A)|.
[Figure: TV upper bounds against iteration, for N ∈ {50, 100, 150, 200}.]
• 41. Summary
A Gibbs sampler can be used to approximate lower and upper probabilities in the Dempster–Shafer framework.
Is perfect sampling possible here?
Extensions to hierarchical counts, hidden Markov models?
Jacob, Gong, Edlefsen & Dempster, A Gibbs sampler for a class of random convex polytopes. On arXiv and researchers.one.
https://github.com/pierrejacob/dempsterpolytope
• 42. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
• 43. Coupled chains
Glynn & Rhee, Exact estimation for Markov chain equilibrium expectations, 2014.
Generate two chains (Xt) and (Yt), both converging to π, as follows:
- sample X0 and Y0 from π0 (independently, or not),
- sample Xt | Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , L,
- for t ≥ L + 1, sample (Xt, Yt−L) | (Xt−1, Yt−L−1) ∼ P̄((Xt−1, Yt−L−1), ·).
The coupled kernel P̄ must be such that
- Xt+1 | Xt ∼ P(Xt, ·) and Yt | Yt−1 ∼ P(Yt−1, ·) (thus Xt and Yt have the same distribution for all t ≥ 0),
- there exists a random time τ such that Xt = Yt−L for all t ≥ τ (the chains meet and remain “faithful”).
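In code, this reduces to a generic skeleton. A sketch in R, where rinit, single_kernel and coupled_kernel are user-supplied; coupled_kernel is assumed to leave each component marginally P-distributed and to be faithful (once the two states match, they stay matched):

    # L-lag coupled chains: X advances alone for L steps, then the pair
    # (X_t, Y_{t-L}) evolves jointly until the chains meet at time tau.
    coupled_chains <- function(rinit, single_kernel, coupled_kernel, L = 1, max_iter = 1e5) {
      x <- rinit(); y <- rinit()
      xs <- list(x); ys <- list(y)       # xs[[t + 1]] = X_t, ys[[s + 1]] = Y_s
      for (t in 1:L) { x <- single_kernel(x); xs[[t + 1]] <- x }
      t <- L; tau <- Inf
      while (is.infinite(tau) && t < max_iter) {
        t <- t + 1
        res <- coupled_kernel(x, y)
        x <- res$x; y <- res$y
        xs[[t + 1]] <- x; ys[[t - L + 1]] <- y
        if (identical(x, y)) tau <- t    # X_tau = Y_{tau - L}, faithful afterwards
      }
      list(xs = xs, ys = ys, tau = tau)
    }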
• 44. Coupled chains
[Figure: traces of two coupled chains over 200 iterations; π = N(0, 1), RWMH with Normal proposal std = 0.5, π0 = N(10, 3²).]
• 45. Unbiased estimators
Under some conditions, the estimator
1/(m − k + 1) Σ_{t=k}^{m} h(Xt) + 1/(m − k + 1) Σ_{t=k+L}^{τ−1} min(m − k + 1, ⌈(t − k)/L⌉) (h(Xt) − h(Yt−L))
has expectation ∫ h(x) π(dx), finite cost and finite variance.
“MCMC estimator + bias correction terms”
Its efficiency can be close to that of MCMC estimators, if k, m are chosen appropriately (and L also).
Jacob, O’Leary & Atchadé, Unbiased MCMC with couplings, 2019.
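Given the output of a coupled run, the estimator can be assembled directly. A sketch in R following the formula above, consuming the output format of the skeleton sketched earlier (xs[[t + 1]] = Xt, ys[[s + 1]] = Ys):

    # Unbiased estimator H_{k:m}: MCMC average plus bias correction terms
    unbiased_estimator <- function(h, xs, ys, k, m, L, tau) {
      est <- mean(sapply(k:m, function(t) h(xs[[t + 1]])))
      if (tau - 1 >= k + L) {
        for (t in (k + L):(tau - 1)) {
          wt <- min(m - k + 1, ceiling((t - k) / L)) / (m - k + 1)
          est <- est + wt * (h(xs[[t + 1]]) - h(ys[[t - L + 1]]))
        }
      }
      est
    }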
• 46. Finite-time bias of MCMC
Total variation distance between Xt ∼ πt and π = limt→∞ πt:
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
[Figure: histogram of τ − lag and the resulting TV upper bounds against iteration, lag = 1.]
• 47. Finite-time bias of MCMC
[Same bound; figure with lag = 50.]
• 48. Finite-time bias of MCMC
[Same bound; figure with lag = 100.]
• 49. Finite-time bias of MCMC
Upper bounds can also be obtained for e.g. the 1-Wasserstein distance. And perhaps lower bounds?
Applicable in e.g. high-dimensional and/or discrete spaces.
Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains with L-Lag Couplings, 2019.
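The bound above is easy to estimate: simulate many independent coupled pairs, record their meeting times, and average. A minimal sketch in R:

    # TV upper bound at iteration t from independent meeting times 'taus'
    # obtained with L-lag couplings: E[ max(0, ceiling((tau - L - t)/L)) ]
    tv_upper_bound <- function(taus, L, t) mean(pmax(0, ceiling((taus - L - t) / L)))
    # e.g. bounds <- sapply(0:200, function(t) tv_upper_bound(taus, L, t))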
• 50. Finite-time bias of MCMC
Example: Gibbs sampler for Dempster’s analysis of counts.
[Figure: TV upper bounds against iteration, for N ∈ {50, 100, 150, 200}.]
This quantifies the bias of MCMC estimators, not their variance.
• 51. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
• 52. Models made of modules
First module: parameter θ1, data Y1,
prior p1(θ1), likelihood p1(Y1|θ1).
Second module: parameter θ2, data Y2,
prior p2(θ2|θ1), likelihood p2(Y2|θ1, θ2).
We are interested in the estimation of θ1, of θ2, or of both.
• 53. Joint model approach
Parameter (θ1, θ2), with prior p(θ1, θ2) = p1(θ1) p2(θ2|θ1).
Data (Y1, Y2), with likelihood p(Y1, Y2|θ1, θ2) = p1(Y1|θ1) p2(Y2|θ1, θ2).
Posterior distribution:
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
• 54. Joint model approach
In the joint model approach, all data are used to infer all parameters simultaneously,
- so that uncertainty about θ1 is propagated to the estimation of θ2,
- but misspecification of the second module can damage the estimation of θ1.
What about allowing uncertainty propagation, while preventing feedback of some modules on others?
• 55. Cut distribution
One might want to propagate uncertainty without allowing “feedback” of the second module on the first module.
Cut distribution:
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2).
This is different from the posterior distribution under the joint model, whose first marginal is π(θ1|Y1, Y2).
• 56. Example: epidemiological study
Model of virus prevalence:
∀i = 1, . . . , I Zi ∼ Binomial(Ni, ϕi),
where Zi is the number of women infected with high-risk HPV in a sample of size Ni in country i.
Beta(1, 1) prior on each ϕi, independently.
Impact of prevalence on cervical cancer occurrence:
∀i = 1, . . . , I Yi ∼ Poisson(λi Ti), log(λi) = θ2,1 + θ2,2 ϕi,
where Yi is the number of cancer cases arising from Ti woman-years of follow-up in country i.
N(0, 10³) priors on θ2,1 and θ2,2, independently.
Plummer, Cuts in Bayesian graphical models, 2014.
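For concreteness, the two modules’ log-densities can be written down directly. A sketch in R following the slide’s notation (assuming the N(0, 10³) priors are parameterized by their variance):

    # Module 1: Z_i ~ Binomial(N_i, phi_i); the Beta(1, 1) priors are flat
    logp1 <- function(phi, Z, N) sum(dbinom(Z, N, phi, log = TRUE))
    # Module 2: Y_i ~ Poisson(lambda_i * T_i), log(lambda_i) = theta2[1] + theta2[2] * phi_i
    logp2 <- function(theta2, phi, Y, T) {
      lambda <- exp(theta2[1] + theta2[2] * phi)
      sum(dpois(Y, lambda * T, log = TRUE)) +
        sum(dnorm(theta2, mean = 0, sd = sqrt(1e3), log = TRUE))
    }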
• 57. Monte Carlo with joint model approach
The joint model posterior has density
π(θ1, θ2|Y1, Y2) ∝ p1(θ1) p1(Y1|θ1) p2(θ2|θ1) p2(Y2|θ1, θ2).
The computational complexity typically grows super-linearly with the number of modules.
Difficulties stack up: intractability, multimodality, ridges, etc.
• 58. Monte Carlo with cut distribution
The cut distribution is defined as
πcut(θ1, θ2; Y1, Y2) = p1(θ1|Y1) p2(θ2|θ1, Y2) ∝ π(θ1, θ2|Y1, Y2) / p2(Y2|θ1).
The denominator is the feedback of the second module on θ1:
p2(Y2|θ1) = ∫ p2(Y2|θ1, θ2) p2(dθ2|θ1).
The feedback term is typically intractable.
• 59. Monte Carlo with cut distribution
WinBUGS’ approach via the cut function: alternate between
- sampling θ1 from K1(θ1 → dθ1), targeting p1(dθ1|Y1),
- sampling θ2 from K2,θ1(θ2 → dθ2), targeting p2(dθ2|θ1, Y2).
This does not leave the cut distribution invariant!
Iterating the kernel K2,θ1 enough times mitigates the issue.
Plummer, Cuts in Bayesian graphical models, 2014.
• 60. Monte Carlo with cut distribution
In a perfect world, we could sample i.i.d.
- θ1^(i) from p1(θ1|Y1),
- θ2^(i) given θ1^(i) from p2(θ2|θ1^(i), Y2),
and then (θ1^(i), θ2^(i)) would be i.i.d. from the cut distribution.
• 61. Monte Carlo with cut distribution
In an MCMC world, we can sample
- θ1^(i) approximately from p1(θ1|Y1) using MCMC,
- θ2^(i) given θ1^(i) approximately from p2(θ2|θ1^(i), Y2) using MCMC,
and then the resulting samples approximate the cut distribution, in the limit of the number of iterations at both stages.
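A naive sketch of this two-stage strategy in R, where kernel1 is an MCMC kernel targeting p1(θ1|Y1) and kernel2(θ2, θ1) an MCMC kernel targeting p2(θ2|θ1, Y2); in practice the first chain would be thinned before running the inner chains:

    # Two-stage MCMC approximation of the cut distribution (sketch)
    cut_sampler <- function(kernel1, kernel2, init1, init2, n1, n2) {
      theta1 <- init1
      chain1 <- vector("list", n1)
      for (t in 1:n1) { theta1 <- kernel1(theta1); chain1[[t]] <- theta1 }
      lapply(chain1, function(th1) {
        theta2 <- init2
        for (s in 1:n2) theta2 <- kernel2(theta2, th1)  # inner chain given theta1
        list(theta1 = th1, theta2 = theta2)
      })
    }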
• 62. Monte Carlo with cut distribution
In an unbiased MCMC world, we can approximate expectations ∫ h(x) π(dx) without bias, in finite compute time.
We can obtain an unbiased approximation of p1(θ1|Y1), and, for each θ1, an unbiased approximation of p2(θ2|θ1, Y2).
Thus, by the tower property, we can unbiasedly estimate
∫∫ h(θ1, θ2) p2(dθ2|θ1, Y2) p1(dθ1|Y1).
Jacob, O’Leary & Atchadé, Unbiased MCMC with couplings, 2019.
• 63. Example: epidemiological study
[Figure: estimated marginal densities of θ2,1 (roughly −2.5 to −1.5) and θ2,2 (roughly 10 to 25).]
Approximation of the marginals of the cut distribution of (θ2,1, θ2,2), the parameters of the Poisson regression module in the epidemiological model of Plummer (2014).
Jacob, Holmes, Murray, Robert & Nicholson, Better together? Statistical learning in models made of modules.
• 64. Outline
1 Dempster–Shafer analysis of count data
2 Unbiased MCMC and diagnostics of convergence
3 Modular Bayesian inference
4 Bagging posterior distributions
• 65. Bagging posterior distributions
“We can stabilize the posterior distribution by using a bootstrap and aggregation scheme, in the spirit of bagging (Breiman, 1996b). In a nutshell, denote by D′ a bootstrap or subsample of the data D. The posterior of the random parameters θ given the data D has c.d.f. F(·|D), and we can stabilize this using FBayesBag(·|D) = E′[F(·|D′)], where E′ is with respect to the bootstrap- or subsampling scheme. We call it the BayesBag estimator. It can be approximated by averaging over B posterior computations for bootstrap- or subsamples, which might be a rather demanding task (although say B = 10 would already stabilize to a certain extent).”
Bühlmann, Discussion of Big Bayes Stories and BayesBag, 2014.
• 66. Bagging posterior distributions
For b = 1, . . . , B:
- sample a data set D(b) by bootstrapping from D,
- obtain an MCMC approximation π̂(b) of the posterior given D(b).
Finally obtain B⁻¹ Σ_{b=1}^{B} π̂(b).
This converges to the “BayesBag” distribution as both B and the number of MCMC samples go to infinity.
If we can obtain an unbiased approximation of the posterior given any D, the resulting approximation of “BayesBag” is consistent as B → ∞ only.
Exactly the same reasoning as for the cut distribution.
Example at https://statisfaction.wordpress.com/2019/10/02/bayesbag-and-how-to-approximate-it/
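A minimal sketch of this bootstrap-and-pool scheme in R, assuming run_mcmc(D) returns a matrix of (approximate) posterior draws given a data set D stored as rows:

    # BayesBag approximation: pool MCMC draws across B bootstrapped data sets
    bayesbag <- function(D, run_mcmc, B = 10) {
      draws <- lapply(1:B, function(b) {
        Db <- D[sample(nrow(D), replace = TRUE), , drop = FALSE]  # bootstrap rows
        run_mcmc(Db)
      })
      do.call(rbind, draws)  # equally weighted mixture of the B approximations
    }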
• 67. Discussion
Some existing alternatives to standard Bayesian inference are well motivated, but raise computational questions.
There are ongoing efforts toward scalable Monte Carlo methods, e.g. using coupled Markov chains or regeneration techniques, in addition to the sustained search for new MCMC algorithms.
Quantification of variance is commonly done; quantification of bias is also possible.
What makes a computational method convenient? It does not seem to be entirely about asymptotic efficiency when the method is optimally tuned.
Thank you for listening!
Funding provided by the National Science Foundation, grants DMS-1712872 and DMS-1844695.
• 68. References
Practical couplings in the literature:
- Propp & Wilson, Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures & Algorithms, 1996.
- Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
- Neal, Circularly-coupled Markov chain sampling, University of Toronto technical report, 1999.
- Glynn & Rhee, Exact estimation for Markov chain equilibrium expectations, Journal of Applied Probability, 2014.
- Agapiou, Roberts & Vollmer, Unbiased Monte Carlo: posterior estimation for intractable/infinite-dimensional models, Bernoulli, 2018.
• 69. References
Finite-time bias of MCMC:
- Brooks & Roberts, Assessing convergence of Markov chain Monte Carlo algorithms, Statistics and Computing, 1998.
- Cowles & Rosenthal, A simulation approach to convergence rates for Markov chain Monte Carlo algorithms, Statistics and Computing, 1998.
- Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
- Gorham, Duncan, Vollmer & Mackey, Measuring Sample Quality with Diffusions, Annals of Applied Probability, 2019.
• 70. References
Own work:
- with John O’Leary, Yves F. Atchadé: Unbiased Markov chain Monte Carlo with couplings, 2019.
- with Fredrik Lindsten, Thomas Schön: Smoothing with Couplings of Conditional Particle Filters, 2019.
- with Jeremy Heng: Unbiased Hamiltonian Monte Carlo with couplings, 2019.
- with Lawrence Middleton, George Deligiannidis, Arnaud Doucet: Unbiased Markov chain Monte Carlo for intractable target distributions, 2019; Unbiased Smoothing using Particle Independent Metropolis-Hastings, 2019.
• 71. References
- with Maxime Rischard, Natesh Pillai: Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation.
- with Niloy Biswas, Paul Vanetti: Estimating Convergence of Markov chains with L-Lag Couplings, 2019.
- with Chris Holmes, Lawrence Murray, Christian Robert, George Nicholson: Better together? Statistical learning in models made of modules.