C.M. Bishop’s PRML
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
10 January 2016, PRML Reading, Hasegawa lab., Tokyo
The University of Tokyo
Chapter 11: Sampling Methods
Introduction
Generating a random number is not easy!
• True random number: gather “entropy”, or seemingly random data, from the physical world
• Pseudo-random number: Seed -> PRG -> Random number (ex. Mersenne Twister)
http://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/
Generating a number that follows a probability distribution is more difficult!
For today’s meeting
Agenda
• Cover PRML chapter 11
• Understand the general concept of sampling from a desired distribution
• Introduction to the MCMC world
• More about MCMC
• Details in Papers Alert
Outline
Sampling Methods
• Basic Sampling Algorithms
• Markov Chain Monte Carlo
• Gibbs Sampling
• Slice Sampling
• Hybrid Monte Carlo
• Estimating the Partition Function
Part I: General concept of basic sampling
Part II: MCMC world
Standard distributions (transformation method)
11.1 Basic Sampling Algorithms
• Goal: sample from a desired distribution p(y).
• Assumption: we can generate random numbers z uniformly distributed in U[0,1].
• Generate z, then transform it: $y = h^{-1}(z)$, where h is the cumulative distribution function of p:
$h(y) = \int_{-\infty}^{y} p(x)\,dx$
• The two densities are related by $p(y) = p(z)\left|\frac{dz}{dy}\right|$
• Ex. Consider the exponential distribution $p(y) = \lambda\exp(-\lambda y)$ where $0 \le y < \infty$; then $h(y) = 1 - \exp(-\lambda y)$ and $y = -\lambda^{-1}\ln(1 - z)$
• The method applies if $h^{-1}$ is easy to compute
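The transformation method for the exponential example can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `sample_exponential`, the rate `lam = 2.0`, the seed and the sample count are assumptions made for the example, not part of the slides.

```python
import math
import random

def sample_exponential(lam, rng):
    """Draw one sample from p(y) = lam * exp(-lam * y) by inverting the CDF."""
    z = rng.random()                  # z ~ U[0, 1)
    return -math.log(1.0 - z) / lam   # y = h^{-1}(z) = -ln(1 - z) / lam

rng = random.Random(0)                # fixed seed, only for reproducibility
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)    # should be close to 1 / lam = 0.5
print("sample mean:", mean)
```

The sample mean should land near 1/λ, which is a quick sanity check that the inverse CDF was derived correctly.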
Rejection Sampling
• Assumption 1: Sampling directly from p(z) is difficult, but we are able to evaluate p(z) for any given value of z, up to some normalizing constant $Z_p$:
$p(z) = \frac{1}{Z_p}\tilde p(z)$ (11.13)
• Assumption 2: We know how to sample from a proposal distribution q(z), and there exists a constant k such that $k\,q(z) \ge \tilde p(z)$ for all z
• Then we have an algorithm to obtain independent samples from p(.)
Rejection Sampling
1. Choose a constant k such that kq(z) covers $\tilde p(z)$
2. Generate z0 from the proposal q(.)
3. Generate u0 from U[0, kq(z0)]
4. Reject z0 if $u_0 > \tilde p(z_0)$; keep z0 if $u_0 \le \tilde p(z_0)$
• The efficiency of the method depends on the ratio between the grey area and the white area in the figure
• Proof:
$p(\text{accept}) = \int \frac{\tilde p(z)}{k\,q(z)}\,q(z)\,dz = \frac{1}{k}\int \tilde p(z)\,dz$
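The four steps above can be sketched as follows. The target and proposal are assumptions chosen for illustration: an unnormalized standard Gaussian $\tilde p(z) = \exp(-z^2/2)$ and a standard Cauchy proposal, for which the smallest valid constant is $k = 2\pi e^{-1/2}$ (the maximum of $\tilde p(z)/q(z)$, attained at z = ±1). The expected acceptance rate is then $Z_p / k \approx 0.66$.

```python
import math
import random

def p_tilde(z):
    """Unnormalized target (assumed for this sketch): standard Gaussian shape."""
    return math.exp(-0.5 * z * z)

def q_pdf(z):
    """Proposal density: standard Cauchy."""
    return 1.0 / (math.pi * (1.0 + z * z))

# Smallest k with k * q(z) >= p_tilde(z) everywhere for this particular pair.
K = 2.0 * math.pi * math.exp(-0.5)

def rejection_sample(n, rng):
    """Return n accepted samples and the observed acceptance rate."""
    accepted, proposed = [], 0
    while len(accepted) < n:
        z0 = math.tan(math.pi * (rng.random() - 0.5))  # Cauchy draw by transformation
        u0 = rng.uniform(0.0, K * q_pdf(z0))           # u0 ~ U[0, k q(z0)]
        proposed += 1
        if u0 <= p_tilde(z0):                          # keep z0, otherwise reject
            accepted.append(z0)
    return accepted, n / proposed

samples, acc_rate = rejection_sample(20_000, random.Random(1))
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print("acceptance rate:", acc_rate, "mean:", sample_mean, "variance:", sample_var)
```

The observed acceptance rate approximates $\frac{1}{k}\int\tilde p(z)\,dz$, matching the expression for p(accept).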
Rejection Sampling Example
• Sampling from the Gamma distribution (green curve)
$\mathrm{Gam}(z|a, b) = \frac{b^a z^{a-1}\exp(-bz)}{\Gamma(a)}$, with its maximum at z = (a-1)/b
• Proposal distribution -> Cauchy distribution (red curve)
$q(z) = \frac{c_0}{1 + (z - z_0)^2 / d_0^2}$
achieved by transforming $z = d_0\tan(\pi u) + z_0$, where u is drawn uniformly from [0, 1]
• We need to find z0, c0, d0 such that q(z) is greater than or equal to Gam(z|a,b) everywhere, with the smallest d0c0 (which defines the area)
$z_0 = \frac{a - 1}{b}, \quad d_0^2 = 2a - 1, \quad c_0 = \frac{1}{\pi d_0}$
Adaptive Rejection Sampling
• The proposal distribution q(.) may be difficult to construct.
1. Construct q(z) from initial grid points
2. Generate z4 from q(z)
3. Generate u0 from U[0, q(z4)]
4. Keep z4 if $u_0 \le \tilde p(z_4)$; reject z4 if $u_0 > \tilde p(z_4)$, but it is used to refine the envelope
Fig 11.6: If a sample point is rejected, it is added to the set of the grid points and used to refine the envelope distribution.
• Rejection sampling methods are inefficient when sampling in high dimensions (the acceptance rate decreases exponentially with dimensionality)
Importance Sampling
• Basic idea: transform the integral into an expectation over a simple, known distribution
• Conditions: q(z) > 0 when f(z)p(z) ≠ 0; it is easy to sample from q(z)
Integral:
$E[f] = \int f(z)\,p(z)\,dz = \int f(z)\,\frac{p(z)}{q(z)}\,q(z)\,dz$
Monte Carlo:
$E[f] \simeq \frac{1}{S}\sum_{s} w^{(s)} f(z^{(s)}), \quad z^{(s)} \sim q(z)$
• q(z) is the proposal
• Importance weight (normalized): $w^{(s)} \propto \frac{p(z^{(s)})}{q(z^{(s)})}$; the weights correct the bias introduced by sampling from a wrong distribution
• All the generated samples are retained
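A minimal sketch of the Monte Carlo estimator above, using only the standard library. The concrete choices are assumptions for illustration: the target p is a standard Gaussian, the proposal q is a wider Gaussian with standard deviation 2, and f(z) = z², so the true value is E[f] = 1. Both densities are normalized here, so the weight is exactly p/q.

```python
import math
import random

def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(f, n, rng):
    """Estimate E_p[f] with p = N(0, 1) using samples from q = N(0, 2^2)."""
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 2.0)                                   # z ~ q(z)
        w = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.0, 2.0)     # weight p(z) / q(z)
        total += w * f(z)                                         # every sample is retained
    return total / n

estimate = importance_estimate(lambda z: z * z, 50_000, random.Random(2))
print("E[z^2] estimate:", estimate)   # true value is 1 for a standard Gaussian
```

Choosing q wider than p keeps the weights bounded; a proposal narrower than the target would give occasional very large weights and a high-variance estimate.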
SIR (sampling-importance-resampling)
• Rejection sampling: choosing a suitable q(z) and constant k is often impractical
• SIR: based on the use of a proposal distribution q(z) but avoids having to determine the constant k
1. Draw L samples $z^{(1)}, z^{(2)}, \ldots, z^{(L)}$ from q(z)
2. Calculate the importance weight $\frac{p(z^{(l)})}{q(z^{(l)})}$ for all l = 1...L
3. Normalize the weights to obtain w1...wL
4. Draw a second set of L samples from the discrete distribution $(z^{(1)}, z^{(2)}, \ldots, z^{(L)})$ with probabilities (w1...wL)
• The resulting L samples are distributed according to p(z) if L -> ∞
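The four SIR steps can be sketched as below. The target (an unnormalized standard Gaussian) and the proposal (a Gaussian with standard deviation 3) are assumptions for illustration; note that no constant k appears anywhere, and the target only needs to be known up to normalization because the weights are normalized.

```python
import math
import random

def p_tilde(z):
    """Unnormalized target (assumed for this sketch): standard Gaussian shape."""
    return math.exp(-0.5 * z * z)

def q_pdf(z, sigma=3.0):
    """Proposal density N(0, sigma^2)."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def sir(n, rng, sigma=3.0):
    zs = [rng.gauss(0.0, sigma) for _ in range(n)]        # 1. draw L samples from q
    raw = [p_tilde(z) / q_pdf(z, sigma) for z in zs]      # 2. importance weights
    total = sum(raw)
    weights = [r / total for r in raw]                    # 3. normalize the weights
    return rng.choices(zs, weights=weights, k=n)          # 4. resample with prob. w_l

resampled = sir(20_000, random.Random(3))
sir_mean = sum(resampled) / len(resampled)
sir_var = sum((z - sir_mean) ** 2 for z in resampled) / len(resampled)
print("mean:", sir_mean, "variance:", sir_var)            # roughly 0 and 1
```

For finite L the resampled set is only approximately distributed as p(z), and it contains repeated values wherever a few weights dominate.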
SIR (sampling-importance-resampling)
• Proof
$p(z \le a) = \sum_{l:\,z^{(l)} \le a} w_l = \frac{\sum_l I(z^{(l)} \le a)\,\tilde p(z^{(l)})/q(z^{(l)})}{\sum_l \tilde p(z^{(l)})/q(z^{(l)})}$
where I(F) = 1 if F is TRUE else 0
If L -> ∞ then
$p(z \le a) = \frac{\int I(z \le a)\,\{\tilde p(z)/q(z)\}\,q(z)\,dz}{\int \{\tilde p(z)/q(z)\}\,q(z)\,dz} = \frac{\int I(z \le a)\,\tilde p(z)\,dz}{\int \tilde p(z)\,dz} = \int I(z \le a)\,p(z)\,dz$
Sampling and the EM algorithm
• Use some Monte Carlo method to approximate the expectation of the E-step
Monte Carlo EM algorithm
• The expected complete-data log likelihood (Z: hidden; X: observed; θ: parameters), given by
$Q(\theta, \theta^{old}) = \int p(Z|X, \theta^{old}) \ln p(Z, X|\theta)\,dZ$ (11.28)
may be approximated by (where $Z^{(l)}$ are drawn from the posterior $p(Z|X, \theta^{old})$)
$Q(\theta, \theta^{old}) \simeq \frac{1}{L}\sum_{l=1}^{L} \ln p(Z^{(l)}, X|\theta)$ (11.29)
Stochastic EM algorithm
• Considering a finite mixture model, only one sample Z may be drawn at each E-step (this makes a hard assignment of each data point to one of the components)
IP Algorithm
• For a full Bayesian treatment in which we wish to draw samples from the joint posterior p(θ, Z|X)
IP algorithm
• I-step. We wish to sample from p(Z|X) but we cannot do this directly. Notice that
$p(Z|X) = \int p(Z|\theta, X)\,p(\theta|X)\,d\theta$ (11.30)
For l = 1...L we first draw a sample $\theta^{(l)}$ from the current estimate for p(θ|X), and then use this to draw a sample $Z^{(l)}$ from $p(Z|\theta^{(l)}, X)$
• P-step. Given the relation
$p(\theta|X) = \int p(\theta|Z, X)\,p(Z|X)\,dZ$ (11.31)
we use the samples {Z(l)} obtained from the I-step to compute
$p(\theta|X) \simeq \frac{1}{L}\sum_{l=1}^{L} p(\theta|Z^{(l)}, X)$ (11.32)
In review…
• Inverse function method
- Analytically reliable but unable to deal with complicated distributions
• Rejection sampling
- Able to deal with complicated distributions, but it is difficult to choose the proposal distribution and the constant k
- Sometimes it wastes samples because of the rejection process
• Adaptive rejection sampling
- Uses an envelope function to reduce the number of rejected samples
- Difficult to apply to high-dimensional or sharply peaked distributions
• Importance sampling
- Approximates the expectation with weighted samples from the proposal distribution; does not sample from the desired distribution
• SIR
- Combines rejection sampling and importance sampling
• Monte Carlo EM
• IP algorithm for data augmentation
Markov Chain Monte Carlo (MCMC)
11.2 Markov Chain Monte Carlo
• MCMC: a general strategy that allows sampling from a large class of distributions (based on the mechanism of Markov chains)
• MCMC scales well with the dimensionality of the sample space
• MLE (likelihood function): estimates the top of the mountain (depends on the initial value, so it may give a wrong estimate)
• MCMC (posterior distribution): estimates the posterior distribution (approaches the global optimum, does not depend on the initial value)
Methods covered:
• Basic sampling: inverse function, rejection sampling, adaptive rejection sampling, importance sampling, SIR, data augmentation sampling
• Markov Chain Monte Carlo: Metropolis-Hastings method, Metropolis method, Gibbs sampling, slice sampling
MCMC: the idea
• Goal: to generate a set of samples from p(z)
• Idea: to generate samples from a Markov chain whose invariant distribution is p(z)
1. Knowing the current sample z(τ), generate a candidate sample z* from a proposal distribution q(z|z(τ))
2. Accept the sample according to an appropriate criterion.
3. If the candidate sample is accepted then z(τ+1) = z*, otherwise z(τ+1) = z(τ)
• The proposal distribution depends on the current state
• Samples z(1),z(2),… form a Markov chain, and the distribution of z(τ) tends to p(z) as τ -> ∞
• Assumption: we know how to evaluate $\tilde p(z) = Z_p\,p(z)$ (but not $Z_p$)
Metropolis Algorithm
• The proposal distribution is symmetric: $q(z_A|z_B) = q(z_B|z_A)$
• The candidate sample is accepted with probability
$A(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde p(z^*)}{\tilde p(z^{(\tau)})}\right)$ (11.33)
Fig 11.9: The proposal distribution is an isotropic Gaussian distribution whose std = 0.2. Accepted steps are shown in green, rejected steps in red; the one-standard-deviation contour of the target is an ellipse. 150 candidate samples, 43 rejected.
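A random-walk Metropolis sampler following the acceptance rule (11.33) can be sketched as below. The one-dimensional target (an unnormalized Gaussian centred at 3), the step size, the chain length and the burn-in are assumptions for illustration; the ratio is evaluated in log space to avoid underflow.

```python
import math
import random

def log_p_tilde(z):
    """Log of an unnormalized target (assumed for this sketch): Gaussian centred at 3."""
    return -0.5 * (z - 3.0) ** 2

def metropolis(log_target, z_init, n_steps, step_std, rng):
    """Metropolis with a symmetric Gaussian proposal centred on the current state."""
    z, chain, n_accept = z_init, [], 0
    for _ in range(n_steps):
        z_star = z + rng.gauss(0.0, step_std)            # symmetric proposal
        log_ratio = log_target(z_star) - log_target(z)   # log of p~(z*) / p~(z)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            z = z_star                                   # accept: z(tau+1) = z*
            n_accept += 1
        chain.append(z)                                  # on rejection the old state repeats
    return chain, n_accept / n_steps

chain, accept_rate = metropolis(log_p_tilde, 0.0, 50_000, 1.0, random.Random(4))
kept = chain[1_000:]                                     # discard burn-in
chain_mean = sum(kept) / len(kept)
print("acceptance rate:", accept_rate, "mean:", chain_mean)
```

Note that a rejected candidate still contributes a sample: the current state is recorded again, which is what distinguishes this from rejection sampling.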
Markov Chains
• Q: under what circumstances will a Markov chain converge to the desired distribution?
• First order Markov chain: a series of random variables z(1), …, z(M) such that
$p(z^{(m+1)}|z^{(1)}, \ldots, z^{(m)}) = p(z^{(m+1)}|z^{(m)}) \quad \forall m$ (11.37)
• A Markov chain is specified by p(z(0)) and the transition probabilities
$T_m(z^{(m)}, z^{(m+1)}) = p(z^{(m+1)}|z^{(m)})$
• A distribution p*(z) is said to be invariant for a Markov chain if
$p^*(z) = \sum_{z'} T(z', z)\,p^*(z')$
A sufficient condition is to choose the transitions to satisfy the property of detailed balance
$p^*(z)\,T(z, z') = T(z', z)\,p^*(z')$ (11.40)
Ergodicity
• The chain has a unique invariant distribution if it ‘forgets’ its starting point z(0)
Image source: Murray, MLSS 2009 slides
Markov Chains
• Goal: to generate a set of samples from p(z)
• Idea: to generate samples from a Markov chain whose invariant distribution is p(z)
• How: choose the transition probability T(z, z’) to satisfy the property of detailed balance for p(z)
$p(z)\,T(z, z') = T(z', z)\,p(z')$ (11.40)
• T(z, z’) can be constructed from a set of “base” transitions B1, B2, …, BK
$T(z', z) = \sum_{k=1}^{K} \alpha_k B_k(z', z)$ (11.42)
or
$T(z', z) = \sum_{z_1} \cdots \sum_{z_{K-1}} B_1(z', z_1) \cdots B_{K-1}(z_{K-2}, z_{K-1})\,B_K(z_{K-1}, z)$ (11.43)
The Metropolis-Hastings Algorithm
• Generalization of the Metropolis algorithm (the proposal distribution q is no longer symmetric).
• Knowing the current sample z(τ), generate a candidate sample z* from a proposal distribution q(z|z(τ))
• Accept it with probability
$A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde p(z^*)\,q_k(z^{(\tau)}|z^*)}{\tilde p(z^{(\tau)})\,q_k(z^*|z^{(\tau)})}\right)$ (11.44)
where k labels the members of the set of possible transitions being considered.
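To show where the proposal ratio in (11.44) matters, here is a sketch with a deliberately asymmetric proposal. All concrete choices are assumptions for illustration: the target is an unnormalized Gamma(a = 3, b = 1) density on z > 0, and the proposal is a multiplicative (log-normal) random walk, for which $q(z|z^*)/q(z^*|z) = z^*/z$.

```python
import math
import random

def log_p_tilde(z):
    """Log of an unnormalized Gamma(a=3, b=1) density, p~(z) = z^2 exp(-z), z > 0."""
    return 2.0 * math.log(z) - z

def metropolis_hastings(n_steps, sigma, rng, z_init=1.0):
    """MH with a log-normal random-walk proposal z* = z * exp(sigma * eps)."""
    z, chain = z_init, []
    for _ in range(n_steps):
        z_star = z * math.exp(sigma * rng.gauss(0.0, 1.0))
        # log of [p~(z*) q(z|z*)] / [p~(z) q(z*|z)], using q(z|z*) / q(z*|z) = z* / z
        log_a = (log_p_tilde(z_star) + math.log(z_star)) - (log_p_tilde(z) + math.log(z))
        if log_a >= 0.0 or rng.random() < math.exp(log_a):
            z = z_star
        chain.append(z)
    return chain

mh_chain = metropolis_hastings(50_000, 0.8, random.Random(5))
mh_kept = mh_chain[1_000:]                 # discard burn-in
mh_mean = sum(mh_kept) / len(mh_kept)
print("mean:", mh_mean)                    # Gamma(3, 1) has mean a / b = 3
```

Dropping the `math.log(z_star)` and `math.log(z)` terms would turn this into the plain Metropolis rule, which is not valid for this proposal and would bias the chain.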
The Metropolis-Hastings Algorithm
• Prove that p(z) is the invariant distribution of the chain
• Notice that the transition probability of this chain is defined as
$T_k(z, z') = q_k(z'|z)\,A_k(z', z)$
• We need to prove
$p(z)\,T_k(z, z') = T_k(z', z)\,p(z')$
Use
$A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde p(z^*)\,q_k(z^{(\tau)}|z^*)}{\tilde p(z^{(\tau)})\,q_k(z^*|z^{(\tau)})}\right), \quad p(z) = \tilde p(z)/Z_p$
Proof
$p(z)\,q_k(z'|z)\,A_k(z', z) = \min\left(p(z)\,q_k(z'|z),\; p(z')\,q_k(z|z')\right)$
$= \min\left(p(z')\,q_k(z|z'),\; p(z)\,q_k(z'|z)\right)$
$= p(z')\,q_k(z|z')\,A_k(z, z')$
(Q.E.D)
The Metropolis-Hastings Algorithm
• Common choice for q: a Gaussian centered on the current state
✓ small variance -> high acceptance rate but slow exploration of the state space + non-independent samples
✓ large variance -> high rejection rate
Fig 11.10: Use of an isotropic Gaussian proposal (blue circle) to sample from a Gaussian distribution (red). The scale ρ of the proposal should be on the order of σmin, but the algorithm may then converge slowly (exploring the state space in the other direction requires on the order of (σmax/σmin)^2 iterations)
Summary so far…
• We need approximate methods to solve sums/integrals
• Monte Carlo does not explicitly depend on dimension,
although simple methods work only in low dimensions
• Markov Chain Monte Carlo (MCMC) can make local
moves. By assuming less, it’s more applicable to higher
dimensions
• Simple computations => “easy” to implement
(harder to diagnose)
Gibbs Sampling
11.3 Gibbs Sampling
• Sample each variable in turn, conditioned on the values of all other variables in the distribution (a method with no rejection)
✓ Initialize {z1, z2, …, zM}
✓ For τ = 1,2,…,T pick each variable in turn (sequentially or at random) and resample
$z_i^{\tau+1} \sim p(z_i|z_{\setminus i}^{\tau})$ for i = 1...M
Proof of validity
• Consider a Metropolis-Hastings sampling step involving the variable $z_k$, in which the remaining variables $z_{\setminus k}$ remain fixed and the transition probability is
$q_k(z^*|z) = p(z_k^*|z_{\setminus k})$
then the acceptance probability is
$A_k(z^*, z) = \frac{p(z^*)\,q_k(z|z^*)}{p(z)\,q_k(z^*|z)} = \frac{p(z_k^*|z_{\setminus k}^*)\,p(z_{\setminus k}^*)\,p(z_k|z_{\setminus k}^*)}{p(z_k|z_{\setminus k})\,p(z_{\setminus k})\,p(z_k^*|z_{\setminus k})} = 1$
where $z_{\setminus k}^* = z_{\setminus k}$
Gibbs Sampling
Fig 11.11: Illustration of Gibbs sampling by alternate updates of two variables (blue steps) whose distribution is a correlated Gaussian (red). The step size is governed by the standard deviation of the conditional distribution (green curve), and is O(l), leading to slow progress. The number of steps needed to obtain an independent sample from the distribution is O((L/l)^2)
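The correlated-Gaussian situation in the figure can be reproduced with a short Gibbs sampler. This sketch assumes a zero-mean bivariate Gaussian with unit variances and correlation ρ = 0.8 (values chosen for illustration), for which each conditional is $p(z_1|z_2) = N(\rho z_2, 1 - \rho^2)$ and symmetrically for $z_2$.

```python
import math
import random

def gibbs_bivariate_gaussian(rho, n_sweeps, rng):
    """Alternate exact draws from the two conditionals of a correlated Gaussian."""
    z1, z2 = 0.0, 0.0
    cond_std = math.sqrt(1.0 - rho * rho)    # std of each conditional distribution
    chain = []
    for _ in range(n_sweeps):
        z1 = rng.gauss(rho * z2, cond_std)   # z1 ~ p(z1 | z2)
        z2 = rng.gauss(rho * z1, cond_std)   # z2 ~ p(z2 | z1), using the new z1
        chain.append((z1, z2))
    return chain

gibbs_chain = gibbs_bivariate_gaussian(0.8, 50_000, random.Random(6))
# For unit variances and zero means, E[z1 * z2] equals the correlation rho.
est_corr = sum(a * b for a, b in gibbs_chain) / len(gibbs_chain)
print("estimated correlation:", est_corr)
```

As ρ approaches 1 the conditional standard deviation shrinks, so each update moves less and more sweeps are needed per effectively independent sample, which is the slow progress the caption describes.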
Auxiliary variables
11.4 Slice Sampling
• Collapsing: analytically integrate variables out
• Auxiliary methods: introduce extra variables and integrate them out by MCMC
Explore $\pi(\theta, h)$ where $\int \pi(\theta, h)\,dh = \pi(\theta)$
Slice Sampling
• Problem of Metropolis algorithm ( proposal q(z|z’) = q(z’|z) )
✓ Step size is too small, slow convergence (random walk behavior)
✓ Step size is too large, high estimator variance (high rejection rate)
• Idea: adapt step size automatically to suitable value
• Technique: introduce variable u and sample (u, z) jointly. Ignoring u
leads to the desired samples of p(z)
Slice Sampling
• Sample z and u uniformly from the area under the distribution
✓ Fix z, sample u uniformly from $[0, \tilde p(z)]$
✓ Fix u, sample z uniformly from the slice through the distribution, $\{z : \tilde p(z) > u\}$
• How to sample z from the slice
✓ Start with a region of width w containing z(τ)
✓ If an end point is in the slice, then extend the region by w in that direction
✓ Sample z’ uniformly from the region
✓ If z’ is in the slice, then accept it as z(τ+1)
✓ If not: make z’ the new end point of the region, and resample z’
• Multivariate distribution: slice sampling within a Gibbs sampler
See next slides for more details
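The procedure above (random bracket, stepping out, shrinking) can be sketched for a one-dimensional target. The target (an unnormalized standard Gaussian), the width w = 2 and the chain length are assumptions for illustration; heights are compared in log space.

```python
import math
import random

def log_p_tilde(z):
    """Log of an unnormalized target (assumed for this sketch): standard Gaussian."""
    return -0.5 * z * z

def slice_sample(log_target, z_init, n_samples, w, rng):
    z, chain = z_init, []
    for _ in range(n_samples):
        # Fix z, draw u uniformly from [0, p~(z)]; 1 - random() lies in (0, 1].
        log_u = log_target(z) + math.log(1.0 - rng.random())
        # Region of width w placed randomly so that it contains z.
        left = z - w * rng.random()
        right = left + w
        # Extend each end by w while that end point is still inside the slice.
        while log_target(left) > log_u:
            left -= w
        while log_target(right) > log_u:
            right += w
        # Sample uniformly from the region, shrinking it after each miss.
        while True:
            z_new = rng.uniform(left, right)
            if log_target(z_new) > log_u:
                z = z_new                 # inside the slice: accept as next state
                break
            if z_new < z:
                left = z_new              # make z_new the new left end point
            else:
                right = z_new             # make z_new the new right end point
        chain.append(z)
    return chain

slice_chain = slice_sample(log_p_tilde, 0.0, 20_000, 2.0, random.Random(8))
slice_mean = sum(slice_chain) / len(slice_chain)
slice_var = sum((z - slice_mean) ** 2 for z in slice_chain) / len(slice_chain)
print("mean:", slice_mean, "variance:", slice_var)
```

The width w only affects how many target evaluations each update costs, not the correctness of the samples, which is the sense in which the step size adapts automatically.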
Slice Sampling Idea
• Sample (z, u) uniformly under the curve $\tilde p(z) \propto p(z)$
$p(u|z) = \mathrm{Uniform}[0, \tilde p(z)]$
$p(z|u) \propto \begin{cases} 1 & \text{if } \tilde p(z) \ge u \\ 0 & \text{otherwise} \end{cases}$ = Uniform on the slice
Slide from MCMC NIPS2015 tutorial
Slice Sampling Idea
Unimodal conditionals
• Rejection sampling of p(z|u) using a broader uniform
• Adaptive rejection sampling of p(z|u)
• Quickly find a new z, with no rejections recorded
Multimodal conditionals
• Use updates that leave p(z|u) invariant
- place a bracket randomly around the point
- linearly step out until the ends are off the slice
- sample on the bracket, shrinking as before
Slides from MCMC NIPS2015 tutorial
Hybrid Monte Carlo
11.5 Hybrid Monte Carlo
• The problem of the Metropolis algorithm is the step size trade-off
• Hybrid Monte Carlo is suitable for continuous state spaces
✓ Able to make large jumps in state space with a low rejection rate
✓ Adopts physical system (Hamiltonian) dynamics rather than a probability distribution to propose future states in the Markov chain.
• Goal: to sample from
$p(z) = \frac{1}{Z_p}\exp(-E(z))$
where E(z) is considered as the potential energy function of the system over z
Hamiltonian dynamics
• Hamiltonian dynamics describe how kinetic energy is converted to potential energy (and vice versa) as an object moves through a system in time
• Evolution of the state variable z = {zi} under continuous time τ.
• Momentum variables correspond to the rate of change of the state:
$r_i = \frac{dz_i}{d\tau}$ (11.53)
The joint (z, r) space is called phase space
• For each location the object takes, there is an associated potential energy E(z), and for each momentum there is an associated kinetic energy K(r).
• The total energy of the system is constant and known as the Hamiltonian
$H(z, r) = E(z) + K(r)$
and
$\frac{\partial r_i}{\partial\tau} = -\frac{\partial H}{\partial z_i} = -\frac{\partial E(z)}{\partial z_i}, \quad \frac{\partial z_i}{\partial\tau} = \frac{\partial H}{\partial r_i} = \frac{\partial K(r)}{\partial r_i}$
• The dynamics preserve volume in phase space: $\mathrm{div}\,V = 0$ with $V = \left(\frac{dz}{d\tau}, \frac{dr}{d\tau}\right)$ (11.62)
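Simulating Hamiltonian dynamics
$\frac{\partial r_i}{\partial\tau} = -\frac{\partial H}{\partial z_i} = -\frac{\partial E(z)}{\partial z_i}, \quad \frac{\partial z_i}{\partial\tau} = \frac{\partial H}{\partial r_i} = \frac{\partial K(r)}{\partial r_i}$
• If we have expressions for these partial derivatives and a set of initial conditions (z0, r0), we can predict the location and momentum at any point in time.
Leap Frog method (run for L steps to simulate dynamics over L x δ units of time)
1. Take a half step in time to update the momentum variable
$r_i(\tau + \delta/2) = r_i(\tau) - (\delta/2)\,\frac{\partial E}{\partial z_i}(z(\tau))$
2. Take a full step in time to update the position variable
$z_i(\tau + \delta) = z_i(\tau) + \delta\,\frac{\partial K}{\partial r_i}(r(\tau + \delta/2))$
3. Take the remaining half step in time to finish updating the momentum variable
$r_i(\tau + \delta) = r_i(\tau + \delta/2) - (\delta/2)\,\frac{\partial E}{\partial z_i}(z(\tau + \delta))$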
Simulating Hamiltonian oscillator
Harmonic Oscillator
$F = -kz$
$E(z) = -\int F\,dz = \frac{kz^2}{2}$
$K(v) = \frac{(mv)^2}{2m} = \frac{v^2}{2} = \frac{r^2}{2} = K(r)$ (unit mass, m = 1)
Leap Frog equations (with k = 1)
1. $r(\tau + \delta/2) = r(\tau) - (\delta/2)\,z(\tau)$
2. $z(\tau + \delta) = z(\tau) + \delta\,r(\tau + \delta/2)$
3. $r(\tau + \delta) = r(\tau + \delta/2) - (\delta/2)\,z(\tau + \delta)$
Figure panels: Energy (E, K and H = E + K) and Phase Space (z against r)
Img Ref. https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
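The three leapfrog equations for the oscillator translate directly into code. This sketch assumes unit mass and unit spring constant, so E(z) = z²/2 and K(r) = r²/2; the start point (z, r) = (1, 0), the step size δ = 0.1 and the number of steps are illustrative choices.

```python
def hamiltonian(z, r):
    """Total energy H = E(z) + K(r) for the unit-mass, unit-spring oscillator."""
    return 0.5 * z * z + 0.5 * r * r

def leapfrog_oscillator(z, r, step, n_steps):
    """Integrate the oscillator with the leapfrog scheme; returns the trajectory."""
    trajectory = [(z, r)]
    for _ in range(n_steps):
        r -= 0.5 * step * z      # half step for the momentum (dE/dz = z)
        z += step * r            # full step for the position (dK/dr = r)
        r -= 0.5 * step * z      # remaining half step for the momentum
        trajectory.append((z, r))
    return trajectory

traj = leapfrog_oscillator(1.0, 0.0, 0.1, 200)
h0 = hamiltonian(1.0, 0.0)
max_drift = max(abs(hamiltonian(z, r) - h0) for z, r in traj)
print("largest deviation of H from its initial value:", max_drift)
```

The energy is not conserved exactly, but for a stable step size the error stays bounded instead of accumulating, which is why leapfrog proposals keep a high acceptance rate over long trajectories.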
Target distribution
• Consider the canonical distribution $p(\theta) = \frac{1}{Z_p}\exp(-E(\theta))$
• The canonical distribution for the Hamiltonian dynamics energy function is
$p(z, r) \propto \exp(-H(z, r)) = \exp(-E(z) - K(r)) \propto p(z)\,p(r)$
so the state z and the momentum r are independently distributed
• We can use Hamiltonian dynamics to sample from the joint canonical distribution over r and z and simply ignore the momentum contributions. This is the idea of introducing auxiliary variables (r) to facilitate the Markov chain over (z)
• A common choice:
$K(r) = \frac{r^T r}{2}$ and $E(z) = -\log p(z)$
Hybrid Monte Carlo
• Combination of the Metropolis algorithm and Hamiltonian dynamics
Algorithm to draw M samples from a target distribution
1. Set τ = 0
2. Generate an initial position state z(0) ~ π(0)
3. Repeat until τ = M
Set τ = τ + 1
- Sample a new initial momentum variable from the momentum canonical distribution, r0 ~ p(r)
- Set z0 = z(τ - 1)
- Run the Leap Frog algorithm starting at [z0, r0] for L steps with step size δ to obtain the proposed states z* and r*
- Calculate the Metropolis acceptance probability
$\alpha = \min\left(1, \exp\{H(z_0, r_0) - H(z^*, r^*)\}\right)$
- Draw a random number u uniformly from [0, 1]
If u ≤ α, accept the position and set the next state z(τ) = z*, else set z(τ) = z(τ-1)
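The algorithm above can be sketched for a one-dimensional target. Everything concrete here is an assumption for illustration: the target is a Gaussian with mean 1 and standard deviation 0.5, so $E(z) = (z - 1)^2 / (2 \cdot 0.5^2)$, the kinetic energy is $K(r) = r^2/2$, and the step size and number of leapfrog steps were picked by hand.

```python
import math
import random

def energy(z):
    """Potential energy E(z) = -log p~(z) for a Gaussian with mean 1, std 0.5."""
    return (z - 1.0) ** 2 / (2.0 * 0.25)

def grad_energy(z):
    return (z - 1.0) / 0.25

def leapfrog(z, r, step, n_steps):
    """n_steps leapfrog updates with K(r) = r^2 / 2 (inner half steps merged)."""
    r -= 0.5 * step * grad_energy(z)
    for i in range(n_steps):
        z += step * r
        if i < n_steps - 1:
            r -= step * grad_energy(z)   # two adjacent half steps combined
    r -= 0.5 * step * grad_energy(z)
    return z, r

def hmc(z_init, n_samples, step, n_steps, rng):
    z, chain = z_init, []
    for _ in range(n_samples):
        r0 = rng.gauss(0.0, 1.0)                        # r0 ~ p(r)
        z_star, r_star = leapfrog(z, r0, step, n_steps)
        h0 = energy(z) + 0.5 * r0 * r0                  # H(z0, r0)
        h_star = energy(z_star) + 0.5 * r_star * r_star # H(z*, r*)
        alpha = math.exp(min(0.0, h0 - h_star))         # min(1, exp{H0 - H*})
        if rng.random() <= alpha:
            z = z_star                                  # accept the proposed position
        chain.append(z)
    return chain

hmc_chain = hmc(0.0, 5_000, 0.1, 20, random.Random(9))
hmc_kept = hmc_chain[200:]                              # discard burn-in
hmc_mean = sum(hmc_kept) / len(hmc_kept)
hmc_var = sum((z - hmc_mean) ** 2 for z in hmc_kept) / len(hmc_kept)
print("mean:", hmc_mean, "variance:", hmc_var)          # roughly 1 and 0.25
```

The momentum is resampled at every iteration and discarded afterwards, so only the positions form the chain of samples from p(z).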
Hybrid Monte Carlo simulation
• Hamiltonian Monte Carlo for sampling a bivariate Normal distribution
$p(z) = N(\mu, \Sigma)$ with $\mu = [0, 0]$
$E(z) = -\log\left(e^{-\frac{z^T\Sigma^{-1}z}{2}}\right) - \mathrm{const}$
• The MH algorithm converges much more slowly than HMC, and consecutive samples have much higher autocorrelation than samples drawn using HMC
Img Source. https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
Detailed balance
• Update after a sequence of L leapfrog iterations of step size δ: region R maps to region R’
• Transition probability going from R to R’:
$\frac{1}{Z_H}\exp(-H(R))\,\delta V\,\frac{1}{2}\min\{1, \exp(H(R) - H(R'))\}$
• Transition probability going from R’ to R:
$\frac{1}{Z_H}\exp(-H(R'))\,\delta V\,\frac{1}{2}\min\{1, \exp(H(R') - H(R))\}$
• The two are equal (=), because
- the leapfrog integration preserves phase-space volume (δV)
- the leapfrog integration is time-reversible
- 1/2 is the probability of choosing a positive step size δ or a negative step size -δ
Estimating the Partition Function
11.6 Estimating the Partition Function
• Most sampling algorithms require the distribution only up to the constant partition function $Z_E$ ($Z_E$ is not needed in order to draw samples from p(z))
$p_E(z) = \frac{1}{Z_E}\exp\{-E(z)\}, \quad Z_E = \sum_z \exp\{-E(z)\}$
• The partition function is useful for model comparison (because it represents the probability of the observed data).
$p(\text{hidden}|\text{observed}) = \frac{p(\text{hidden}, \text{observed})}{p(\text{observed})}$
• For model comparison, we are interested in the ratio of partition functions
Using importance sampling
• Use importance sampling from a proposal $p_G$ with energy G(z)
$\frac{Z_E}{Z_G} = \frac{\sum_z \exp(-E(z))}{\sum_z \exp(-G(z))} = \frac{\sum_z \exp(-E(z) + G(z))\exp(-G(z))}{\sum_z \exp(-G(z))}$
$= E_{p_G}[\exp(-E(z) + G(z))] \simeq \frac{1}{L}\sum_l \exp(-E(z^{(l)}) + G(z^{(l)}))$ (11.72)
where the $z^{(l)}$ are sampled from $p_G$
• If $Z_G$ is easy to compute, we can estimate $Z_E$
• Problem: $p_G$ needs to match $p_E$ closely
• Idea: we can use samples z(l) from $p_E$ obtained from a Markov chain
$p_G(z) = \frac{1}{L}\sum_{l=1}^{L} T(z^{(l)}, z)$ (11.73)
where T gives the transition probabilities of the chain
• We now define G(z) = -log pG(z) and use it in (11.72)
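The ratio estimator (11.72) can be checked on a small discrete state space where $Z_E$ is also available by direct summation. The setup is an assumption made for illustration: integer states from -20 to 20, energy E(z) = z²/50, and a uniform proposal with G(z) = 0, so that $Z_G$ is simply the number of states. The Markov-chain construction of $p_G$ in (11.73) is not used here.

```python
import math
import random

STATES = list(range(-20, 21))        # assumed discrete state space

def E(z):
    """Energy of the distribution whose partition function we want."""
    return z * z / 50.0

def G(z):
    """Energy of the proposal: constant, i.e. p_G is uniform over STATES."""
    return 0.0

Z_G = sum(math.exp(-G(z)) for z in STATES)   # known exactly: the number of states

def estimate_ratio(n, rng):
    """Monte Carlo estimate of Z_E / Z_G as the average of exp(-E + G) under p_G."""
    total = 0.0
    for _ in range(n):
        z = rng.choice(STATES)               # z ~ p_G (uniform)
        total += math.exp(-E(z) + G(z))
    return total / n

ratio_hat = estimate_ratio(50_000, random.Random(7))
Z_E_hat = ratio_hat * Z_G
Z_E_exact = sum(math.exp(-E(z)) for z in STATES)
print("estimated Z_E:", Z_E_hat, "exact Z_E:", Z_E_exact)
```

With a uniform proposal the estimate is accurate here because $p_E$ is broad; for a sharply peaked $p_E$ most draws would contribute almost nothing, which is the matching problem noted above.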
Chaining
• Partition function ratio estimation requires matching distributions.
• The partition function $Z_G$ needs to be evaluated exactly, which is possible only for a simple distribution => poor matching with a complicated distribution
• Idea: use a set of distributions between the simple p1 and the complex pM
$\frac{Z_M}{Z_1} = \frac{Z_2}{Z_1}\,\frac{Z_3}{Z_2} \cdots \frac{Z_M}{Z_{M-1}}$ (11.74)
• The intermediate distributions interpolate from E1 to EM
$E_\alpha(z) = (1 - \alpha)E_1(z) + \alpha E_M(z)$ (11.75)
• Use a single Markov chain, run initially for the system p1, which after some suitable number of steps moves on to the next distribution in the sequence.
Summary
Sampling Methods
• Basic Sampling Algorithms
• Markov Chain Monte Carlo
• Gibbs Sampling
• Slice Sampling
• Hybrid Monte Carlo
• Estimating the Partition Function
Part I: General concept of basic sampling
Part II: MCMC world
Papers Alert
• Markov Chain Monte Carlo Method without Detailed Balance
(2010) Hidemaro Suwa and Synge Todo http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.105.120603
• Hamiltonian Annealed Importance Sampling for partition function estimation
(2012) Jascha Sohl-Dickstein, Benjamin J. Culpepper http://arxiv.org/abs/1205.1925
• Hamiltonian Monte Carlo with Reduced Momentum Flips
(2012) Jascha Sohl-Dickstein http://arxiv.org/abs/1205.1939
• Hamiltonian Monte Carlo Without Detailed Balance
(2014) Jascha Sohl-Dickstein http://jmlr.org/proceedings/papers/v32/sohl-dickstein14.pdf
• A Markov Jump Process for More Efficient Hamiltonian Monte Carlo
(2015) Jascha Sohl-Dickstein http://arxiv.org/abs/1509.03808
• Markov Chain Monte Carlo and Variational Inference: Bridging the Gap
(2015) Tim Salimans http://jmlr.org/proceedings/papers/v37/salimans15.pdf
Observing Dark Worlds
https://www.kaggle.com/c/DarkWorlds
• Dark matter bends the light from a background galaxy. In extreme cases the galaxy is seen as two arcs surrounding the dark matter.
• We observe that this stuff aggregates and forms massive structures called Dark Matter Halos.
• When there are many galaxies behind a Dark Matter halo, their shapes will correlate with its position.
• The task is then to use this “bending of light” to estimate where in the sky this dark matter is located.
• It is really a problem of statistics: given the noisy data (the elliptical galaxies), recover the model and parameters (position and mass of the dark matter) that generated them
• Step 1: construct a prior distribution p(x) for halo positions (e.g. uniform)
• Step 2: construct a probabilistic model p(e|x) for the data (observed ellipticities of the galaxies)
$p(e_i|x) = N\left(\sum_{j = \text{all halos}} d_{i,j}\,m_j\,f(r_{i,j}),\; \sigma^2\right)$
✦ dij = tangential direction, i.e. the direction in which halo j bends the light of galaxy i
✦ mj is the mass of halo j
✦ f(rij) is a decreasing function of the Euclidean distance rij between galaxy i and halo j.
✦ For the large halos, assign m a log-uniform distribution in [40,180], and f(rij) = 1/max(rij, 240)
✦ For the small halos, fix the mass at 20 and f(rij) = 1/max(rij, 70)
http://timsalimans.com/observing-dark-worlds/
• Step 3: get the posterior distribution for halo positions, p(x|e) = p(e|x)p(x)/p(e) (a simple random-walk Metropolis-Hastings sampler is used to approximate the posterior distribution)
• Step 4: minimize the expected loss
$\tilde x = \arg\min_{\text{prediction}} E_{p(x|e)}\,L(\text{prediction}, x)$
http://timsalimans.com/observing-dark-worlds/
Slides from MCMC NIPS2015 tutorial

More Related Content

What's hot

PRML 条件付き混合モデル 14.5
PRML 条件付き混合モデル 14.5PRML 条件付き混合モデル 14.5
PRML 条件付き混合モデル 14.5tmtm otm
 
Prml 11.1.4 重点サンプリング
Prml 11.1.4 重点サンプリングPrml 11.1.4 重点サンプリング
Prml 11.1.4 重点サンプリングYo Ehara
 
PRML§12-連続潜在変数
PRML§12-連続潜在変数PRML§12-連続潜在変数
PRML§12-連続潜在変数Keisuke OTAKI
 
入門パターン認識と機械学習12章
入門パターン認識と機械学習12章入門パターン認識と機械学習12章
入門パターン認識と機械学習12章hiro5585
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
PRML輪読#14
PRML輪読#14PRML輪読#14
PRML輪読#14matsuolab
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveySangwoo Mo
 
PRML第6章「カーネル法」
PRML第6章「カーネル法」PRML第6章「カーネル法」
PRML第6章「カーネル法」Keisuke Sugawara
 
混合モデルとEMアルゴリズム(PRML第9章)
混合モデルとEMアルゴリズム(PRML第9章)混合モデルとEMアルゴリズム(PRML第9章)
混合モデルとEMアルゴリズム(PRML第9章)Takao Yamanaka
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム貴之 八木
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 tmtm otm
 
パターン認識 04 混合正規分布
パターン認識 04 混合正規分布パターン認識 04 混合正規分布
パターン認識 04 混合正規分布sleipnir002
 
PRML輪読#10
PRML輪読#10PRML輪読#10
PRML輪読#10matsuolab
 
変分ベイズ法の説明
変分ベイズ法の説明変分ベイズ法の説明
変分ベイズ法の説明Haruka Ozaki
 
【書きかけ】一般化線形モデルの流れ
【書きかけ】一般化線形モデルの流れ【書きかけ】一般化線形モデルの流れ
【書きかけ】一般化線形モデルの流れTomoshige Nakamura
 
3.3節 変分近似法(前半)
3.3節 変分近似法(前半)3.3節 変分近似法(前半)
3.3節 変分近似法(前半)tn1031
 
PRML第9章「混合モデルとEM」
PRML第9章「混合モデルとEM」PRML第9章「混合モデルとEM」
PRML第9章「混合モデルとEM」Keisuke Sugawara
 

What's hot (20)

PRML 条件付き混合モデル 14.5
PRML 条件付き混合モデル 14.5PRML 条件付き混合モデル 14.5
PRML 条件付き混合モデル 14.5
 
Prml 11.1.4 重点サンプリング
Prml 11.1.4 重点サンプリングPrml 11.1.4 重点サンプリング
Prml 11.1.4 重点サンプリング
 
PRML§12-連続潜在変数
PRML§12-連続潜在変数PRML§12-連続潜在変数
PRML§12-連続潜在変数
 
入門パターン認識と機械学習12章
入門パターン認識と機械学習12章入門パターン認識と機械学習12章
入門パターン認識と機械学習12章
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
PRML輪読#14
PRML輪読#14PRML輪読#14
PRML輪読#14
 
PRML 12.2-12.2.2
PRML 12.2-12.2.2PRML 12.2-12.2.2
PRML 12.2-12.2.2
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
 
PRML第6章「カーネル法」
PRML第6章「カーネル法」PRML第6章「カーネル法」
PRML第6章「カーネル法」
 
PRML8章
PRML8章PRML8章
PRML8章
 
混合モデルとEMアルゴリズム(PRML第9章)
混合モデルとEMアルゴリズム(PRML第9章)混合モデルとEMアルゴリズム(PRML第9章)
混合モデルとEMアルゴリズム(PRML第9章)
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2
 
パターン認識 04 混合正規分布
パターン認識 04 混合正規分布パターン認識 04 混合正規分布
パターン認識 04 混合正規分布
 
PRML輪読#10
PRML輪読#10PRML輪読#10
PRML輪読#10
 
変分ベイズ法の説明
変分ベイズ法の説明変分ベイズ法の説明
変分ベイズ法の説明
 
【書きかけ】一般化線形モデルの流れ
【書きかけ】一般化線形モデルの流れ【書きかけ】一般化線形モデルの流れ
【書きかけ】一般化線形モデルの流れ
 
Chapter11.2
Chapter11.2Chapter11.2
Chapter11.2
 
3.3節 変分近似法(前半)
3.3節 変分近似法(前半)3.3節 変分近似法(前半)
3.3節 変分近似法(前半)
 
PRML第9章「混合モデルとEM」
PRML第9章「混合モデルとEM」PRML第9章「混合モデルとEM」
PRML第9章「混合モデルとEM」
 

Viewers also liked

010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian ProcessHa Phuong
 
016_20160722 Molecular Circuits For Dynamic Noise Filtering
016_20160722 Molecular Circuits For Dynamic Noise Filtering016_20160722 Molecular Circuits For Dynamic Noise Filtering
016_20160722 Molecular Circuits For Dynamic Noise FilteringHa Phuong
 
006 20151207 draws - Deep Recurrent Attentive Writer
006 20151207 draws - Deep Recurrent Attentive Writer006 20151207 draws - Deep Recurrent Attentive Writer
006 20151207 draws - Deep Recurrent Attentive WriterHa Phuong
 
015_20160422 Controlling Synchronous Patterns In Complex Networks
015_20160422 Controlling Synchronous Patterns In Complex Networks015_20160422 Controlling Synchronous Patterns In Complex Networks
015_20160422 Controlling Synchronous Patterns In Complex NetworksHa Phuong
 
005 20151130 adversary_networks
005 20151130 adversary_networks005 20151130 adversary_networks
005 20151130 adversary_networksHa Phuong
 
018 20160902 Machine Learning Framework for Analysis of Transport through Com...
018 20160902 Machine Learning Framework for Analysis of Transport through Com...018 20160902 Machine Learning Framework for Analysis of Transport through Com...
018 20160902 Machine Learning Framework for Analysis of Transport through Com...Ha Phuong
 
PRML Reading Chapter 11 - Sampling Method

  • 1. C.M. Bishop’s PRML, Chapter 11: Sampling Methods. Tran Quoc Hoan, @k09hthaduonght.wordpress.com/. 10 January 2016, PRML Reading, Hasegawa lab., The University of Tokyo, Tokyo.
  • 2. Introduction. Generating a random number is not easy! True random number: gather “entropy”, or seemingly random data, from the physical world. Pseudo-random number: a seed is fed to a pseudo-random generator (PRG, e.g. the Mersenne Twister), which outputs the random number. Source: http://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/ Generating a number that follows a given probability distribution is even more difficult!
  • 3. Agenda for today’s meeting • Cover PRML chapter 11 • Understand the general concept of sampling from a desired distribution • Introduction to the MCMC world • More about MCMC • Details in PaperAlert
  • 4. Outline: Sampling Methods. Part I, general concept of basic sampling: Basic Sampling Algorithms. Part II, MCMC world: Markov Chain Monte Carlo, Gibbs Sampling, Slice Sampling, Hybrid Monte Carlo, Estimating the Partition Function.
  • 5. Progress: Sampling Methods. Basic Sampling Algorithms, Markov Chain Monte Carlo, Gibbs Sampling, Slice Sampling, Hybrid Monte Carlo, Estimating the Partition Function.
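  • 6. Standard distributions (11.1 Basic Sampling Algorithms) • Goal: sample from a desired distribution $p(y)$. • Assumption: we can generate random numbers uniformly distributed in $U[0,1]$. • Generate $z \sim U[0,1]$, then transform it with $y = h^{-1}(z)$, where $h(y) = \int_{-\infty}^{y} p(x)\,dx$ is the cumulative distribution of $p$; the densities are related by $p(y) = p(z)\left|\frac{dz}{dy}\right|$. The method is practical if $h^{-1}$ is easy to compute. • Ex. Consider the exponential distribution $p(y) = \lambda \exp(-\lambda y)$, where $0 \le y < \infty$; then $h(y) = 1 - \exp(-\lambda y)$ and $y = -\lambda^{-1}\ln(1 - z)$.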
  • 7. Transformation method
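The exponential example of the transformation method can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `sample_exponential`, the rate `lam = 2.0`, the sample size and the seed are all assumptions chosen for the demo.

```python
import numpy as np

def sample_exponential(lam, size, rng):
    """Inverse-CDF sampling: y = h^{-1}(z) = -ln(1 - z) / lam with z ~ U[0, 1)."""
    z = rng.random(size)            # uniform samples z in [0, 1)
    return -np.log1p(-z) / lam      # log1p(-z) computes ln(1 - z)

rng = np.random.default_rng(0)      # fixed seed, arbitrary choice for reproducibility
samples = sample_exponential(lam=2.0, size=100_000, rng=rng)

# The sample mean should be close to the exponential mean 1 / lam.
print("sample mean:", samples.mean(), "expected:", 1.0 / 2.0)
```

The same pattern applies to any distribution whose cumulative distribution can be inverted in closed form; only the last line of the function changes.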
  • 8. Rejection Sampling • Assumption 1: Sampling from $p(z)$ is difficult, but we are able to evaluate $p(z)$ for any given value of $z$ up to some normalizing constant $Z_p$: $p(z) = \frac{1}{Z_p}\tilde{p}(z)$ (11.13) • Assumption 2: We know how to sample from a proposal distribution $q(z)$, and there exists a constant $k$ such that $k q(z) \ge \tilde{p}(z)$ for all $z$ • Then there is an algorithm to obtain independent samples from $p(\cdot)$
  • 9. Rejection Sampling • Consider a constant $k$ such that $k q(z)$ covers $\tilde{p}(z)$ • Generate $z_0$ from the proposal $q(\cdot)$ • Generate $u_0$ from $U[0, k q(z_0)]$ • Reject $z_0$ if $u_0 > \tilde{p}(z_0)$; keep $z_0$ if $u_0 \le \tilde{p}(z_0)$ • The efficiency of the method depends on the ratio between the grey area and the white area • Proof (acceptance probability): $p(\text{accept}) = \int \frac{\tilde{p}(z)}{k q(z)}\, q(z)\,dz = \frac{1}{k}\int \tilde{p}(z)\,dz$
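The rejection sampling steps can be sketched as follows. The target here is a hypothetical choice made only for illustration, $\tilde{p}(z) = \exp(-z^2/2)\,(1 + \sin^2 3z)$, with a standard normal proposal $q$; since $\tilde{p}(z) \le 2\exp(-z^2/2)$, the constant $k = 2\sqrt{2\pi}$ satisfies $k q(z) \ge \tilde{p}(z)$. The names `p_tilde`, `q_pdf`, `K` and `rejection_sample` are assumptions, not taken from the slides.

```python
import numpy as np

def p_tilde(z):
    """Unnormalized target (hypothetical example): bounded above by 2 * exp(-z^2 / 2)."""
    return np.exp(-0.5 * z**2) * (1.0 + np.sin(3.0 * z) ** 2)

def q_pdf(z):
    """Proposal density: standard normal."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

K = 2.0 * np.sqrt(2.0 * np.pi)   # chosen so that K * q(z) >= p_tilde(z) for all z

def rejection_sample(n_proposals, rng):
    z0 = rng.standard_normal(n_proposals)     # z0 ~ q
    u0 = rng.uniform(0.0, K * q_pdf(z0))      # u0 ~ U[0, k q(z0)]
    keep = u0 <= p_tilde(z0)                  # keep z0 if u0 <= p_tilde(z0)
    return z0[keep], keep.mean()

rng = np.random.default_rng(0)
accepted, rate = rejection_sample(100_000, rng)
print("accepted:", accepted.size, "acceptance rate:", rate)
```

For this particular target the acceptance rate should be close to $\frac{1}{k}\int \tilde{p}(z)\,dz \approx 0.75$, which matches the acceptance probability formula on the slide.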
  • 10. Rejection Sampling Example • Sampling from the Gamma distribution (green curve) $\mathrm{Gam}(z|a,b) = \frac{b^a z^{a-1}\exp(-bz)}{\Gamma(a)}$, which for $a > 1$ peaks at $z = (a-1)/b$ • Proposal distribution: Cauchy distribution (red curve) $q(z) = \frac{c_0}{1 + (z - z_0)^2/d_0^2}$, sampled by transforming $z = d_0\tan(\pi u) + z_0$, where $u$ is drawn uniformly from $[0,1]$ • We need to find $z_0$, $c_0$, $d_0$ such that $q(z)$ is greater than or equal to $\mathrm{Gam}(z|a,b)$ everywhere with the smallest $d_0 c_0$ (which defines the area): $z_0 = \frac{a-1}{b}$, $d_0^2 = 2a - 1$, $c_0 = \frac{1}{\pi d_0}$
  • 11. Adaptive Rejection Sampling • The proposal distribution $q(\cdot)$ may be difficult to construct. • Construct $q(z)$ from initial grid points • Generate $z_4$ from $q(z)$ • Generate $u_0$ from $U[0, q(z_4)]$ • Keep $z_4$ if $u_0 \le \tilde{p}(z_4)$; reject $z_4$ if $u_0 > \tilde{p}(z_4)$, but then use it to refine the envelope • Fig 11.6: If a sample point is rejected, it is added to the set of grid points and used to refine the envelope distribution. • Rejection sampling methods are inefficient in high dimensions (the acceptance rate decreases exponentially with dimensionality)
  • 12. Importance Sampling • Basic idea: transform the integral into an expectation over a simple, known distribution $q(z)$ (the proposal) • Conditions: $q(z) > 0$ when $f(z)p(z) \ne 0$; easy to sample from $q(z)$ • $E[f] = \int f(z)p(z)\,dz = \int f(z)\frac{p(z)}{q(z)}q(z)\,dz \approx \frac{1}{S}\sum_{s} w^{(s)} f(z^{(s)})$ (Monte Carlo), with $z^{(s)} \sim q(z)$ • Importance weights $w^{(s)} \propto \frac{p(z^{(s)})}{q(z^{(s)})}$ (normalized to average to one when $p$ is known only up to a constant): they correct the bias introduced by sampling from the wrong distribution • All the generated samples are retained • Figure: curves $p(z)$, $q(z)$ and $f(z)$ plotted against $z$
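A minimal sketch of the importance sampling estimator, under assumptions made only for this example: the target $p$ is a standard normal, the proposal $q$ is a normal with standard deviation 2, and $f(z) = z^2$, so the exact answer is $E[f] = 1$. The helper names `normal_pdf` and `importance_estimate` are hypothetical.

```python
import numpy as np

def normal_pdf(z, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def importance_estimate(f, n_samples, rng, q_std=2.0):
    z = rng.normal(0.0, q_std, size=n_samples)                 # z^(s) ~ q(z)
    w = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.0, q_std)    # w^(s) = p(z^(s)) / q(z^(s))
    return np.mean(w * f(z)), w                                # (1/S) * sum_s w^(s) f(z^(s))

rng = np.random.default_rng(0)
estimate, weights = importance_estimate(lambda z: z**2, 200_000, rng)

# Every sample is retained; the weights average to roughly one because p is normalized.
print("E[z^2] estimate:", estimate, "mean weight:", weights.mean())
```

A proposal wider than the target is used deliberately: it keeps the weights bounded, which keeps the variance of the estimate small.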
  • 13. SIR (sampling-importance-resampling) • Rejection sampling: choosing a suitable $q(z)$ and constant $k$ can be difficult • SIR: based on the use of a proposal distribution $q(z)$ but avoids having to determine the constant $k$ 1. Draw $L$ samples $z^{(1)}, z^{(2)}, \dots, z^{(L)}$ from $q(z)$ 2. Calculate the importance weights $\frac{p(z^{(l)})}{q(z^{(l)})}$ for all $l = 1, \dots, L$ 3. Normalize the weights to obtain $w_1, \dots, w_L$ 4. Draw a second set of $L$ samples from the discrete distribution $(z^{(1)}, z^{(2)}, \dots, z^{(L)})$ with probabilities $(w_1, \dots, w_L)$ • The resulting $L$ samples are distributed according to $p(z)$ as $L \to \infty$
  • 14. SIR (sampling-importance-resampling) 1. Draw $L$ samples $z^{(1)}, z^{(2)}, \dots, z^{(L)}$ from $q(z)$ 2. Calculate the importance weights $\frac{p(z^{(l)})}{q(z^{(l)})}$ for all $l = 1, \dots, L$ 3. Normalize the weights to obtain $w_1, \dots, w_L$ 4. Draw a second set of $L$ samples from the discrete distribution $(z^{(1)}, z^{(2)}, \dots, z^{(L)})$ with probabilities $(w_1, \dots, w_L)$ • Proof: with the indicator $I(F) = 1$ if $F$ is true and $0$ otherwise, $p(z \le a) = \sum_{l: z^{(l)} \le a} w_l = \frac{\sum_l I(z^{(l)} \le a)\,\tilde{p}(z^{(l)})/q(z^{(l)})}{\sum_l \tilde{p}(z^{(l)})/q(z^{(l)})}$. If $L \to \infty$ then $p(z \le a) = \frac{\int I(z \le a)\{\tilde{p}(z)/q(z)\}q(z)\,dz}{\int \{\tilde{p}(z)/q(z)\}q(z)\,dz} = \frac{\int I(z \le a)\tilde{p}(z)\,dz}{\int \tilde{p}(z)\,dz} = \int I(z \le a)p(z)\,dz$
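The four SIR steps map directly onto a short Python sketch. The target and proposal below are assumptions for illustration only: $\tilde{p}(z) = \exp(-(z-1)^2/2)$, an unnormalized normal with mean 1 and unit variance, and a zero-mean normal proposal with standard deviation 3. The function name `sir` and the helper lambdas are hypothetical.

```python
import numpy as np

def sir(p_tilde, q_sample, q_pdf, n_samples, rng):
    z = q_sample(n_samples, rng)                       # 1. draw L samples from q(z)
    r = p_tilde(z) / q_pdf(z)                          # 2. importance weights p_tilde / q
    w = r / r.sum()                                    # 3. normalize to w_1 ... w_L
    idx = rng.choice(n_samples, size=n_samples, p=w)   # 4. resample with probabilities w
    return z[idx]

q_std = 3.0                                            # proposal width, arbitrary choice
p_tilde = lambda z: np.exp(-0.5 * (z - 1.0) ** 2)      # unnormalized target
q_sample = lambda n, rng: rng.normal(0.0, q_std, size=n)
q_pdf = lambda z: np.exp(-0.5 * (z / q_std) ** 2) / (q_std * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
resampled = sir(p_tilde, q_sample, q_pdf, 100_000, rng)

# For large L the resampled set should look like draws from the normalized target.
print("mean:", resampled.mean(), "std:", resampled.std())
```

No constant $k$ appears anywhere, and the normalizing constant of $\tilde{p}$ cancels in step 3, which is the point made in the proof above.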
  • 15. Sampling and the EM algorithm • Monte Carlo EM algorithm: use some Monte Carlo method to approximate the expectation of the E-step • The expected complete-data log likelihood ($Z$: hidden; $X$: observed; $\theta$: parameters), given by $Q(\theta, \theta^{old}) = \int p(Z|X, \theta^{old}) \ln p(Z, X|\theta)\,dZ$ (11.28), may be approximated by $Q(\theta, \theta^{old}) \approx \frac{1}{L}\sum_{l=1}^{L} \ln p(Z^{(l)}, X|\theta)$ (11.29), where the $Z^{(l)}$ are drawn from $p(Z|X, \theta^{old})$ • Stochastic EM algorithm: considering a finite mixture model, only one sample $Z$ may be drawn at each E-step (this makes a hard assignment of each data point to one of the components)
  • 16. IP Algorithm • For a full Bayesian treatment in which we wish to draw samples from the joint posterior $p(\theta, Z|X)$, use the IP algorithm • I-step. We wish to sample from $p(Z|X)$ but we cannot do this directly. Notice that $p(Z|X) = \int p(Z|\theta, X)p(\theta|X)\,d\theta$ (11.30); for $l = 1, \dots, L$ we first draw a sample $\theta^{(l)}$ from the current estimate for $p(\theta|X)$, and then use this to draw a sample $Z^{(l)}$ from $p(Z|\theta^{(l)}, X)$ • P-step. Given the relation $p(\theta|X) = \int p(\theta|Z, X)p(Z|X)\,dZ$ (11.31), we use the samples $\{Z^{(l)}\}$ obtained from the I-step to compute $p(\theta|X) \approx \frac{1}{L}\sum_{l=1}^{L} p(\theta|Z^{(l)}, X)$ (11.32)
  • 17. In review • Inverse function method - Analytically reliable, but unable to deal with complicated distributions • Rejection sampling - Able to deal with complicated distributions, but it is difficult to choose the proposal distribution and the constant $k$ - Sometimes wastes samples because of the rejection process • Adaptive rejection sampling - Uses an envelope function to reduce rejected samples - Difficult to deal with high-dimensional, sharply peaked distributions • Importance sampling - Approximates the expectation with weighted samples from the proposal distribution; does not sample from the desired distribution • SIR - Combines rejection sampling and importance sampling • Monte Carlo EM • IP algorithm for data augmentation
  • 18. Progress: Sampling Methods. Part I, general concept of basic sampling: Basic Sampling Algorithms. Part II, welcome to the MCMC world: Markov Chain Monte Carlo, Gibbs Sampling, Slice Sampling, Hybrid Monte Carlo, Estimating the Partition Function.
  • 19. Markov Chain Monte Carlo (MCMC) (11.2 Markov Chain Monte Carlo) • MCMC: a general strategy that allows sampling from a large class of distributions (based on the mechanism of Markov chains) • MCMC scales well with the dimensionality of the sample space • Figure, MLE vs. MCMC: MLE estimates the top of the mountain of the likelihood function (depends on the initial value, so it may return a wrong estimate); MCMC estimates the posterior distribution (approaches the global optimum, does not depend on the initial value) • Figure, map of methods: basic sampling (inverse function, rejection sampling, adaptive rejection sampling, importance sampling, SIR, data augmentation sampling) and Markov Chain Monte Carlo (Metropolis method, Metropolis-Hastings method, Gibbs sampling, slice sampling)
  • 20. MCMC: the idea (11.2 Markov Chain Monte Carlo)
    • Goal: generate a set of samples from p(z)
    • Idea: generate samples from a Markov chain whose invariant distribution is p(z)
      1. Given the current sample z(τ), generate a candidate sample z* from a proposal distribution q(z|z(τ))
      2. Accept the candidate according to an appropriate criterion
      3. If the candidate is accepted then z(τ+1) = z*; otherwise z(τ+1) = z(τ)
    • The proposal distribution depends on the current state
    • The samples z(1), z(2), … form a Markov chain, and the distribution of z(τ) tends to p(z) as τ → ∞
    • Assumption: we know how to evaluate $\tilde{p}(z) = Z_p\, p(z)$ (but not $Z_p$)
  • 21. Metropolis Algorithm (11.2 Markov Chain Monte Carlo)
    • The proposal distribution is symmetric: $q(z_A \mid z_B) = q(z_B \mid z_A)$
    • The candidate sample is accepted with probability
      $A(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)}{\tilde{p}(z^{(\tau)})}\right)$ (11.33)
    Fig 11.9: Metropolis sampling from a Gaussian whose one-standard-deviation contour is the ellipse. The proposal distribution is an isotropic Gaussian with standard deviation 0.2. Accepted steps are shown in green, rejected steps in red. Of 150 candidate samples, 43 are rejected.
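    A minimal sketch of the Metropolis rule (11.33) in Python, using only the standard library. The 1-D standard Gaussian target, the proposal scale `step`, and the names `metropolis` and `log_p_tilde` are illustrative assumptions, not taken from the slides. The acceptance test is done in log space to avoid overflow.

```python
import math
import random


def log_p_tilde(z):
    # Unnormalized log density of a standard Gaussian (assumed example target).
    return -0.5 * z * z


def metropolis(log_p, z0, step, n_samples, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    z = z0
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        z_star = z + rng.gauss(0.0, step)  # symmetric proposal q(z*|z)
        log_ratio = log_p(z_star) - log_p(z)
        # Accept with probability min(1, p~(z*) / p~(z)), as in (11.33).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            z = z_star
            n_accepted += 1
        samples.append(z)  # a rejected candidate repeats the current state
    return samples, n_accepted / n_samples


rng = random.Random(0)
samples, accept_rate = metropolis(log_p_tilde, z0=0.0, step=1.0,
                                  n_samples=20000, rng=rng)
kept = samples[2000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"mean={mean:.3f} var={var:.3f} acceptance={accept_rate:.2f}")
```

    Note that a rejected step still appends the current state to the chain; dropping rejected steps instead would bias the samples.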
  • 22. Markov Chains (11.2 Markov Chain Monte Carlo)
    • Q: under what circumstances will a Markov chain converge to the desired distribution?
    • First-order Markov chain: a series of random variables z(1), …, z(M) such that
      $p(z^{(m+1)} \mid z^{(1)}, \dots, z^{(m)}) = p(z^{(m+1)} \mid z^{(m)}) \quad \forall m$ (11.37)
    • A Markov chain is specified by $p(z^{(0)})$ and the transition probabilities
      $T_m(z^{(m)}, z^{(m+1)}) = p(z^{(m+1)} \mid z^{(m)})$
    • A distribution p*(z) is said to be invariant for a Markov chain if
      $p^*(z) = \sum_{z'} T(z', z)\, p^*(z')$
    • A sufficient condition for invariance is to choose the transitions to satisfy the property of detailed balance
      $p^*(z)\, T(z, z') = T(z', z)\, p^*(z')$ (11.40)
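    The relation between detailed balance and invariance can be checked numerically on a small discrete chain. The sketch below assumes a hypothetical three-state target `p` and builds the transition matrix from a Metropolis rule with a uniform proposal over the other states; `T[i][j]` stands for T(z_i, z_j), the probability of moving from state i to state j.

```python
# Assumed three-state target distribution (illustrative values).
p = [0.2, 0.3, 0.5]
K = len(p)

# Metropolis transition matrix: propose one of the other K-1 states
# uniformly, accept with probability min(1, p[j] / p[i]).
T = [[0.0] * K for _ in range(K)]
for i in range(K):
    for j in range(K):
        if i != j:
            T[i][j] = (1.0 / (K - 1)) * min(1.0, p[j] / p[i])
    T[i][i] = 1.0 - sum(T[i])  # remaining mass: stay in the current state

# Detailed balance (11.40): p(z) T(z, z') == p(z') T(z', z) for all pairs.
detailed_balance = all(
    abs(p[i] * T[i][j] - p[j] * T[j][i]) < 1e-12
    for i in range(K) for j in range(K)
)

# Invariance: one step of the chain leaves p unchanged.
p_next = [sum(p[i] * T[i][j] for i in range(K)) for j in range(K)]

# Starting from a different distribution, repeated steps approach p.
dist = [1.0, 0.0, 0.0]
for _ in range(200):
    dist = [sum(dist[i] * T[i][j] for i in range(K)) for j in range(K)]

print(detailed_balance, p_next, dist)
```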
  • 23. Markov Chains (11.2 Markov Chain Monte Carlo)
  • 24. Ergodicity (11.2 Markov Chain Monte Carlo)
    • The chain has a unique invariant distribution if it 'forgets' its starting point z(0)
    Image source: Murray, MLSS 2009 slides
  • 25. Markov Chains (11.2 Markov Chain Monte Carlo)
    • Goal: generate a set of samples from p(z)
    • Idea: generate samples from a Markov chain whose invariant distribution is p(z)
    • How: choose the transition probability T(z, z') to satisfy the property of detailed balance for p(z)
      $p(z)\, T(z, z') = T(z', z)\, p(z')$ (11.40)
    • T(z, z') can be constructed from a set of "base" transitions B_1, B_2, …, B_K, either as a mixture
      $T(z', z) = \sum_{k=1}^{K} \alpha_k B_k(z', z)$ (11.42)
      or through successive application
      $T(z', z) = \sum_{z_1} \cdots \sum_{z_{K-1}} B_1(z', z_1) \cdots B_{K-1}(z_{K-2}, z_{K-1})\, B_K(z_{K-1}, z)$ (11.43)
  • 26. The Metropolis-Hastings Algorithm (11.2 Markov Chain Monte Carlo)
    • Generalization of the Metropolis algorithm: the proposal distribution q no longer needs to be symmetric
    • Given the current sample z(τ), generate a candidate sample z* from a proposal distribution q_k(z|z(τ))
    • Accept it with probability
      $A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)\, q_k(z^{(\tau)} \mid z^*)}{\tilde{p}(z^{(\tau)})\, q_k(z^* \mid z^{(\tau)})}\right)$ (11.44)
      where k labels the members of the set of possible transitions being considered
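    To illustrate why the proposal ratio in (11.44) matters, here is a hedged sketch with an asymmetric proposal. The target (an unnormalized Gamma(3, 1) density on z > 0, with mean 3) and the multiplicative log-normal proposal are assumptions chosen for this example. For this proposal the ratio q(z|z*) / q(z*|z) equals z*/z, which is the only correction added to the Metropolis rule.

```python
import math
import random


def log_p_tilde(z):
    # Unnormalized Gamma(shape=3, rate=1) density on z > 0: z^2 * exp(-z).
    return 2.0 * math.log(z) - z


def metropolis_hastings(log_p, z0, sigma, n_samples, rng):
    """Metropolis-Hastings with a log-normal (asymmetric) proposal."""
    z = z0
    samples = []
    for _ in range(n_samples):
        # Multiplicative random walk: z* = z * exp(sigma * eps), eps ~ N(0, 1).
        z_star = z * math.exp(sigma * rng.gauss(0.0, 1.0))
        # log[q(z|z*) / q(z*|z)] = log z* - log z for this proposal.
        log_q_ratio = math.log(z_star) - math.log(z)
        log_accept = log_p(z_star) - log_p(z) + log_q_ratio
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            z = z_star
        samples.append(z)
    return samples


rng = random.Random(1)
samples = metropolis_hastings(log_p_tilde, z0=1.0, sigma=0.8,
                              n_samples=30000, rng=rng)
kept = samples[3000:]
mean = sum(kept) / len(kept)
print(f"estimated mean = {mean:.3f} (Gamma(3, 1) has mean 3)")
```

    Omitting `log_q_ratio` would turn this into a plain Metropolis rule applied to an asymmetric proposal, and the chain would no longer have the intended invariant distribution.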
  • 27. The Metropolis-Hastings Algorithm (11.2 Markov Chain Monte Carlo)
    • Prove that p(z) is the invariant distribution of the chain
    • Notice that the transition probability of this chain is defined as
      $T_k(z, z') = q_k(z' \mid z)\, A_k(z', z)$
    • We need to prove detailed balance:
      $p(z)\, T_k(z, z') = T_k(z', z)\, p(z')$
    Proof. Use
      $A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)\, q_k(z^{(\tau)} \mid z^*)}{\tilde{p}(z^{(\tau)})\, q_k(z^* \mid z^{(\tau)})}\right)$ and $p(z) = \tilde{p}(z)/Z_p$
    Then
      $p(z)\, q_k(z' \mid z)\, A_k(z', z) = \min\big(p(z)\, q_k(z' \mid z),\; p(z')\, q_k(z \mid z')\big)$
      $= \min\big(p(z')\, q_k(z \mid z'),\; p(z)\, q_k(z' \mid z)\big)$
      $= p(z')\, q_k(z \mid z')\, A_k(z, z')$ (Q.E.D.)
  • 28. The Metropolis-Hastings Algorithm (11.2 Markov Chain Monte Carlo)
    • Common choice for q: a Gaussian centered on the current state
      ✓ small variance -> high acceptance rate, but slow exploration of the state space and strongly correlated (non-independent) samples
      ✓ large variance -> high rejection rate
    Fig 11.10: Use of an isotropic Gaussian proposal (blue circle) to sample from a Gaussian distribution (red). The scale ρ of the proposal should be on the order of σ_min, but the algorithm may then converge slowly: exploring the state space in the other direction requires on the order of (σ_max/σ_min)^2 iterations.
  • 29. Summary so far… (11.2 Markov Chain Monte Carlo)
    • We need approximate methods to evaluate sums and integrals
    • Monte Carlo does not explicitly depend on dimension, although simple methods work only in low dimensions
    • Markov Chain Monte Carlo (MCMC) can make local moves. By assuming less, it is more applicable to higher dimensions
    • Simple computations => "easy" to implement (but harder to diagnose)
  • 31. Gibbs Sampling (11.3 Gibbs Sampling)
    • Sample each variable in turn, conditioned on the values of all other variables in the distribution (a method with no rejection)
      ✓ Initialize {z_1, z_2, …, z_M}
      ✓ For τ = 1, 2, …, T: pick each variable in turn (sequentially or at random) and resample
        $z_i^{(\tau+1)} \sim p(z_i \mid z_{\setminus i}^{(\tau)})$ for i = 1, …, M
    Proof of validity
    • Consider a Metropolis-Hastings sampling step involving the variable z_k, in which the remaining variables $z_{\setminus k}$ remain fixed and the transition probability is $q_k(z^* \mid z) = p(z_k^* \mid z_{\setminus k})$. Then the acceptance probability is
      $A_k(z^*, z) = \frac{p(z^*)\, q_k(z \mid z^*)}{p(z)\, q_k(z^* \mid z)} = \frac{p(z_k^* \mid z_{\setminus k}^*)\, p(z_{\setminus k}^*)\, p(z_k \mid z_{\setminus k}^*)}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^* \mid z_{\setminus k})} = 1$
      where $z_{\setminus k}^* = z_{\setminus k}$
  • 32. Gibbs Sampling (11.3 Gibbs Sampling)
    Fig 11.11: Illustration of Gibbs sampling by alternate updates of two variables (blue steps) whose distribution is a correlated Gaussian (red). The step size is governed by the standard deviation of the conditional distribution (green curve) and is O(l), leading to slow progress. The number of steps needed to obtain an independent sample from the distribution is O((L/l)^2).
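    The two-variable correlated Gaussian case can be sketched in a few lines. The zero-mean, unit-variance bivariate Gaussian with correlation `RHO` is an assumed example; for it, each conditional is a 1-D Gaussian, p(z1|z2) = N(ρ z2, 1 − ρ²), so every update is accepted.

```python
import math
import random

RHO = 0.8  # assumed correlation of the example target


def gibbs_bivariate_gaussian(rho, n_sweeps, rng):
    """Alternate exact draws from p(z1|z2) and p(z2|z1)."""
    cond_std = math.sqrt(1.0 - rho * rho)
    z1, z2 = 0.0, 0.0
    samples = []
    for _ in range(n_sweeps):
        z1 = rng.gauss(rho * z2, cond_std)  # z1 ~ p(z1 | z2)
        z2 = rng.gauss(rho * z1, cond_std)  # z2 ~ p(z2 | z1), uses the new z1
        samples.append((z1, z2))
    return samples


rng = random.Random(2)
samples = gibbs_bivariate_gaussian(RHO, n_sweeps=20000, rng=rng)
n = len(samples)
m1 = sum(s[0] for s in samples) / n
m2 = sum(s[1] for s in samples) / n
v1 = sum((s[0] - m1) ** 2 for s in samples) / n
v2 = sum((s[1] - m2) ** 2 for s in samples) / n
cov = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
corr = cov / math.sqrt(v1 * v2)
print(f"empirical correlation = {corr:.3f} (target {RHO})")
```

    As ρ approaches 1 the conditional standard deviation shrinks, so each sweep moves less along the elongated direction, which is the slow progress described in Fig 11.11.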
  • 34. Auxiliary variables (11.4 Slice Sampling)
    • Collapsing: analytically integrate variables out
    • Auxiliary methods: introduce extra variables and integrate them out by MCMC
      Explore $\pi(\theta, h)$, where $\int \pi(\theta, h)\, dh = \pi(\theta)$
  • 35. Slice Sampling (11.4 Slice Sampling)
    • Problem with the Metropolis algorithm (proposal q(z|z') = q(z'|z)): sensitivity to the step size
      ✓ Step size too small: slow convergence (random walk behavior)
      ✓ Step size too large: high estimator variance (high rejection rate)
    • Idea: adapt the step size automatically to a suitable value
    • Technique: introduce an auxiliary variable u and sample (u, z) jointly. Ignoring u leads to the desired samples from p(z)
  • 36. Slice Sampling (11.4 Slice Sampling)
    • Sample z and u uniformly from the area under the distribution
      ✓ Fix z, sample u uniformly from $[0, \tilde{p}(z)]$
      ✓ Fix u, sample z uniformly from the slice through the distribution, $\{z : \tilde{p}(z) > u\}$
    • How to sample z from the slice
      ✓ Start with a region of width w containing z(τ)
      ✓ If an end point is in the slice, extend the region by w in that direction
      ✓ Sample z' uniformly from the region
      ✓ If z' is in the slice, accept it as z(τ+1)
      ✓ If not, make z' the new end point of the region and resample z'
    • Multivariate distributions: slice sampling within a Gibbs sampler
    See next slides for more details
  • 37. Slice Sampling Idea (11.4 Slice Sampling)
    • Sample (z, u) uniformly under the curve $\tilde{p}(z) \propto p(z)$
      $p(u \mid z) = \text{Uniform}[0, \tilde{p}(z)]$
      $p(z \mid u) \propto \begin{cases} 1 & \text{if } \tilde{p}(z) \ge u \\ 0 & \text{otherwise} \end{cases}$ = uniform on the slice
    [Figure: curve $\tilde{p}(z)$ with a point (z, u) under it; axes z and u]
    Slide from MCMC NIPS2015 tutorial
  • 38. Slice Sampling Idea (11.4 Slice Sampling)
    • Unimodal conditionals: rejection sampling of p(z|u) using a broader uniform
    [Figure: point (z, u) under the curve; axes z and u]
    Slide from MCMC NIPS2015 tutorial
  • 39. Slice Sampling Idea (11.4 Slice Sampling)
    • Unimodal conditionals: adaptive rejection sampling of p(z|u)
    [Figure: point (z, u) under the curve; axes z and u]
    Slide from MCMC NIPS2015 tutorial
  • 40. Slice Sampling Idea (11.4 Slice Sampling)
    • Unimodal conditionals: a new z is found quickly, and no rejections are recorded
    [Figure: point (z, u) under the curve; axes z and u]
    Slide from MCMC NIPS2015 tutorial
  • 41. Slice Sampling Idea (11.4 Slice Sampling)
    • Multimodal conditionals: use updates that leave p(z|u) invariant
      - place a bracket randomly around the current point
      - linearly step out until both ends are off the slice
      - sample on the bracket, shrinking as before
    [Figure: multimodal curve $\tilde{p}(z)$ with a point (z, u); axes z and u]
    Slide from MCMC NIPS2015 tutorial
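    The bracket, step-out, and shrink procedure can be sketched for a 1-D target as follows. The bimodal target (an equal mixture of unit-variance Gaussians at −2 and +2, so mean 0 and variance 5), the initial width `w`, and the function names are assumptions made for this example. The height u is handled in log space.

```python
import math
import random


def log_p_tilde(z):
    # Unnormalized bimodal target: exp(-(z+2)^2/2) + exp(-(z-2)^2/2),
    # evaluated with a log-sum-exp to stay finite far from the modes.
    a = -0.5 * (z + 2.0) ** 2
    b = -0.5 * (z - 2.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def slice_sample(log_p, z0, w, n_samples, rng):
    z = z0
    samples = []
    for _ in range(n_samples):
        # Fix z, draw u ~ Uniform[0, p~(z)]; stored as log u.
        log_u = log_p(z) + math.log(1.0 - rng.random())
        # Place a bracket of width w randomly around the current point.
        left = z - w * rng.random()
        right = left + w
        # Step out linearly until both ends are off the slice.
        while log_p(left) > log_u:
            left -= w
        while log_p(right) > log_u:
            right += w
        # Sample on the bracket, shrinking it after every miss.
        while True:
            z_new = rng.uniform(left, right)
            if log_p(z_new) > log_u:
                z = z_new
                break
            if z_new < z:
                left = z_new
            else:
                right = z_new
        samples.append(z)
    return samples


rng = random.Random(3)
samples = slice_sample(log_p_tilde, z0=0.0, w=2.0, n_samples=20000, rng=rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")
```

    Every iteration ends with an accepted point, so the chain never repeats a state because of a rejection; the cost of a poorly chosen `w` shows up as extra density evaluations instead.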
  • 43. Hybrid Monte Carlo (11.5 Hybrid Monte Carlo)
    • The problem with the Metropolis algorithm is the step size trade-off
    • Hybrid Monte Carlo is suitable for continuous state spaces
      ✓ Able to make large jumps in state space with a low rejection rate
      ✓ Uses the dynamics of a physical (Hamiltonian) system, rather than a probability distribution, to propose future states of the Markov chain
    • Goal: sample from
      $p(z) = \frac{1}{Z_p} \exp(-E(z))$
      where E(z) is interpreted as the potential energy of the system at state z
  • 44. Hamiltonian dynamics (11.5 Hybrid Monte Carlo)
    • Hamiltonian dynamics describe how kinetic energy is converted to potential energy (and vice versa) as an object moves over time
    • Evolution of the state variable z = {z_i} under continuous time τ
    • Momentum variables correspond to the rate of change of the state
      $r_i = \frac{dz_i}{d\tau}$ (11.53)
      The joint (z, r) space is called phase space
    • For each location the object takes there is an associated potential energy E(z), and for each momentum there is an associated kinetic energy K(r). The total energy of the system is constant and is known as the Hamiltonian
      $H(z, r) = E(z) + K(r)$
      with
      $\frac{\partial r_i}{\partial \tau} = -\frac{\partial H}{\partial z_i} = -\frac{\partial E(z)}{\partial z_i}$
      $\frac{\partial z_i}{\partial \tau} = \frac{\partial H}{\partial r_i} = \frac{\partial K(r)}{\partial r_i}$
    • The dynamics preserve volume in phase space:
      $\operatorname{div} V = 0$ with $V = \left(\frac{dz}{d\tau}, \frac{dr}{d\tau}\right)$ (11.62)
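  • 45. Simulating Hamiltonian dynamics (11.5 Hybrid Monte Carlo)
      $\frac{\partial r_i}{\partial \tau} = -\frac{\partial H}{\partial z_i} = -\frac{\partial E(z)}{\partial z_i}$
      $\frac{\partial z_i}{\partial \tau} = \frac{\partial H}{\partial r_i} = \frac{\partial K(r)}{\partial r_i}$
    • If we have expressions for these partial derivatives and a set of initial conditions (z_0, r_0), we can predict the location and momentum at any point in time
    Leap Frog method (run for L steps to simulate dynamics over L × δ units of time)
      1. Take a half step in time to update the momentum variable
         $r_i(\tau + \delta/2) = r_i(\tau) - \frac{\delta}{2} \frac{\partial E}{\partial z_i}(z(\tau))$
      2. Take a full step in time to update the position variable
         $z_i(\tau + \delta) = z_i(\tau) + \delta\, \frac{\partial K}{\partial r_i}(r(\tau + \delta/2))$
      3. Take the remaining half step in time to finish updating the momentum variable
         $r_i(\tau + \delta) = r_i(\tau + \delta/2) - \frac{\delta}{2} \frac{\partial E}{\partial z_i}(z(\tau + \delta))$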
  • 46. Simulating Hamiltonian oscillator (11.5 Hybrid Monte Carlo)
    Harmonic oscillator
      $F = kz$, $E(z) = \int F\, dz = \frac{k z^2}{2}$
      $K(v) = \frac{(mv)^2}{2m} = \frac{v^2}{2} = \frac{r^2}{2} = K(r)$
    Leap Frog equations
      1. $r(\tau + \delta/2) = r(\tau) - (\delta/2)\, z(\tau)$
      2. $z(\tau + \delta) = z(\tau) + \delta\, r(\tau + \delta/2)$
      3. $r(\tau + \delta) = r(\tau + \delta/2) - (\delta/2)\, z(\tau + \delta)$
    [Figure: energy plot (E, K, and H = E + K) and phase-space plot (z against r)]
    Img Ref. https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
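    The three leapfrog equations for the harmonic oscillator translate directly into code. This sketch assumes E(z) = z²/2 and K(r) = r²/2 (that is, k = m = 1, which matches the leapfrog equations on the slide); the step size and number of steps are arbitrary example values. It also checks two properties used later by Hybrid Monte Carlo: the energy stays nearly constant, and the integrator is time-reversible.

```python
def grad_E(z):
    # dE/dz for E(z) = z^2 / 2 (assumed unit spring constant).
    return z


def hamiltonian(z, r):
    # H(z, r) = E(z) + K(r) with K(r) = r^2 / 2.
    return 0.5 * z * z + 0.5 * r * r


def leapfrog(z, r, delta, n_steps):
    for _ in range(n_steps):
        r = r - 0.5 * delta * grad_E(z)  # 1. half step for the momentum
        z = z + delta * r                # 2. full step for the position
        r = r - 0.5 * delta * grad_E(z)  # 3. remaining half step for the momentum
    return z, r


z0, r0 = 1.0, 0.0
z1, r1 = leapfrog(z0, r0, delta=0.1, n_steps=200)
energy_drift = abs(hamiltonian(z1, r1) - hamiltonian(z0, r0))

# Time reversibility: flip the momentum, integrate again, flip back.
z_back, r_back = leapfrog(z1, -r1, delta=0.1, n_steps=200)
r_back = -r_back

print(f"energy drift = {energy_drift:.2e}")
print(f"returned to z = {z_back:.6f}, r = {r_back:.6f}")
```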
  • 47. Target distribution (11.5 Hybrid Monte Carlo)
    • Consider the canonical distribution
      $p(\theta) = \frac{1}{Z_p} \exp(-E(\theta))$
    • The canonical distribution for the Hamiltonian dynamics energy function is
      $p(z, r) \propto \exp(-H(z, r)) = \exp(-E(z) - K(r)) \propto p(z)\, p(r)$
      so the state z and the momentum r are independently distributed
    • We can use Hamiltonian dynamics to sample from the joint canonical distribution over r and z and simply ignore the momentum contributions. This is the idea of introducing auxiliary variables (r) to facilitate the Markov chain over z
    • A common choice is $K(r) = \frac{r^T r}{2}$ and $E(z) = -\log p(z)$
  • 48. Hybrid Monte Carlo (11.5 Hybrid Monte Carlo)
    • Combination of the Metropolis algorithm and Hamiltonian dynamics
    Algorithm to draw M samples from a target distribution
      1. Set τ = 0
      2. Generate an initial position state z(0) ~ π(0)
      3. Repeat until τ = M
         Set τ = τ + 1
         - Sample a new initial momentum variable from the momentum canonical distribution, r_0 ~ p(r)
         - Set z_0 = z(τ−1)
         - Run the Leap Frog algorithm starting at [z_0, r_0] for L steps with step size δ to obtain proposed states z* and r*
         - Calculate the Metropolis acceptance probability
           $\alpha = \min(1, \exp\{H(z_0, r_0) - H(z^*, r^*)\})$
         - Draw a random number u uniformly from [0, 1]
           If u ≤ α, accept the position and set the next state z(τ) = z*; otherwise set z(τ) = z(τ−1)
  • 49. Hybrid Monte Carlo simulation (11.5 Hybrid Monte Carlo)
    Hamiltonian Monte Carlo for sampling a bivariate normal distribution
      $p(z) = \mathcal{N}(\mu, \Sigma)$ with $\mu = [0, 0]$
      $E(z) = -\log\left(e^{-\frac{z^T \Sigma^{-1} z}{2}}\right) - \text{const}$
    The MH algorithm converges much more slowly than HMC, and consecutive MH samples have much higher autocorrelation than samples drawn using HMC
    Img Source. https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
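    A compact sketch of the Hybrid Monte Carlo loop for a zero-mean bivariate normal target. The covariance Σ = [[1, ρ], [ρ, 1]] with ρ = 0.8, the step size δ = 0.3, and L = 20 leapfrog steps are assumed values for illustration, not the settings behind the figure on the slide. The kinetic energy is K(r) = rᵀr/2, so the momentum is drawn from a standard normal.

```python
import math
import random

RHO = 0.8  # assumed correlation; Sigma = [[1, RHO], [RHO, 1]]
C = 1.0 / (1.0 - RHO * RHO)


def E(z):
    # Potential energy 0.5 * z^T Sigma^{-1} z (constant dropped).
    return 0.5 * C * (z[0] ** 2 - 2.0 * RHO * z[0] * z[1] + z[1] ** 2)


def grad_E(z):
    return [C * (z[0] - RHO * z[1]), C * (z[1] - RHO * z[0])]


def K(r):
    return 0.5 * (r[0] ** 2 + r[1] ** 2)


def hmc(n_samples, n_leapfrog, delta, rng):
    z = [0.0, 0.0]
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        r0 = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # r0 ~ p(r)
        z_star, r_star = list(z), list(r0)
        for _ in range(n_leapfrog):
            g = grad_E(z_star)
            r_star = [r_star[i] - 0.5 * delta * g[i] for i in range(2)]
            z_star = [z_star[i] + delta * r_star[i] for i in range(2)]
            g = grad_E(z_star)
            r_star = [r_star[i] - 0.5 * delta * g[i] for i in range(2)]
        # alpha = min(1, exp(H(z0, r0) - H(z*, r*)))
        delta_h = (E(z) + K(r0)) - (E(z_star) + K(r_star))
        if rng.random() <= math.exp(min(0.0, delta_h)):
            z = z_star
            n_accepted += 1
        samples.append((z[0], z[1]))
    return samples, n_accepted / n_samples


rng = random.Random(4)
samples, accept_rate = hmc(n_samples=5000, n_leapfrog=20, delta=0.3, rng=rng)
n = len(samples)
m1 = sum(s[0] for s in samples) / n
m2 = sum(s[1] for s in samples) / n
v1 = sum((s[0] - m1) ** 2 for s in samples) / n
v2 = sum((s[1] - m2) ** 2 for s in samples) / n
cov = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
corr = cov / math.sqrt(v1 * v2)
print(f"acceptance={accept_rate:.2f} correlation={corr:.3f}")
```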
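  • 50. Detailed balance (11.5 Hybrid Monte Carlo)
    Consider an update after a sequence of L leapfrog iterations of step size δ, taking a small region R of phase space to a region R'. The leapfrog integration preserves phase-space volume (both regions have volume δV) and is time-reversible. The factor 1/2 is the probability of choosing a positive step size δ or a negative step size −δ.
    Transition probability going from R to R':
      $\frac{1}{Z_H} \exp(-H(R))\, \delta V\, \frac{1}{2} \min\{1, \exp(H(R) - H(R'))\}$
    Transition probability going from R' to R:
      $\frac{1}{Z_H} \exp(-H(R'))\, \delta V\, \frac{1}{2} \min\{1, \exp(H(R') - H(R))\}$
    The two expressions are equal, so detailed balance holds.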
  • 52. Estimating the Partition Function (11.6 Estimating the Partition Function)
    • Most sampling algorithms require the distribution only up to the constant partition function Z_E (it is not needed in order to draw samples from p(z))
      $p_E(z) = \frac{1}{Z_E} \exp\{-E(z)\}$, $Z_E = \sum_z \exp\{-E(z)\}$
    • The partition function is useful for model comparison, because it represents the probability of the observed data
      $p(\text{hidden} \mid \text{observed}) = \frac{p(\text{hidden}, \text{observed})}{p(\text{observed})}$
    • For model comparison, we are interested in the ratio of partition functions
  • 53. Using importance sampling (11.6 Estimating the Partition Function)
    • Use importance sampling from a proposal p_G with energy G(z)
      $\frac{Z_E}{Z_G} = \frac{\sum_z \exp(-E(z))}{\sum_z \exp(-G(z))} = \frac{\sum_z \exp(-E(z) + G(z)) \exp(-G(z))}{\sum_z \exp(-G(z))} = \mathbb{E}_{p_G}[\exp(-E(z) + G(z))] \simeq \frac{1}{L} \sum_{l=1}^{L} \exp(-E(z^{(l)}) + G(z^{(l)}))$ (11.72)
      where the $z^{(l)}$ are sampled from p_G
    • If Z_G is easy to compute, we can estimate Z_E
    • Problem: p_G needs to match p_E closely
    • Idea: use samples $z^{(l)}$ from p_E obtained from a Markov chain, and define
      $p_G(z) = \frac{1}{L} \sum_{l=1}^{L} T(z^{(l)}, z)$ (11.73)
      where T gives the transition probabilities of the chain
    • We then define G(z) = −log p_G(z) and use it in (11.72)
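    A small numerical check of the estimator (11.72). The slide writes sums over discrete z; this sketch assumes a continuous 1-D analogue in which both energies are quadratic, so the exact ratio is known: with G(z) = z²/2 and E(z) = z²/(2s²), Z_E/Z_G = s. The value s = 0.5 and the variable names are illustrative.

```python
import math
import random

S = 0.5  # assumed width of the target; exact ratio Z_E / Z_G equals S


def E(z):
    # Target energy: unnormalized Gaussian with standard deviation S.
    return z * z / (2.0 * S * S)


def G(z):
    # Proposal energy: standard Gaussian, which is easy to sample exactly.
    return 0.5 * z * z


rng = random.Random(5)
L = 20000
total = 0.0
for _ in range(L):
    z = rng.gauss(0.0, 1.0)            # z^(l) sampled from p_G
    total += math.exp(-E(z) + G(z))    # importance weight in (11.72)

ratio_estimate = total / L
print(f"estimated Z_E/Z_G = {ratio_estimate:.4f}, exact = {S}")
```

    The estimator behaves well here because the proposal is broader than the target, so the weights are bounded; reversing the roles of E and G gives unbounded weights and a much noisier estimate.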
  • 54. Chaining (11.6 Estimating the Partition Function)
    • Estimating a ratio of partition functions requires closely matched distributions
    • The partition function Z_G must be evaluated exactly, which is possible only for a simple distribution => poor matching with a complicated distribution
    • Idea: use a sequence of distributions between the simple p_1 and the complex p_M
      $\frac{Z_M}{Z_1} = \frac{Z_2}{Z_1} \frac{Z_3}{Z_2} \cdots \frac{Z_M}{Z_{M-1}}$ (11.74)
    • The intermediate distributions interpolate from E_1 to E_M
      $E_\alpha(z) = (1 - \alpha) E_1(z) + \alpha E_M(z)$ (11.75)
    • Use a single Markov chain, run initially for the system p_1, which after some suitable number of steps moves on to the next distribution in the sequence
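    The chained product (11.74) with interpolated energies (11.75) can be sketched as follows. Two simplifying assumptions are made so that the result can be checked: both end-point energies are quadratic (E_1(z) = z²/2 and E_M(z) = z²/(2s²), giving Z_M/Z_1 = s), and each intermediate distribution is sampled exactly with a Gaussian draw in place of the Markov chain described on the slide.

```python
import math
import random

S = 0.1  # assumed width of the final distribution; exact Z_M / Z_1 equals S


def E_1(z):
    return 0.5 * z * z


def E_M(z):
    return 0.5 * z * z / (S * S)


def E_alpha(z, alpha):
    # Interpolated energy (11.75).
    return (1.0 - alpha) * E_1(z) + alpha * E_M(z)


def ratio_step(alpha_from, alpha_to, n, rng):
    """Estimate Z(alpha_to) / Z(alpha_from) using samples from p(alpha_from)."""
    # p_alpha is Gaussian with this precision, so it can be sampled exactly
    # (an assumption that stands in for running a Markov chain).
    precision = (1.0 - alpha_from) + alpha_from / (S * S)
    std = 1.0 / math.sqrt(precision)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, std)
        total += math.exp(-E_alpha(z, alpha_to) + E_alpha(z, alpha_from))
    return total / n


rng = random.Random(6)
alphas = [i / 10.0 for i in range(11)]  # 0.0, 0.1, ..., 1.0
chained_estimate = 1.0
for a_from, a_to in zip(alphas[:-1], alphas[1:]):
    chained_estimate *= ratio_step(a_from, a_to, 5000, rng)

print(f"chained estimate of Z_M/Z_1 = {chained_estimate:.4f}, exact = {S}")
```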
  • 55. Summary. Sampling Methods: Basic Sampling Algorithms, Markov Chain Monte Carlo, Gibbs Sampling, Slice Sampling, Hybrid Monte Carlo, Estimating the Partition Function. Part I: General concept of basic sampling. Part II: MCMC world.
  • 56. Papers Alert (Sampling Methods)
    • Markov Chain Monte Carlo Method without Detailed Balance (2010), Hidemaro Suwa and Synge Todo
      http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.105.120603
    • Hamiltonian Annealed Importance Sampling for partition function estimation (2012), Jascha Sohl-Dickstein, Benjamin J. Culpepper
      http://arxiv.org/abs/1205.1925
    • Hamiltonian Monte Carlo with Reduced Momentum Flips (2012), Jascha Sohl-Dickstein
      http://arxiv.org/abs/1205.1939
    • Hamiltonian Monte Carlo Without Detailed Balance (2014), Jascha Sohl-Dickstein
      http://jmlr.org/proceedings/papers/v32/sohl-dickstein14.pdf
    • A Markov Jump Process for More Efficient Hamiltonian Monte Carlo (2015), Jascha Sohl-Dickstein
      http://arxiv.org/abs/1509.03808
    • Markov Chain Monte Carlo and Variational Inference: Bridging the Gap (2015), Tim Salimans
      http://jmlr.org/proceedings/papers/v37/salimans15.pdf
  • 57. Observing Dark Worlds (Dark Matter Worlds)
    [Figure: a dark matter halo bending the light from a background galaxy. In extreme cases the galaxy is seen as the two arcs surrounding it.]
    https://www.kaggle.com/c/DarkWorlds
  • 58. Observing Dark Worlds (Dark Matter Worlds)
    Dark matter aggregates and forms massive structures called dark matter halos. Because there are many galaxies behind a dark matter halo, their shapes will correlate with its position.
    https://www.kaggle.com/c/DarkWorlds
  • 59. Observing Dark Worlds (Dark Matter Worlds)
    The task is then to use this "bending of light" to estimate where in the sky this dark matter is located.
    https://www.kaggle.com/c/DarkWorlds
  • 60. Observing Dark Worlds (Dark Matter Worlds)
    • The problem is really one of statistics: given the noisy data (the elliptical galaxies), recover the model and parameters (position and mass of the dark matter) that generated them
    • Step 1: construct a prior distribution p(x) for the halo positions (e.g. uniform)
    • Step 2: construct a probabilistic model p(e|x) for the data (the observed ellipticities of the galaxies)
      $p(e_i \mid x) = \mathcal{N}\Big(\sum_{j \in \text{all halos}} d_{i,j}\, m_j\, f(r_{i,j}),\; \sigma^2\Big)$
      ✦ d_{i,j} is the tangential direction, i.e. the direction in which halo j bends the light of galaxy i
      ✦ m_j is the mass of halo j
      ✦ f(r_{i,j}) is a decreasing function of the Euclidean distance r_{i,j} between galaxy i and halo j
      ✦ For the large halos, m is assigned a log-uniform distribution on [40, 180], and f(r_{i,j}) = 1/max(r_{i,j}, 240)
      ✦ For the small halos, the mass is fixed at 20 and f(r_{i,j}) = 1/max(r_{i,j}, 70)
    https://www.kaggle.com/c/DarkWorlds
    http://timsalimans.com/observing-dark-worlds/
  • 61. Observing Dark Worlds (Dark Matter Worlds)
    • Step 3: obtain the posterior distribution for the halo positions
      $p(x \mid e) = \frac{p(e \mid x)\, p(x)}{p(e)}$
      (a simple random-walk Metropolis-Hastings sampler is used to approximate the posterior distribution)
    • Step 4: minimize the expected loss
      $\tilde{x} = \arg\min_{\text{prediction}} \mathbb{E}_{p(x \mid e)}\, L(\text{prediction}, x)$
    http://timsalimans.com/observing-dark-worlds/
  • 62. Dark Matter Worlds Halo Slide from MCMC NIPS2015 tutorial
  • 63. Dark Matter Worlds Halo Slide from MCMC NIPS2015 tutorial
  • 64. Dark Matter Worlds Halo Slide from MCMC NIPS2015 tutorial