REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
1. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Sangwoo Mo
KAIST AI Lab.
November 29, 2017
Sangwoo Mo (KAIST AI Lab.) REBAR November 29, 2017 1 / 16
2. General Problem
Let z ∼ p(z|θ). We want to maximize

L(θ) = E_{p(z)}[f(z)]¹.

Example:

ELBO²: L(θ, φ) = E_{q_φ(z|x)}[log p_θ(x|z)]

Policy Gradient: L(θ) = E_{p_θ(τ)}[R(τ)]

¹ assume f(z) is independent of θ
² omitting the KL term
3. General Problem
Let z ∼ p(z|θ). We want to maximize

L(θ) = E_{p(z)}[f(z)].

We want to optimize it by gradient descent¹, so we need to compute

d/dθ L(θ) = d/dθ E_{p(z)}[f(z)]

Caveat: we cannot simply move d/dθ inside the expectation, since the distribution of z depends on θ.

¹ assume f(z) is differentiable
4. Background
REINFORCE:

d/dθ E_{p(z)}[f(z)] = d/dθ ∫ f(z) p(z) dz
                    = ∫ f(z) (∂/∂θ) p(z) dz
                    = ∫ f(z) [(∂/∂θ) p(z) / p(z)] p(z) dz
                    = ∫ f(z) [(∂/∂θ) log p(z)] p(z) dz
                    = E_{p(z)}[ f(z) (∂/∂θ) log p(z) ]

It is unbiased, but its variance is too high.
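On the toy objective from the experiments slide, the REINFORCE estimator can be sketched as follows (a minimal NumPy sketch; the function name and sample size are illustrative choices, not from the paper):

```python
import numpy as np

def reinforce_grad(theta, f, n_samples, seed=0):
    """Monte Carlo REINFORCE estimate of d/dtheta E_{z~Bernoulli(theta)}[f(z)].

    For a Bernoulli, d/dtheta log p(z) = z/theta - (1 - z)/(1 - theta).
    """
    rng = np.random.default_rng(seed)
    z = (rng.random(n_samples) < theta).astype(float)
    score = z / theta - (1 - z) / (1 - theta)
    return np.mean(f(z) * score)

# Toy objective from the experiments slide: f(z) = (z - 0.45)^2 at theta = 0.3.
# The exact gradient is f(1) - f(0) = 0.55^2 - 0.45^2 = 0.1.
g = reinforce_grad(0.3, lambda z: (z - 0.45) ** 2, n_samples=200000)
```

The estimate is unbiased, but each sample multiplies f(z) by a score term of magnitude up to 1/θ, which is what drives the variance up.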
5. Background
Control variate: subtract a baseline c.

d/dθ E_{p(z)}[f(z)] = d/dθ ( E_{p(z,c)}[f(z) − c] + E_{p(z,c)}[c] )
                    = E_{p(z,c)}[ (f(z) − c) (∂/∂θ) log p(z) ] + (∂/∂θ) E_{p(z,c)}[c]

Question: how do we choose a proper¹ c?

a constant value, e.g. E_{p(z)}[f(z)]
a linear approximation of f around E_{p(z)}[z]

¹ i) c should be correlated with f(z); ii) if c ⊥ θ, the second term vanishes
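The constant-baseline option can be sketched on the same toy Bernoulli problem (NumPy sketch; the function name and sample sizes are illustrative):

```python
import numpy as np

def reinforce_samples(theta, f, c, n, rng):
    """Per-sample REINFORCE estimates with a constant baseline c.

    A constant c is independent of theta, so E[c * d/dtheta log p(z)] = 0
    and subtracting c leaves the estimator unbiased.
    """
    z = (rng.random(n) < theta).astype(float)
    score = z / theta - (1 - z) / (1 - theta)
    return (f(z) - c) * score

rng = np.random.default_rng(0)
theta, f = 0.3, (lambda z: (z - 0.45) ** 2)
c = theta * f(1.0) + (1 - theta) * f(0.0)   # the suggested constant: E_p(z)[f(z)]
plain = reinforce_samples(theta, f, 0.0, 100000, rng)
baselined = reinforce_samples(theta, f, c, 100000, rng)
# Both sample means approximate the true gradient 0.1;
# the baseline shrinks the per-sample variance.
```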
6. Background
Reparametrization trick: assume z = g(θ, ε).

d/dθ E_{p(z)}[f(z)] = d/dθ ∫ f(z) p(z) dz
                    = d/dθ ∫ f(g(θ, ε)) p(ε) dε
                    = ∫ (∂f/∂g)(∂g/∂θ) p(ε) dε
                    = E_{p(ε)}[ (∂f/∂g)(∂g/∂θ) ]

It is unbiased and low variance, and has been successful for continuous¹ z.
However, it is not directly applicable to the discrete case.

¹ VAE assumes z ∼ N(µ, σ) and reparametrizes it as z = µ + σε where ε ∼ N(0, 1)
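The Gaussian case in the footnote can be sketched directly (NumPy sketch; the function name and test objective f(z) = z² are illustrative choices):

```python
import numpy as np

def reparam_grad(mu, sigma, df, n=100000, seed=0):
    """Reparametrized gradients of E_{z~N(mu, sigma^2)}[f(z)] w.r.t. (mu, sigma).

    With z = mu + sigma * eps and eps ~ N(0, 1), the chain rule gives
    d/dmu = E[f'(z)] and d/dsigma = E[f'(z) * eps].
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    return np.mean(df(z)), np.mean(df(z) * eps)

# For f(z) = z^2, E[f] = mu^2 + sigma^2, so the exact gradients are 2*mu and 2*sigma.
gmu, gsigma = reparam_grad(0.5, 1.0, lambda z: 2 * z)
```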
7. Background
Gumbel-softmax trick:

It is well known that z ∼ Cat(θ) is equivalent to

z = H(w) = argmax_i [log θ_i − log(− log ε_i)]

where H is the hard argmax, w = g(θ, ε), and ε_i ∼ Uniform(0, 1).

Instead of H, use the softmax σ_λ(w) (with temperature λ).
Then σ_λ(g(θ, ε)) is a differentiable reparametrization of z.

It is low variance, but biased.
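The hard and relaxed samples can be drawn side by side (NumPy sketch; the function name is an illustrative choice):

```python
import numpy as np

def gumbel_softmax_sample(log_theta, lam, rng):
    """Draw a hard Cat(theta) sample and its Gumbel-softmax relaxation.

    w_i = log theta_i - log(-log eps_i) with eps_i ~ Uniform(0, 1);
    the hard argmax H(w) is an exact categorical sample, while
    softmax(w / lam) is the differentiable relaxation sigma_lambda(w).
    """
    eps = rng.random(len(log_theta))
    w = log_theta - np.log(-np.log(eps))
    hard = np.eye(len(w))[np.argmax(w)]              # H(w): exact one-hot sample
    soft = np.exp(w / lam) / np.exp(w / lam).sum()   # sigma_lambda(w)
    return hard, soft

rng = np.random.default_rng(0)
theta = np.array([0.2, 0.3, 0.5])
hard, soft = gumbel_softmax_sample(np.log(theta), lam=0.5, rng=rng)
# As lam -> 0 the soft sample approaches the hard one-hot vector.
```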
8. REBAR
Motivation:

Gumbel-softmax gives a biased estimator that is highly correlated with f(z)
Use Gumbel-softmax as a control variate for REINFORCE
However, we can do more than naïvely applying this idea
9. REBAR
Observation:

We can reduce the variance of REINFORCE by marginalizing over w given z.

(∂/∂θ) E_{p(w)}[f(σ_λ(w))]
  = E_{p(w)}[ f(σ_λ(w)) (∂/∂θ) log p(w) ]
  = E_{p(z)} E_{p(w|z)}[ f(σ_λ(w)) (∂/∂θ)(log p(w|z) + log p(z)) ]
  = E_{p(z)}[ (∂/∂θ) E_{p(w|z)}[f(σ_λ(w))] ] + E_{p(z)}[ E_{p(w|z)}[f(σ_λ(w))] (∂/∂θ) log p(z) ]
10. REBAR
Observation:

Here, the first term can be reparametrized as

E_{p(z)}[ (∂/∂θ) E_{p(w|z)}[f(σ_λ(w))] ] = E_{p(z)} E_{p(δ)}[ (∂/∂θ) f(σ_λ(w̃)) ]

where w̃ = g̃(θ, z, δ)¹ and δ_i ∼ Uniform(0, 1).

¹ g̃ reparametrizes the conditional distribution of w given z
11. REBAR
Putting it all together,

(∂/∂θ) E_{p(z)}[f(z)] = E_{ε,δ}[ (f(H(w)) − η f(σ_λ(w̃))) (∂/∂θ) log p(z)|_{z=H(w)}
                                 + η (∂/∂θ) f(σ_λ(w)) − η (∂/∂θ) f(σ_λ(w̃)) ]

where w = g(θ, ε), w̃ = g̃(θ, H(w), δ), and ε_i, δ_i ∼ Uniform(0, 1).
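For a single Bernoulli variable the whole estimator fits in a few lines. The sketch below assumes the standard binary-case parametrization w = logit(θ) + logit(v) with v ∼ Uniform(0, 1), so z = H(w) = 1[w > 0], σ_λ(w) = sigmoid(w/λ), and conditionally on z the noise v is uniform on (1−θ, 1) if z = 1 and on (0, 1−θ) if z = 0; the function name and hyperparameter values are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rebar_grad(theta, f, df, lam=0.5, eta=1.0, n=200000, seed=0):
    """Per-sample REBAR estimates of d/dtheta E_{z~Bernoulli(theta)}[f(z)].

    w = logit(theta) + logit(v), z = H(w) = 1[w > 0], sigma_lam(w) =
    sigmoid(w / lam); w_tilde = g_tilde(theta, z, delta) reparametrizes
    the conditional distribution of w given z.
    """
    rng = np.random.default_rng(seed)
    v = np.clip(rng.random(n), 1e-9, 1 - 1e-9)       # epsilon in the slides
    delta = np.clip(rng.random(n), 1e-9, 1 - 1e-9)
    logit = lambda u: np.log(u) - np.log1p(-u)

    w = logit(theta) + logit(v)
    z = (w > 0).astype(float)
    v_cond = np.where(z == 1, (1 - theta) + delta * theta, delta * (1 - theta))
    w_tilde = logit(theta) + logit(v_cond)
    s, s_tilde = sigmoid(w / lam), sigmoid(w_tilde / lam)

    score = z / theta - (1 - z) / (1 - theta)        # d/dtheta log p(z)
    dw = 1.0 / (theta * (1 - theta))                 # dw/dtheta
    dv_cond = np.where(z == 1, delta - 1.0, -delta)  # dv_cond/dtheta (z held fixed)
    dw_tilde = dw + dv_cond / (v_cond * (1 - v_cond))
    dpath = lambda s_, dw_: df(s_) * s_ * (1.0 - s_) / lam * dw_  # chain rule

    return ((f(z) - eta * f(s_tilde)) * score
            + eta * dpath(s, dw) - eta * dpath(s_tilde, dw_tilde))

g = rebar_grad(0.3, lambda s: (s - 0.45) ** 2, lambda s: 2.0 * (s - 0.45))
# E[g] is the exact gradient 0.1 for any eta and lam (unbiased).
```

Note that η and λ only change the variance, not the mean, which is what makes the hyperparameter optimization on the next slide possible.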
12. Hyperparameter Optimization
Let r(η, λ) be the Monte Carlo REBAR estimator.

Since r is unbiased, E[r] does not depend on η and λ. Thus,

(∂/∂η) Var(r) = (∂/∂η) ( E[r²] − E[r]² )
              = E[ 2r (∂r/∂η) ].

Now we can optimize η (and λ) to minimize the variance.
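The identity can be checked on a toy stand-in: take a hypothetical unbiased estimator r(η) = a − η·b with E[b] = 0, where the pair (a, b) plays the role of the REINFORCE term and its Gumbel-softmax control variate (the numbers below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.standard_normal(200000)
a = 2.0 * b + 0.5 * rng.standard_normal(200000)  # correlated with b, E[a] = 0

def grad_at(eta):
    """Monte Carlo estimate of dVar(r)/deta = E[2 r dr/deta], with dr/deta = -b."""
    r = a - eta * b
    return np.mean(2.0 * r * (-b))

# Analytically dVar/deta = -2(Cov(a, b) - eta * Var(b)) = -2(2 - eta),
# so the variance-minimizing choice is eta* = Cov(a, b)/Var(b) = 2.
```

Descending this gradient tunes η (and, with the analogous derivative, λ) online while training, without biasing the main objective.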
13. Experiments
Minimize Ep(z)[(z − 0.45)2] where z ∼ Bernoulli(θ).
left: log variance / right: loss
14. Experiments
Maximize ELBO of Sigmoid Belief Network
log p(x|θ) ≥ Eq(z|x,θ)[log p(x, z|θ) − log q(z|x, θ)]
left: 2-layer linear / right: 1-layer nonlinear (log variance)
15. Experiments
Maximize ELBO of Sigmoid Belief Network
log p(x|θ) ≥ Eq(z|x,θ)[log p(x, z|θ) − log q(z|x, θ)]
left: 2-layer linear / right: 1-layer nonlinear (objective)