REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
1. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Sangwoo Mo
KAIST AI Lab.
November 29, 2017
Sangwoo Mo (KAIST AI Lab.) REBAR November 29, 2017 1 / 16
2. General Problem
Let z ∼ p(z|θ). We want to maximize

L(θ) = E_{p(z)}[f(z)]¹.

Example:

ELBO²: L(θ, φ) = E_{q_φ(z|x)}[log p_θ(x|z)]

Policy Gradient: L(θ) = E_{p_θ(τ)}[R(τ)]

¹ assume f(z) is independent of θ
² omitting the KL term
3. General Problem
Let z ∼ p(z|θ). We want to maximize

L(θ) = E_{p(z)}[f(z)].

We want to optimize it by gradient descent¹, so we need to compute

d/dθ L(θ) = d/dθ E_{p(z)}[f(z)]

Caveat: we cannot simply move d/dθ inside the expectation, since the distribution of z depends on θ.

¹ assume f(z) is differentiable
4. Background
REINFORCE:

d/dθ E_{p(z)}[f(z)] = d/dθ ∫ f(z) p(z) dz
                    = ∫ f(z) (∂/∂θ) p(z) dz
                    = ∫ f(z) [(∂/∂θ) p(z) / p(z)] p(z) dz
                    = ∫ f(z) [(∂/∂θ) log p(z)] p(z) dz
                    = E_{p(z)}[ f(z) (∂/∂θ) log p(z) ]

It is unbiased, but its variance is too high.
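On the toy objective from the experiments slide, the REINFORCE estimator can be sketched as follows (a minimal NumPy sketch; the function name and sample size are illustrative choices, not from the paper):

```python
import numpy as np

def reinforce_grad(theta, f, n_samples, seed=0):
    """Monte Carlo REINFORCE estimate of d/dtheta E_{z~Bernoulli(theta)}[f(z)].

    For a Bernoulli, d/dtheta log p(z) = z/theta - (1 - z)/(1 - theta).
    """
    rng = np.random.default_rng(seed)
    z = (rng.random(n_samples) < theta).astype(float)
    score = z / theta - (1 - z) / (1 - theta)
    return np.mean(f(z) * score)

# Toy objective from the experiments slide: f(z) = (z - 0.45)^2 at theta = 0.3.
# The exact gradient is f(1) - f(0) = 0.55^2 - 0.45^2 = 0.1.
g = reinforce_grad(0.3, lambda z: (z - 0.45) ** 2, n_samples=200000)
```

The estimate is unbiased, but each sample multiplies f(z) by a score term of magnitude up to 1/θ, which is what drives the variance up.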
5. Background
Control variate: subtract a baseline c.

d/dθ E_{p(z)}[f(z)] = d/dθ ( E_{p(z,c)}[f(z) − c] + E_{p(z,c)}[c] )
                    = E_{p(z,c)}[ (f(z) − c) (∂/∂θ) log p(z) ] + (∂/∂θ) E_{p(z,c)}[c]

Question: how do we choose a proper¹ c?

a constant value, e.g. E_{p(z)}[f(z)]
a linear approximation of f around E_{p(z)}[z]

¹ i) c should be correlated with f(z); ii) if c ⊥ θ, the second term vanishes
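The constant-baseline option can be sketched on the same toy Bernoulli problem (NumPy sketch; the function name and sample sizes are illustrative):

```python
import numpy as np

def reinforce_samples(theta, f, c, n, rng):
    """Per-sample REINFORCE estimates with a constant baseline c.

    A constant c is independent of theta, so E[c * d/dtheta log p(z)] = 0
    and subtracting c leaves the estimator unbiased.
    """
    z = (rng.random(n) < theta).astype(float)
    score = z / theta - (1 - z) / (1 - theta)
    return (f(z) - c) * score

rng = np.random.default_rng(0)
theta, f = 0.3, (lambda z: (z - 0.45) ** 2)
c = theta * f(1.0) + (1 - theta) * f(0.0)   # the suggested constant: E_p(z)[f(z)]
plain = reinforce_samples(theta, f, 0.0, 100000, rng)
baselined = reinforce_samples(theta, f, c, 100000, rng)
# Both sample means approximate the true gradient 0.1;
# the baseline shrinks the per-sample variance.
```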
6. Background
Reparametrization trick: assume z = g(θ, ε).

d/dθ E_{p(z)}[f(z)] = d/dθ ∫ f(z) p(z) dz
                    = d/dθ ∫ f(g(θ, ε)) p(ε) dε
                    = ∫ (∂f/∂g)(∂g/∂θ) p(ε) dε
                    = E_{p(ε)}[ (∂f/∂g)(∂g/∂θ) ]

It is unbiased and low variance, and has been successful for continuous¹ z.
However, it is not directly applicable to the discrete case.

¹ VAE assumes z ∼ N(µ, σ) and reparametrizes it as z = µ + σε where ε ∼ N(0, 1)
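The Gaussian case in the footnote can be sketched directly (NumPy sketch; the function name and test objective f(z) = z² are illustrative choices):

```python
import numpy as np

def reparam_grad(mu, sigma, df, n=100000, seed=0):
    """Reparametrized gradients of E_{z~N(mu, sigma^2)}[f(z)] w.r.t. (mu, sigma).

    With z = mu + sigma * eps and eps ~ N(0, 1), the chain rule gives
    d/dmu = E[f'(z)] and d/dsigma = E[f'(z) * eps].
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    return np.mean(df(z)), np.mean(df(z) * eps)

# For f(z) = z^2, E[f] = mu^2 + sigma^2, so the exact gradients are 2*mu and 2*sigma.
gmu, gsigma = reparam_grad(0.5, 1.0, lambda z: 2 * z)
```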
7. Background
Gumbel-softmax trick:

It is well known that z ∼ Cat(θ) is equivalent to

z = H(w) = argmax_i [log θ_i − log(− log ε_i)]

where H is the hard argmax, w = g(θ, ε), and ε_i ∼ Uniform(0, 1).

Instead of H, use the softmax σ_λ(w) (with temperature λ).
Then σ_λ(g(θ, ε)) is a differentiable reparametrization of z.

It is low variance, but biased.
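The hard and relaxed samples can be drawn side by side (NumPy sketch; the function name is an illustrative choice):

```python
import numpy as np

def gumbel_softmax_sample(log_theta, lam, rng):
    """Draw a hard Cat(theta) sample and its Gumbel-softmax relaxation.

    w_i = log theta_i - log(-log eps_i) with eps_i ~ Uniform(0, 1);
    the hard argmax H(w) is an exact categorical sample, while
    softmax(w / lam) is the differentiable relaxation sigma_lambda(w).
    """
    eps = rng.random(len(log_theta))
    w = log_theta - np.log(-np.log(eps))
    hard = np.eye(len(w))[np.argmax(w)]              # H(w): exact one-hot sample
    soft = np.exp(w / lam) / np.exp(w / lam).sum()   # sigma_lambda(w)
    return hard, soft

rng = np.random.default_rng(0)
theta = np.array([0.2, 0.3, 0.5])
hard, soft = gumbel_softmax_sample(np.log(theta), lam=0.5, rng=rng)
# As lam -> 0 the soft sample approaches the hard one-hot vector.
```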
8. REBAR
Motivation:

Gumbel-softmax gives a biased estimator that is highly correlated with f(z)
Use Gumbel-softmax as a control variate for REINFORCE
However, we can do more than naïvely applying this idea
9. REBAR
Observation:

We can reduce the variance of REINFORCE by marginalizing over w given z.

(∂/∂θ) E_{p(w)}[f(σ_λ(w))]
  = E_{p(w)}[ f(σ_λ(w)) (∂/∂θ) log p(w) ]
  = E_{p(z)} E_{p(w|z)}[ f(σ_λ(w)) (∂/∂θ)(log p(w|z) + log p(z)) ]
  = E_{p(z)}[ (∂/∂θ) E_{p(w|z)}[f(σ_λ(w))] ] + E_{p(z)}[ E_{p(w|z)}[f(σ_λ(w))] (∂/∂θ) log p(z) ]
10. REBAR
Observation:

Here, the first term can be reparametrized as

E_{p(z)}[ (∂/∂θ) E_{p(w|z)}[f(σ_λ(w))] ] = E_{p(z)} E_{p(δ)}[ (∂/∂θ) f(σ_λ(w̃)) ]

where w̃ = g̃(θ, z, δ)¹ and δ_i ∼ Uniform(0, 1).

¹ g̃ reparametrizes the conditional distribution of w given z
11. REBAR
Putting it all together,

(∂/∂θ) E_{p(z)}[f(z)] = E_{ε,δ}[ (f(H(w)) − η f(σ_λ(w̃))) (∂/∂θ) log p(z)|_{z=H(w)}
                                 + η (∂/∂θ) f(σ_λ(w)) − η (∂/∂θ) f(σ_λ(w̃)) ]

where w = g(θ, ε), w̃ = g̃(θ, H(w), δ), and ε_i, δ_i ∼ Uniform(0, 1).
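For a single Bernoulli variable the whole estimator fits in a few lines. The sketch below assumes the standard binary-case parametrization w = logit(θ) + logit(v) with v ∼ Uniform(0, 1), so z = H(w) = 1[w > 0], σ_λ(w) = sigmoid(w/λ), and conditionally on z the noise v is uniform on (1−θ, 1) if z = 1 and on (0, 1−θ) if z = 0; the function name and hyperparameter values are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rebar_grad(theta, f, df, lam=0.5, eta=1.0, n=200000, seed=0):
    """Per-sample REBAR estimates of d/dtheta E_{z~Bernoulli(theta)}[f(z)].

    w = logit(theta) + logit(v), z = H(w) = 1[w > 0], sigma_lam(w) =
    sigmoid(w / lam); w_tilde = g_tilde(theta, z, delta) reparametrizes
    the conditional distribution of w given z.
    """
    rng = np.random.default_rng(seed)
    v = np.clip(rng.random(n), 1e-9, 1 - 1e-9)       # epsilon in the slides
    delta = np.clip(rng.random(n), 1e-9, 1 - 1e-9)
    logit = lambda u: np.log(u) - np.log1p(-u)

    w = logit(theta) + logit(v)
    z = (w > 0).astype(float)
    v_cond = np.where(z == 1, (1 - theta) + delta * theta, delta * (1 - theta))
    w_tilde = logit(theta) + logit(v_cond)
    s, s_tilde = sigmoid(w / lam), sigmoid(w_tilde / lam)

    score = z / theta - (1 - z) / (1 - theta)        # d/dtheta log p(z)
    dw = 1.0 / (theta * (1 - theta))                 # dw/dtheta
    dv_cond = np.where(z == 1, delta - 1.0, -delta)  # dv_cond/dtheta (z held fixed)
    dw_tilde = dw + dv_cond / (v_cond * (1 - v_cond))
    dpath = lambda s_, dw_: df(s_) * s_ * (1.0 - s_) / lam * dw_  # chain rule

    return ((f(z) - eta * f(s_tilde)) * score
            + eta * dpath(s, dw) - eta * dpath(s_tilde, dw_tilde))

g = rebar_grad(0.3, lambda s: (s - 0.45) ** 2, lambda s: 2.0 * (s - 0.45))
# E[g] is the exact gradient 0.1 for any eta and lam (unbiased).
```

Note that η and λ only change the variance, not the mean, which is what makes the hyperparameter optimization on the next slide possible.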
12. Hyperparameter Optimization
Let r(η, λ) be the Monte Carlo REBAR estimator.

Since r is unbiased, E[r] does not depend on η and λ. Thus,

(∂/∂η) Var(r) = (∂/∂η) ( E[r²] − E[r]² )
              = E[ 2r (∂r/∂η) ].

Now we can optimize η (and λ) to minimize the variance.
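The identity can be checked on a toy stand-in: take a hypothetical unbiased estimator r(η) = a − η·b with E[b] = 0, where the pair (a, b) plays the role of the REINFORCE term and its Gumbel-softmax control variate (the numbers below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.standard_normal(200000)
a = 2.0 * b + 0.5 * rng.standard_normal(200000)  # correlated with b, E[a] = 0

def grad_at(eta):
    """Monte Carlo estimate of dVar(r)/deta = E[2 r dr/deta], with dr/deta = -b."""
    r = a - eta * b
    return np.mean(2.0 * r * (-b))

# Analytically dVar/deta = -2(Cov(a, b) - eta * Var(b)) = -2(2 - eta),
# so the variance-minimizing choice is eta* = Cov(a, b)/Var(b) = 2.
```

Descending this gradient tunes η (and, with the analogous derivative, λ) online while training, without biasing the main objective.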
13. Experiments
Minimize Ep(z)[(z − 0.45)2] where z ∼ Bernoulli(θ).
left: log variance / right: loss
14. Experiments
Maximize ELBO of Sigmoid Belief Network
log p(x|θ) ≥ Eq(z|x,θ)[log p(x, z|θ) − log q(z|x, θ)]
left: 2-layer linear / right: 1-layer nonlinear (log variance)
15. Experiments
Maximize ELBO of Sigmoid Belief Network
log p(x|θ) ≥ Eq(z|x,θ)[log p(x, z|θ) − log q(z|x, θ)]
left: 2-layer linear / right: 1-layer nonlinear (objective)