A note on the density of Gumbel-softmax

•

1 like•348 views

This note explicates some details of the discussion given in Appendix B of E. Jang, S. Gu, and B. Poole. Categorical representation with Gumbel-softmax. ICLR, 2017.

Data & Analytics

A note on the density of Gumbel-softmax
Tomonari MASADA @ Nagasaki University
May 29, 2019
This note explicates some details of the discussion given in Appendix B of [1].
The Gumbel-softmax trick gives a k-dimensional sample vector y = (y1, . . . , yk) ∈ ∆k−1
whose entries
are obtained as
yi =
exp((log(πi) + gi)/τ)
k
j=1 exp((log(πj) + gj)/τ)
for i = 1, . . . , k, (1)
by using g1, . . . , gk, which are i.i.d samples drawn from Gumbel(0, 1).
Deﬁne xi = log(πi). Then y is rewritten as
yi =
exp((xi + gi)/τ)
k
j=1 exp((xj + gj)/τ)
for i = 1, . . . , k, (2)
Divide both numerator and denominator by exp((xk + gk)/τ).
yi =
exp((xi + gi − (xk + gk))/τ)
k
j=1 exp((xj + gj − (xk + gk))/τ)
for i = 1, . . . , k, (3)
Deﬁne ui = xi + gi − (xk + gk) for i = 1, . . . , k − 1, where gi ∼ Gumbel(0, 1). When gk is given,
gi = ui−xi+(xk+gk) and ui can thus be regarded as a sample from the Gumbel whose mean is xi−(xk+gk)
and scale parameter is 1. Therefore, p(ui|gk) = e−{(ui−xi+(xk+gk))+e−(ui−xi+(xk+gk))
}
. Consequently, the
density p(u1, . . . , uk−1) is given as follows:
p(u1, . . . , uk−1) =
∞
−∞
dgkp(u1, . . . , uk−1|gk)p(gk)
=
∞
−∞
dgkp(gk)
k−1
i=1
p(ui|gk)
=
∞
−∞
dgke−gk−e−gk
k−1
i=1
exi−ui−xk−gk−exi−ui−xk−gk
(4)
Perform a change of variables with v = e−gk
. Then dv
dgk
= −e−gk
. Therefore, dv = −e−gk
dgk and
dgk = −dvegk
= −dv/v.
p(u1, . . . , uk−1) =
0
∞
(−dv)e−v
k−1
i=1
vexi−ui−xk−vexi−ui−xk
=
k−1
i=1
exi−ui−xk
∞
0
dve−v
vk−1
k−1
i=1
e−vexi−ui−xk
= e−(k−1)xk
k−1
i=1
exi−ui
∞
0
dvvk−1
e−v(1+e−xk k−1
i=1 (exi−ui ))
(5)
Recall the following fact related to Gamma integral:
∞
0
xz−1
e−ax
dx =
∞
0
y
a
z−1
e−y dy
a
=
1
a
z ∞
0
yz−1
e−y
dy = a−z
Γ(z) (6)
1

Therefore,
p(u1, . . . , uk−1) = e−(k−1)xk
k−1
i=1
exi−ui
∞
0
dvvk−1
e−v(1+e−xk k−1
i=1 (exi−ui ))
= e−kxk
exk
k−1
i=1
exi−ui
1 + e−xk
k−1
i=1
(exi−ui
)
−k
Γ(k)
= exk
k−1
i=1
exi−ui
exk
+
k−1
i=1
(exi−ui
)
−k
Γ(k)
= exp xk +
k−1
i=1
(xi − ui) exk
+
k−1
i=1
(exi−ui
)
−k
Γ(k) (7)
Deﬁne uk = 0. Then
p(u1, . . . , uk−1) = Γ(k)
k
i=1
exp(xi − ui)
k
i=1
exp(xi − ui)
−k
(8)
A k-dimensional sample vector y = (y1, . . . , yk) ∈ ∆k−1
is obtained from u1, . . . , uk−1 by applying a
deterministic transformation h:
hi(u1, . . . , uk−1) =
exp(ui/τ)
1 +
k−1
j=1 exp(uj/τ)
for i = 1, . . . , k − 1 (9)
as follows:
yi = hi(u1, . . . , uk−1) for i = 1, . . . , k − 1 (10)
Note that yk is ﬁxed given y1, . . . , yk−1:
yk =
1
1 +
k−1
j=1 exp(uj/τ)
= 1 −
k−1
j=1
yj (11)
By using the change of variables we can obtain the density function for y:
p(y) = p(h−1
(y1, . . . , yk−1)) det
∂(h−1
1 (y1, . . . , yk−1), . . . , h−1
k−1(y1, . . . , yk−1))
∂(y1, . . . , yk−1)
(12)
The inverse of h is obtained as follows:
yi =
exp(ui/τ)
1 +
k−1
j=1 exp(uj/τ)
yi = yk exp(ui/τ) from yk = 1
1+ k−1
j=1 exp(uj /τ)
log yi = log yk + ui/τ
∴ h−1
i (y1, . . . , yk−1) = ui = τ × (log yi − log yk) = τ × log yi − log 1 −
k−1
j=1
yj (13)
Therefore, we obtain the Jacobian:
∂h−1
i (y1, . . . , yk−1)
∂yi
= τ ×
1
yi
−
1
yk
∂yk
∂yi
= τ ×
1
yi
+
1
yk
(14)
∂h−1
i (y1, . . . , yk−1)
∂yj
= τ × −
1
yk
∂yk
∂yj
= τ ×
1
yk
for j = i (15)
Eqs. from (24) to (28) are easy to understand.
References
[1] E. Jang, S. Gu, and B. Poole. Categorical representation with Gumbel-softmax. ICLR, 2017.
2

What's hot

Quantitative norm convergence of some ergodic averagesVjekoslavKovac1

Specific Finite Groups(General)Shane Nicklas

QMC Error SAMSI Tutorial Aug 2017Fred J. Hickernell

Specific Finite Groups(General)Shane Nicklas

Scattering theory analogues of several classical estimates in Fourier analysisVjekoslavKovac1

735虹毅 Hongyi 李 LI

Estimates for a class of non-standard bilinear multipliersVjekoslavKovac1

A Szemerédi-type theorem for subsets of the unit cubeVjekoslavKovac1

Trilinear embedding for divergence-form operatorsVjekoslavKovac1

The partial derivative of the binary Cross-entropy loss functionTobiasRoeschl

GradStudentSeminarSept30Ryan White

On maximal and variational Fourier restrictionVjekoslavKovac1

A sharp nonlinear Hausdorff-Young inequality for small potentialsVjekoslavKovac1

Some Examples of Scaling SetsVjekoslavKovac1

Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...VjekoslavKovac1

Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka

Accelerating Pseudo-Marginal MCMC using Gaussian ProcessesMatt Moores

MDL/Bayesian Criteria based on Universal Coding/MeasureJoe Suzuki

F10Balaji Srikaanth

Genomic GraphicsDr. Volkan OBAN

What's hot (20)

Quantitative norm convergence of some ergodic averages

Specific Finite Groups(General)

QMC Error SAMSI Tutorial Aug 2017

Specific Finite Groups(General)

Scattering theory analogues of several classical estimates in Fourier analysis

735

Estimates for a class of non-standard bilinear multipliers

A Szemerédi-type theorem for subsets of the unit cube

Trilinear embedding for divergence-form operators

The partial derivative of the binary Cross-entropy loss function

GradStudentSeminarSept30

On maximal and variational Fourier restriction

A sharp nonlinear Hausdorff-Young inequality for small potentials

Some Examples of Scaling Sets

Variants of the Christ-Kiselev lemma and an application to the maximal Fourie...

Murphy: Machine learning A probabilistic perspective: Ch.9

Accelerating Pseudo-Marginal MCMC using Gaussian Processes

MDL/Bayesian Criteria based on Universal Coding/Measure

F10

Genomic Graphics

Similar to A note on the density of Gumbel-softmax

Fast parallelizable scenario-based stochastic optimizationPantelis Sopasakis

Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor

fouriertransform.pdfssuser4dafea

FUNCTION EX 1 PROBLEMS WITH SOLUTION UPTO JEE LEVELMohanSonawane

exponen dan logaritmaHanifa Zulfitri

The Fundamental Solution of an Extension to a Generalized Laplace EquationJohnathan Gray

5.8 Permutations (handout)Jan Plaza

Ece3075 a 8Aiman Malik

Best Approximation in Real Linear 2-Normed SpacesIOSR Journals

Lesson 3 Operation on FunctionsShann Ashequielle Blasurca

Dominación y extensiones óptimas de operadores con rango esencial compacto en...esasancpe

Convexity in the Theory of the Gamma Function.pdfPeterEsnayderBarranz

Cubic Spline InterpolationVARUN KUMAR

Tales on two commuting transformations or flowsVjekoslavKovac1

FormulasZHERILL

S. Duplij, Y. Hong, F. Li. Uq(sl(m+1))-module algebra structures on the coord...Steven Duplij (Stepan Douplii)

34032 green funcansarixxx

Applications of Numerical Functional Analysis in Atomless Finite Measure SpacesQUESTJOURNAL

Chapter1 functionsSofia Mahmood

Chapter1 functionsMonie Joey

Similar to A note on the density of Gumbel-softmax (20)

Fast parallelizable scenario-based stochastic optimization

Welcome to International Journal of Engineering Research and Development (IJERD)

fouriertransform.pdf

FUNCTION EX 1 PROBLEMS WITH SOLUTION UPTO JEE LEVEL

exponen dan logaritma

The Fundamental Solution of an Extension to a Generalized Laplace Equation

5.8 Permutations (handout)

Ece3075 a 8

Best Approximation in Real Linear 2-Normed Spaces

Lesson 3 Operation on Functions

Dominación y extensiones óptimas de operadores con rango esencial compacto en...

Convexity in the Theory of the Gamma Function.pdf

Cubic Spline Interpolation

Tales on two commuting transformations or flows

Formulas

S. Duplij, Y. Hong, F. Li. Uq(sl(m+1))-module algebra structures on the coord...

34032 green func

Applications of Numerical Functional Analysis in Atomless Finite Measure Spaces

Chapter1 functions

Recently uploaded

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823

➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7Call Girls in Nagpur High Profile Call Girls

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal

Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics

hybrid Seed Production In Chilli & Capsicum.pptx9to5mart

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795

Probability Grade 10 Third Quarter LessonsJoseMangaJr1

Capstone Project on IBM Data Analytics ProgramMoniSankarHazra

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823

Recently uploaded (20)

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...

➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure

Detecting Credit Card Fraud: A Machine Learning Approach

hybrid Seed Production In Chilli & Capsicum.pptx

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed

Probability Grade 10 Third Quarter Lessons

Capstone Project on IBM Data Analytics Program

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand

A note on the density of Gumbel-softmax

1. A note on the density of Gumbel-softmax Tomonari MASADA @ Nagasaki University May 29, 2019 This note explicates some details of the discussion given in Appendix B of [1]. The Gumbel-softmax trick gives a k-dimensional sample vector y = (y1, . . . , yk) ∈ ∆k−1 whose entries are obtained as yi = exp((log(πi) + gi)/τ) k j=1 exp((log(πj) + gj)/τ) for i = 1, . . . , k, (1) by using g1, . . . , gk, which are i.i.d samples drawn from Gumbel(0, 1). Deﬁne xi = log(πi). Then y is rewritten as yi = exp((xi + gi)/τ) k j=1 exp((xj + gj)/τ) for i = 1, . . . , k, (2) Divide both numerator and denominator by exp((xk + gk)/τ). yi = exp((xi + gi − (xk + gk))/τ) k j=1 exp((xj + gj − (xk + gk))/τ) for i = 1, . . . , k, (3) Deﬁne ui = xi + gi − (xk + gk) for i = 1, . . . , k − 1, where gi ∼ Gumbel(0, 1). When gk is given, gi = ui−xi+(xk+gk) and ui can thus be regarded as a sample from the Gumbel whose mean is xi−(xk+gk) and scale parameter is 1. Therefore, p(ui|gk) = e−{(ui−xi+(xk+gk))+e−(ui−xi+(xk+gk)) } . Consequently, the density p(u1, . . . , uk−1) is given as follows: p(u1, . . . , uk−1) = ∞ −∞ dgkp(u1, . . . , uk−1|gk)p(gk) = ∞ −∞ dgkp(gk) k−1 i=1 p(ui|gk) = ∞ −∞ dgke−gk−e−gk k−1 i=1 exi−ui−xk−gk−exi−ui−xk−gk (4) Perform a change of variables with v = e−gk . Then dv dgk = −e−gk . Therefore, dv = −e−gk dgk and dgk = −dvegk = −dv/v. p(u1, . . . , uk−1) = 0 ∞ (−dv)e−v k−1 i=1 vexi−ui−xk−vexi−ui−xk = k−1 i=1 exi−ui−xk ∞ 0 dve−v vk−1 k−1 i=1 e−vexi−ui−xk = e−(k−1)xk k−1 i=1 exi−ui ∞ 0 dvvk−1 e−v(1+e−xk k−1 i=1 (exi−ui )) (5) Recall the following fact related to Gamma integral: ∞ 0 xz−1 e−ax dx = ∞ 0 y a z−1 e−y dy a = 1 a z ∞ 0 yz−1 e−y dy = a−z Γ(z) (6) 1

2. Therefore, p(u1, . . . , uk−1) = e−(k−1)xk k−1 i=1 exi−ui ∞ 0 dvvk−1 e−v(1+e−xk k−1 i=1 (exi−ui )) = e−kxk exk k−1 i=1 exi−ui 1 + e−xk k−1 i=1 (exi−ui ) −k Γ(k) = exk k−1 i=1 exi−ui exk + k−1 i=1 (exi−ui ) −k Γ(k) = exp xk + k−1 i=1 (xi − ui) exk + k−1 i=1 (exi−ui ) −k Γ(k) (7) Deﬁne uk = 0. Then p(u1, . . . , uk−1) = Γ(k) k i=1 exp(xi − ui) k i=1 exp(xi − ui) −k (8) A k-dimensional sample vector y = (y1, . . . , yk) ∈ ∆k−1 is obtained from u1, . . . , uk−1 by applying a deterministic transformation h: hi(u1, . . . , uk−1) = exp(ui/τ) 1 + k−1 j=1 exp(uj/τ) for i = 1, . . . , k − 1 (9) as follows: yi = hi(u1, . . . , uk−1) for i = 1, . . . , k − 1 (10) Note that yk is ﬁxed given y1, . . . , yk−1: yk = 1 1 + k−1 j=1 exp(uj/τ) = 1 − k−1 j=1 yj (11) By using the change of variables we can obtain the density function for y: p(y) = p(h−1 (y1, . . . , yk−1)) det ∂(h−1 1 (y1, . . . , yk−1), . . . , h−1 k−1(y1, . . . , yk−1)) ∂(y1, . . . , yk−1) (12) The inverse of h is obtained as follows: yi = exp(ui/τ) 1 + k−1 j=1 exp(uj/τ) yi = yk exp(ui/τ) from yk = 1 1+ k−1 j=1 exp(uj /τ) log yi = log yk + ui/τ ∴ h−1 i (y1, . . . , yk−1) = ui = τ × (log yi − log yk) = τ × log yi − log 1 − k−1 j=1 yj (13) Therefore, we obtain the Jacobian: ∂h−1 i (y1, . . . , yk−1) ∂yi = τ × 1 yi − 1 yk ∂yk ∂yi = τ × 1 yi + 1 yk (14) ∂h−1 i (y1, . . . , yk−1) ∂yj = τ × − 1 yk ∂yk ∂yj = τ × 1 yk for j = i (15) Eqs. from (24) to (28) are easy to understand. References [1] E. Jang, S. Gu, and B. Poole. Categorical representation with Gumbel-softmax. ICLR, 2017. 2

A note on the density of Gumbel-softmax

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to A note on the density of Gumbel-softmax

Similar to A note on the density of Gumbel-softmax (20)

More from Tomonari Masada

More from Tomonari Masada (20)

Recently uploaded

Recently uploaded (20)

A note on the density of Gumbel-softmax