Reparameterization of Discrete Variables for Latent LSTM Allocation
Tomonari MASADA @ Nagasaki University
September 1, 2017
1 ELBO
In latent LSTM allocation, the topic assignments $z_d = \{z_{d,1}, \dots, z_{d,N_d}\}$ for each document $d$ are drawn from categorical distributions whose parameters are obtained as the softmax output of an LSTM.
Based on the description of the generative process given in [2], we obtain the full joint distribution as follows:
$$p(\{w_1, \dots, w_D\}, \{z_1, \dots, z_D\}, \phi; \mathrm{LSTM}, \beta) = p(\phi; \beta) \prod_{d=1}^{D} p(w_d, z_d \mid \phi; \mathrm{LSTM}) \tag{1}$$
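We maximize the evidence $p(\{w_1, \dots, w_D\}; \mathrm{LSTM}, \beta)$, which is obtained as below.
$$\begin{aligned}
p(\{w_1, \dots, w_D\}; \mathrm{LSTM}, \beta)
&= \int \sum_{\{z_1, \dots, z_D\}} p(\{w_1, \dots, w_D\}, \{z_1, \dots, z_D\}, \phi; \mathrm{LSTM}, \beta) \, d\phi \\
&= \int \sum_{\{z_1, \dots, z_D\}} p(\phi; \beta) \prod_d p(w_d, z_d \mid \phi; \mathrm{LSTM}) \, d\phi,
\end{aligned} \tag{2}$$
where
$$\begin{aligned}
p(w_d, z_d \mid \phi; \mathrm{LSTM})
&= p(w_d \mid z_d, \phi) \, p(z_d; \mathrm{LSTM}) \\
&= \prod_t p(w_{d,t} \mid z_{d,t}, \phi) \, p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}).
\end{aligned} \tag{3}$$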
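Jensen's inequality gives the following lower bound on the log of the evidence:
$$\begin{aligned}
\log p(\{w_1, \dots, w_D\}; \mathrm{LSTM}, \beta)
&= \log \int \sum_{\{z_1, \dots, z_D\}} p(\phi; \beta) \prod_d p(w_d, z_d \mid \phi; \mathrm{LSTM}) \, d\phi \\
&= \log \int \sum_{\{z_1, \dots, z_D\}} q(\{z_1, \dots, z_D\}, \phi) \, \frac{p(\phi; \beta) \prod_d p(w_d, z_d \mid \phi; \mathrm{LSTM})}{q(\{z_1, \dots, z_D\}, \phi)} \, d\phi \\
&\geq \int \sum_{\{z_1, \dots, z_D\}} q(\{z_1, \dots, z_D\}, \phi) \log \frac{p(\phi; \beta) \prod_d p(w_d, z_d \mid \phi; \mathrm{LSTM})}{q(\{z_1, \dots, z_D\}, \phi)} \, d\phi \\
&\equiv \mathcal{L}.
\end{aligned} \tag{4}$$
We denote this lower bound, i.e., the ELBO, by $\mathcal{L}$.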
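As a small numerical illustration of Eq. (4), the following sketch checks the bound on a toy problem with a single discrete latent variable and no integral over $\phi$. The values in `joint` are made-up numbers and the function names are hypothetical. The sketch only illustrates that the bound holds for an arbitrary $q$ and becomes tight when $q$ equals the exact posterior.

```python
import math

def log_evidence(joint):
    """log p(w) = log sum_z p(w, z) for a discrete latent z."""
    return math.log(sum(joint))

def elbo(joint, q):
    """sum_z q(z) log(p(w, z) / q(z)); terms with q(z) = 0 contribute nothing."""
    return sum(qz * math.log(pz / qz) for pz, qz in zip(joint, q) if qz > 0.0)

# Made-up joint probabilities p(w, z) for z in {0, 1, 2} (illustration only).
joint = [0.10, 0.05, 0.25]
uniform_q = [1.0 / 3.0] * 3
posterior_q = [pz / sum(joint) for pz in joint]

# The gap log p(w) - ELBO is the KL divergence from q to the exact posterior.
gap_uniform = log_evidence(joint) - elbo(joint, uniform_q)
gap_posterior = log_evidence(joint) - elbo(joint, posterior_q)
print(f"gap with uniform q:   {gap_uniform:.6f}")
print(f"gap with posterior q: {gap_posterior:.6f}")
```

The gap is strictly positive for the uniform $q$ and vanishes (up to round-off) for the posterior, which is why maximizing the ELBO over $q$ tightens the bound.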
We assume that the variational posterior $q(\{z_1, \dots, z_D\}, \phi)$ factorizes as $\prod_k q(\phi_k) \times \prod_d q(z_d)$. Each $q(\phi_k)$ is a Dirichlet distribution with parameters $\xi_k = \{\xi_{k,1}, \dots, \xi_{k,V}\}$.
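Then the ELBO $\mathcal{L}$ can be rewritten as below.
$$\begin{aligned}
\mathcal{L} ={}& \int q(\phi) \log p(\phi; \beta) \, d\phi
+ \sum_d \sum_{z_d} q(z_d) \log p(z_d; \mathrm{LSTM})
+ \sum_d \sum_{z_d} \int q(z_d) \, q(\phi) \log p(w_d \mid z_d, \phi) \, d\phi \\
&- \sum_d \sum_{z_d} q(z_d) \log q(z_d)
- \int q(\phi) \log q(\phi) \, d\phi
\end{aligned} \tag{5}$$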
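The second term of $\mathcal{L}$ in Eq. (5) can be rewritten as below.
$$\begin{aligned}
\sum_{z_d} q(z_d) \log p(z_d; \mathrm{LSTM})
&= \sum_{z_d} q(z_d) \sum_t \log p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}) \\
&= \sum_{z_d} q(z_d) \Big\{ \log p(z_{d,1}; \mathrm{LSTM}) + \log p(z_{d,2} \mid z_{d,1}; \mathrm{LSTM}) + \log p(z_{d,3} \mid z_{d,1}, z_{d,2}; \mathrm{LSTM}) \\
&\qquad\qquad\quad + \cdots + \log p(z_{d,N_d} \mid z_{d,1:N_d-1}; \mathrm{LSTM}) \Big\}
\end{aligned} \tag{6}$$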
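To make the cost of Eq. (6) concrete, the sketch below evaluates the sum by brute-force enumeration for a very short document. It is only an illustration under stated assumptions: `next_topic_probs` is a hypothetical stand-in for the LSTM softmax (any function of the whole prefix would do), and $q(z_d)$ is taken to be fully factorized over positions.

```python
import itertools
import math

def next_topic_probs(prefix, num_topics):
    """Hypothetical stand-in for the LSTM softmax p(z_t | z_{1:t-1})."""
    logits = [0.0] * num_topics
    for position, topic in enumerate(prefix):
        # More recent topics get larger weight, so the output depends on the whole prefix.
        logits[topic] += 1.0 / (len(prefix) - position)
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    norm = sum(exps)
    return [value / norm for value in exps]

def brute_force_second_term(q, num_topics):
    """Evaluate sum_z q(z) sum_t log p(z_t | z_{1:t-1}) by full enumeration.

    q[t][k] is q(z_t = k); a fully factorized q is assumed for this sketch.
    Returns the value and the number of enumerated assignments.
    """
    length = len(q)
    total, num_assignments = 0.0, 0
    for z in itertools.product(range(num_topics), repeat=length):
        weight = math.prod(q[t][z[t]] for t in range(length))
        log_p = sum(
            math.log(next_topic_probs(z[:t], num_topics)[z[t]])
            for t in range(length)
        )
        total += weight * log_p
        num_assignments += 1
    return total, num_assignments

K = 3
q = [[0.5, 0.3, 0.2]] * 4   # four tokens, same made-up q at each position
value, count = brute_force_second_term(q, K)
print(value, count)          # count equals K ** 4
```

The number of enumerated assignments is $K^{N_d}$, so this exact evaluation is only feasible for toy sizes.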
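The evaluation of Eq. (6) is intractable, because the sum ranges over all $K^{N_d}$ possible assignments $z_d$ and each conditional depends on the entire prefix $z_{d,1:t-1}$. However, for each $t$, we can reparameterize $z_{d,t}$ as $z_{d,t} = g_{\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t})$, where $z_{d,t}$ is represented as a one-hot vector [1]. That is, when $z_{d,t} = k$, $g_{k,\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) = 1$ and $g_{j,\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) = 0$ for $j \neq k$.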
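Then the expectation with respect to the hidden variables can be rewritten as follows.
$$E_{q_{\zeta_d}(z_d)}\Big[ \sum_t \log p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}) \Big]
= E_{p(\epsilon_d)}\Big[ \sum_t \log p\big( g_{\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) \mid \epsilon_{d,1:t-1}; \zeta_{d,1:t-1}, \mathrm{LSTM} \big) \Big] \tag{7}$$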
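We define $g_{k,\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) \equiv 1$ if
$$\frac{\sum_{j=1}^{k-1} \zeta_{d,t,j}}{\sum_{j=1}^{K} \zeta_{d,t,j}} \leq \epsilon_{d,t} < \frac{\sum_{j=1}^{k} \zeta_{d,t,j}}{\sum_{j=1}^{K} \zeta_{d,t,j}}$$
and $g_{k,\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) \equiv 0$ otherwise.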
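With this definition,
$$\log p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}) = \log \sum_{k=1}^{K} z_{d,t,k} \, \theta_{d,t,k} = \log \sum_{k=1}^{K} g_{k,\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) \, \theta_{d,t,k}, \tag{8}$$
where $\theta_{d,t}$ is the softmax output of the LSTM and thus is a function of $z_{d,1:t-1}$.
We assume that $\epsilon_{d,t} \sim U(0, 1)$.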
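The reparameterization above is the inverse-CDF construction for a categorical variable: a uniform draw is mapped to the topic whose cumulative-weight interval contains it. A minimal sketch follows, assuming the unnormalized weights $\zeta_{d,t}$ are passed in directly as a list `zeta` (in the text they may depend on $z_{d,1:t-1}$); the function names are hypothetical.

```python
import random

def g(zeta, eps):
    """Return the one-hot vector with a 1 at the k satisfying
    sum(zeta[:k]) / sum(zeta) <= eps < sum(zeta[:k + 1]) / sum(zeta)."""
    total = sum(zeta)
    one_hot = [0] * len(zeta)
    cumulative = 0.0
    for k, weight in enumerate(zeta):
        cumulative += weight
        if eps < cumulative / total:
            one_hot[k] = 1
            return one_hot
    one_hot[-1] = 1  # guard against round-off when eps is very close to 1
    return one_hot

def empirical_frequencies(zeta, num_samples, seed=0):
    """Push eps ~ U(0, 1) through g and count how often each topic is selected."""
    rng = random.Random(seed)
    counts = [0] * len(zeta)
    for _ in range(num_samples):
        counts[g(zeta, rng.random()).index(1)] += 1
    return [count / num_samples for count in counts]

zeta = [2.0, 1.0, 1.0]
freqs = empirical_frequencies(zeta, 20000)
print(freqs)  # close to the normalized weights zeta / sum(zeta)
```

Because all randomness sits in `eps`, the sampled topic is a deterministic function of the weights once the noise is fixed, which is the property the derivation relies on.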
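Taking the expectation of Eq. (8) and integrating out $\epsilon_{d,t}$ gives
$$\begin{aligned}
E_{q_{\zeta_{d,t}}(z_d)}\big[ \log p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}) \big]
&= E_{q_{\zeta_{d,t}}(z_d^{\setminus t})}\Big[ \int \log \sum_{k=1}^{K} g_{k,\zeta_{d,t}}(z_{d,1:t-1}, \epsilon_{d,t}) \, \theta_{d,t,k} \, d\epsilon_{d,t} \Big] \\
&= E_{p(\epsilon_d^{\setminus t})}\Big[ \sum_{k=1}^{K} \frac{\zeta_{d,t,k}(z_{d,1:t-1})}{\sum_j \zeta_{d,t,j}(z_{d,1:t-1})} \log \theta_{d,t,k}(z_{d,1:t-1}) \Big|_{z_d^{\setminus t} = h_{\zeta_d^{\setminus t}}(z_{d,t}, \epsilon_d^{\setminus t})} \Big],
\end{aligned} \tag{9}$$
where the superscript $\setminus t$ denotes all positions of document $d$ other than $t$.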
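The inner integral in Eq. (9) has a closed form because $g$ is piecewise constant in $\epsilon_{d,t}$: on the interval where topic $k$ is selected, the integrand equals $\log \theta_{d,t,k}$, and that interval has length $\zeta_{d,t,k} / \sum_j \zeta_{d,t,j}$. The sketch below checks this numerically with a midpoint rule. The weights `zeta` and probabilities `theta` are made-up values, and the function names are hypothetical.

```python
import math

def topic_index(zeta, eps):
    """Index k selected by the inverse-CDF map for a given eps in [0, 1)."""
    total = sum(zeta)
    cumulative = 0.0
    for k, weight in enumerate(zeta):
        cumulative += weight
        if eps < cumulative / total:
            return k
    return len(zeta) - 1

def closed_form(zeta, theta):
    """sum_k (zeta_k / sum_j zeta_j) * log(theta_k)."""
    total = sum(zeta)
    return sum((weight / total) * math.log(prob) for weight, prob in zip(zeta, theta))

def midpoint_integral(zeta, theta, num_points=100000):
    """Midpoint-rule approximation of the integral of log(theta_{k(eps)}) over eps in (0, 1)."""
    accumulator = 0.0
    for i in range(num_points):
        eps = (i + 0.5) / num_points
        accumulator += math.log(theta[topic_index(zeta, eps)])
    return accumulator / num_points

zeta = [3.0, 1.0, 2.0]    # made-up unnormalized weights
theta = [0.2, 0.5, 0.3]   # made-up softmax output
exact = closed_form(zeta, theta)
approx = midpoint_integral(zeta, theta)
print(exact, approx)
```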
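The third term of $\mathcal{L}$ in Eq. (5) can be rewritten as below.
$$\begin{aligned}
\sum_d \sum_{z_d} \int q(z_d) \, q(\phi) \log p(w_d \mid z_d, \phi) \, d\phi
&= \sum_d \int q(\phi) \sum_{z_d} q(z_d) \sum_t \log \phi_{z_{d,t}, w_{d,t}} \, d\phi \\
&= \sum_d \sum_{z_d} q(z_d) \sum_t \int q(\phi) \log \phi_{z_{d,t}, w_{d,t}} \, d\phi \\
&= \sum_d \sum_{z_d} q(z_d) \sum_t \Big\{ \Psi(\xi_{z_{d,t}, w_{d,t}}) - \Psi\Big( \sum_v \xi_{z_{d,t}, v} \Big) \Big\}
\end{aligned} \tag{10}$$
The last step uses the expectation of $\log \phi_{k,v}$ under the Dirichlet distribution $q(\phi_k)$, where $\Psi$ denotes the digamma function.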
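The Dirichlet identity behind Eq. (10), $E_{q(\phi_k)}[\log \phi_{k,v}] = \Psi(\xi_{k,v}) - \Psi(\sum_{v'} \xi_{k,v'})$, can be checked by simulation. The following is a sketch using only the standard library: `digamma` is a hand-written approximation (recurrence plus asymptotic series), Dirichlet samples are formed from normalized Gamma draws, and the parameter values are made up.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def expected_log_phi(xi_k):
    """Closed form E_q[log phi_{k,v}] for q(phi_k) = Dirichlet(xi_k)."""
    total = digamma(sum(xi_k))
    return [digamma(value) - total for value in xi_k]

def monte_carlo_expected_log_phi(xi_k, num_samples, seed=0):
    """Estimate the same expectation from normalized Gamma draws."""
    rng = random.Random(seed)
    sums = [0.0] * len(xi_k)
    for _ in range(num_samples):
        draws = [rng.gammavariate(value, 1.0) for value in xi_k]
        norm = sum(draws)
        for v, draw in enumerate(draws):
            sums[v] += math.log(draw / norm)
    return [s / num_samples for s in sums]

xi_k = [2.0, 3.0, 5.0]   # made-up Dirichlet parameters for one topic
exact = expected_log_phi(xi_k)
estimate = monte_carlo_expected_log_phi(xi_k, 50000)
print(exact)
print(estimate)
```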
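By using the reparameterization described above, we obtain the following.
$$\sum_d \sum_{z_d} \int q(z_d) \, q(\phi) \log p(w_d \mid z_d, \phi) \, d\phi
= \sum_d \sum_{t=1}^{N_d} E_{p(\epsilon_d^{\setminus t})}\Big[ \sum_{k=1}^{K} \frac{\zeta_{d,t,k}(z_{d,1:t-1})}{\sum_j \zeta_{d,t,j}(z_{d,1:t-1})} \Big\{ \Psi(\xi_{k, w_{d,t}}) - \Psi\Big( \sum_v \xi_{k,v} \Big) \Big\} \Big|_{z_d^{\setminus t} = h_{\zeta_d^{\setminus t}}(z_{d,t}, \epsilon_d^{\setminus t})} \Big] \tag{11}$$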
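The first term of $\mathcal{L}$ in Eq. (5) can be rewritten as below.
$$\begin{aligned}
\int q(\phi) \log p(\phi; \beta) \, d\phi
&= \sum_k \int q(\phi_k) \log p(\phi_k; \beta) \, d\phi_k \\
&= K \log \Gamma(V\beta) - KV \log \Gamma(\beta) + \sum_k \sum_v (\beta - 1) \int q(\phi_k) \log \phi_{k,v} \, d\phi_k \\
&= K \log \Gamma(V\beta) - KV \log \Gamma(\beta) + (\beta - 1) \sum_k \sum_v \Big\{ \Psi(\xi_{k,v}) - \Psi\Big( \sum_{v'} \xi_{k,v'} \Big) \Big\}
\end{aligned} \tag{12}$$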
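The fourth term of $\mathcal{L}$ in Eq. (5) can be rewritten as below with the reparameterization described above.
$$\sum_d \sum_{z_d} q(z_d) \log q(z_d)
= \sum_{d=1}^{D} \sum_{t=1}^{N_d} E_{p(\epsilon_d^{\setminus t})}\Big[ \sum_{k=1}^{K} \frac{\zeta_{d,t,k}(z_{d,1:t-1})}{\sum_j \zeta_{d,t,j}(z_{d,1:t-1})} \log \frac{\zeta_{d,t,k}(z_{d,1:t-1})}{\sum_j \zeta_{d,t,j}(z_{d,1:t-1})} \Big|_{z_d^{\setminus t} = h_{\zeta_d^{\setminus t}}(z_{d,t}, \epsilon_d^{\setminus t})} \Big] \tag{13}$$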
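The last term of $\mathcal{L}$ can be rewritten as below.
$$\begin{aligned}
\int q(\phi) \log q(\phi) \, d\phi
&= \sum_k \int q(\phi_k) \log q(\phi_k) \, d\phi_k \\
&= \sum_k \log \Gamma\Big( \sum_v \xi_{k,v} \Big) - \sum_k \sum_v \log \Gamma(\xi_{k,v}) + \sum_k \sum_v (\xi_{k,v} - 1) \Big\{ \Psi(\xi_{k,v}) - \Psi\Big( \sum_{v'} \xi_{k,v'} \Big) \Big\}
\end{aligned} \tag{14}$$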
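The first and last terms of $\mathcal{L}$ involve only $\phi$, and their difference, Eq. (12) minus Eq. (14), is the negative KL divergence from $q(\phi)$ to the prior. The sketch below evaluates both closed forms and uses that fact as a sanity check: the difference should be zero when every $\xi_{k,v}$ equals $\beta$ and negative otherwise. The `digamma` helper is a hand-written approximation, the parameter values are made up, and all function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def prior_term(xi, beta):
    """Eq. (12): E_q[log p(phi; beta)] for a symmetric Dirichlet prior."""
    num_topics, vocab_size = len(xi), len(xi[0])
    constant = num_topics * math.lgamma(vocab_size * beta)
    constant -= num_topics * vocab_size * math.lgamma(beta)
    expected_logs = sum(digamma(value) - digamma(sum(row)) for row in xi for value in row)
    return constant + (beta - 1.0) * expected_logs

def negative_entropy_term(xi):
    """Eq. (14): E_q[log q(phi)] for independent Dirichlet factors."""
    total = 0.0
    for row in xi:
        row_sum = sum(row)
        total += math.lgamma(row_sum) - sum(math.lgamma(value) for value in row)
        total += sum((value - 1.0) * (digamma(value) - digamma(row_sum)) for value in row)
    return total

def negative_kl(xi, beta):
    """First term of the ELBO minus its last term."""
    return prior_term(xi, beta) - negative_entropy_term(xi)

beta = 0.5
xi_equal_to_prior = [[beta] * 3 for _ in range(2)]
xi_other = [[1.0, 2.0, 3.0], [0.7, 0.2, 4.0]]
print(negative_kl(xi_equal_to_prior, beta))  # zero up to round-off
print(negative_kl(xi_other, beta))           # negative
```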
2 Inference
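For simplicity, we assume that the $\zeta_{d,t}$ do not depend on $\{\zeta_{d,t'} : t' \neq t\}$. Further, we let $\gamma_{d,t,k}$ denote $\zeta_{d,t,k} / \sum_j \zeta_{d,t,j}$. Then the partial derivative of $\mathcal{L}$ with respect to $\gamma_{d,t,k}$ is
$$\frac{\partial \mathcal{L}}{\partial \gamma_{d,t,k}}
= E_{p(\epsilon_d^{\setminus t})}\Big[ \log \theta_{d,t,k}(z_{d,1:t-1}) \Big|_{z_d^{\setminus t} = h_{\zeta_d^{\setminus t}}(\epsilon_d^{\setminus t})} \Big]
+ \Psi(\xi_{k, w_{d,t}}) - \Psi\Big( \sum_v \xi_{k,v} \Big) - \log \gamma_{d,t,k} + \mathrm{const}. \tag{15}$$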
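The first term of Eq. (15) can be approximated by drawing samples $\hat{z}_{d,t'} \sim \mathrm{Categorical}(\gamma_{d,t'})$ for $t' < t$. Then we estimate $\theta_{d,t,k}(\hat{z}_{d,1:t-1})$ by an LSTM forward pass. By solving $\partial \mathcal{L} / \partial \gamma_{d,t,k} = 0$, we obtain
$$\gamma_{d,t,k} \propto \tilde{\phi}_{k, w_{d,t}} \, \theta_{d,t,k}(\hat{z}_{d,1:t-1}), \tag{16}$$
where $\tilde{\phi}_{k, w_{d,t}} \equiv \dfrac{\exp(\Psi(\xi_{k, w_{d,t}}))}{\exp(\Psi(\sum_v \xi_{k,v}))}$.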
For $\xi_{k,v}$, we obtain the update $\xi_{k,v} = \beta + \sum_d \sum_{\{t : w_{d,t} = v\}} \gamma_{d,t,k}$ as usual.
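The two coordinate updates can be sketched as follows. Setting Eq. (15) to zero gives $\log \gamma_{d,t,k} = \log \theta_{d,t,k} + \Psi(\xi_{k,w_{d,t}}) - \Psi(\sum_v \xi_{k,v}) + \mathrm{const}$, so the code normalizes $\exp(\Psi(\xi_{k,w_{d,t}}) - \Psi(\sum_v \xi_{k,v})) \, \theta_{d,t,k}$ over $k$. This is an illustration only: the softmax outputs in `theta` are made-up numbers standing in for an LSTM forward pass, `digamma` is a hand-written approximation, and the function names and data layout are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def update_gamma(xi, theta_t, word):
    """gamma_k proportional to exp(Psi(xi[k][word]) - Psi(sum_v xi[k][v])) * theta_t[k]."""
    weights = [
        math.exp(digamma(row[word]) - digamma(sum(row))) * theta_k
        for row, theta_k in zip(xi, theta_t)
    ]
    norm = sum(weights)
    return [weight / norm for weight in weights]

def update_xi(beta, docs, gammas, num_topics, vocab_size):
    """xi[k][v] = beta + sum of gamma[d][t][k] over tokens with w_{d,t} = v."""
    xi = [[beta] * vocab_size for _ in range(num_topics)]
    for words, gamma_d in zip(docs, gammas):
        for word, gamma_t in zip(words, gamma_d):
            for k in range(num_topics):
                xi[k][word] += gamma_t[k]
    return xi

# Toy data: 2 topics, a vocabulary of 3 words, one document with 3 tokens.
beta = 0.1
docs = [[0, 2, 2]]
xi = [[1.0, 0.5, 2.0], [0.3, 1.5, 0.7]]
theta = [[0.5, 0.5], [0.6, 0.4], [0.2, 0.8]]  # made-up softmax outputs, one row per token
gammas = [[update_gamma(xi, theta[t], docs[0][t]) for t in range(len(docs[0]))]]
new_xi = update_xi(beta, docs, gammas, num_topics=2, vocab_size=3)
print(gammas)
print(new_xi)
```

Each token contributes a total mass of one to the updated $\xi$, so the entries of `new_xi` sum to $KV\beta$ plus the number of tokens.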
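Let $\theta_{d,t,k}$ denote $p(z_{d,t} = k \mid \hat{z}_{d,1:t-1}; \mathrm{LSTM})$, which is a softmax output of the LSTM. The partial derivative of $\mathcal{L}$ with respect to any LSTM parameter is
$$\frac{\partial \mathcal{L}}{\partial\, \mathrm{LSTM}}
= \sum_{d \in B} \sum_{t=1}^{N_d} \sum_{k=1}^{K} \gamma_{d,t,k} \, \frac{\partial}{\partial\, \mathrm{LSTM}} \log \theta_{d,t,k}
= \sum_{d \in B} \sum_{t=1}^{N_d} \sum_{k=1}^{K} \frac{\gamma_{d,t,k}}{\theta_{d,t,k}} \, \frac{\partial \theta_{d,t,k}}{\partial\, \mathrm{LSTM}} \tag{17}$$
Here the outer sum runs over the documents $d$ in a set $B$, such as a mini-batch.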
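Eq. (17) leaves the dependence of $\theta_{d,t,k}$ on the LSTM parameters implicit. For the final softmax layer it can be made explicit: if $\theta = \mathrm{softmax}(a)$ for a vector of logits $a$ (a name introduced here for illustration), then $\partial / \partial a_j \sum_k \gamma_k \log \theta_k = \gamma_j - (\sum_k \gamma_k)\, \theta_j$. The sketch below verifies this against central finite differences for one token. The values of `gamma` and `logits` are made up, and the remaining backpropagation through the LSTM is not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    norm = sum(exps)
    return [value / norm for value in exps]

def objective(gamma, logits):
    """sum_k gamma_k * log(theta_k) with theta = softmax(logits)."""
    return sum(g * math.log(th) for g, th in zip(gamma, softmax(logits)))

def analytic_grad(gamma, logits):
    """Gradient with respect to the logits: gamma_j - (sum_k gamma_k) * theta_j."""
    theta = softmax(logits)
    mass = sum(gamma)
    return [g - mass * th for g, th in zip(gamma, theta)]

def numeric_grad(gamma, logits, step=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for j in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[j] += step
        down[j] -= step
        grad.append((objective(gamma, up) - objective(gamma, down)) / (2.0 * step))
    return grad

gamma = [0.6, 0.3, 0.1]      # made-up variational probabilities for one token
logits = [0.2, -1.0, 0.5]    # made-up pre-softmax activations
grad_exact = analytic_grad(gamma, logits)
grad_check = numeric_grad(gamma, logits)
print(grad_exact)
print(grad_check)
```

Since $\gamma_{d,t}$ sums to one, the gradient with respect to the logits is simply $\gamma_{d,t} - \theta_{d,t}$, the same form as the gradient of a cross-entropy loss with soft targets.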
References
[1] Seiya Tokui and Issei Sato. Reparameterization trick for discrete variables. CoRR, abs/1611.01239, 2016.
[2] Manzil Zaheer, Amr Ahmed, and Alexander J. Smola. Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3967–3976, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.