TopicRNN: A Recurrent Model for Documents

TopicRNN is a generative model for documents. This note covers three steps: (1) a topic vector is drawn from a standard normal distribution and used to generate the words of a document; (2) a lower bound on the log marginal likelihood of the words and stop word indicators is derived; (3) the expectations in the lower bound are approximated with samples from an inference network that models the approximate posterior distribution over topics.

A Note on TopicRNN
Tomonari MASADA @ Nagasaki University
July 13, 2017
1 Model
TopicRNN is a generative model proposed by [1], whose generative story for a particular document $x_{1:T}$ is given below.
1. Draw a topic vector $\theta \sim \mathcal{N}(0, I)$.
2. Given the words $y_{1:t-1}$, for the $t$th word $y_t$ in the document,
(a) Compute the hidden state $h_t = f_W(x_t, h_{t-1})$, where we let $x_t \triangleq y_{t-1}$.
(b) Draw the stop word indicator $l_t \sim \mathrm{Bernoulli}(\sigma(\Gamma^\top h_t))$, with $\sigma$ the sigmoid function.
(c) Draw the word $y_t \sim p(y_t \mid h_t, \theta, l_t, B)$, where
$$p(y_t = i \mid h_t, \theta, l_t, B) \propto \exp\left(v_i^\top h_t + (1 - l_t)\, b_i^\top \theta\right).$$
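The generative story above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the note leaves $f_W$ unspecified, so a single tanh recurrence stands in for it, and the names `generate_document`, `W_x`, `W_h` and the start-of-document token `0` are hypothetical, not taken from [1].

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_categorical(logits, rng):
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(logits)
    weights = [math.exp(a - m) for a in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

def generate_document(T, V, B, Gamma, W_x, W_h, rng):
    """Sample words y_1..y_T and stop word indicators l_1..l_T.

    V[i], B[i]: output vectors v_i (size H) and b_i (size K) of word i.
    Gamma: stop word vector (size H).
    W_x[i]: input embedding of word i; W_h: H x H recurrence (assumed f_W).
    """
    K, H = len(B[0]), len(Gamma)
    theta = [rng.gauss(0.0, 1.0) for _ in range(K)]      # step 1
    h = [0.0] * H
    y_prev = 0                                            # assumed start token
    words, stops = [], []
    for _ in range(T):
        # (a) h_t = f_W(x_t, h_{t-1}) with x_t = y_{t-1}
        h = [math.tanh(W_x[y_prev][r] + dot(W_h[r], h)) for r in range(H)]
        # (b) l_t ~ Bernoulli(sigmoid(Gamma^T h_t))
        l = 1 if rng.random() < sigmoid(dot(Gamma, h)) else 0
        # (c) y_t with logits v_i^T h_t + (1 - l_t) b_i^T theta
        logits = [dot(V[i], h) + (1 - l) * dot(B[i], theta)
                  for i in range(len(V))]
        y = sample_categorical(logits, rng)
        words.append(y)
        stops.append(l)
        y_prev = y
    return words, stops, theta

# Toy usage with random parameters: C words, hidden size H, K topics.
rng = random.Random(0)
C, H, K, T = 6, 4, 3, 10

def rand_mat(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

V, B, W_x, W_h = rand_mat(C, H), rand_mat(C, K), rand_mat(C, H), rand_mat(H, H)
Gamma = [rng.gauss(0.0, 0.5) for _ in range(H)]
words, stops, theta = generate_document(T, V, B, Gamma, W_x, W_h, rng)
```

Note how the topic vector only enters the logits when $l_t = 0$, so stop words are generated by the recurrent part alone.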
2 Lower bound
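The log marginal likelihood of the word sequence $y_{1:T}$ and the stop word indicators $l_{1:T}$ is
$$\log p(y_{1:T}, l_{1:T} \mid h_{1:T}) = \log \int p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)\, d\theta. \tag{1}$$
A lower bound can be obtained by introducing a distribution $q(\theta)$ and applying Jensen's inequality:
$$\begin{aligned}
\log p(y_{1:T}, l_{1:T} \mid h_{1:T})
&= \log \int p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)\, d\theta \\
&= \log \int q(\theta)\, \frac{p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)}{q(\theta)}\, d\theta \\
&\geq \int q(\theta) \log \frac{p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)}{q(\theta)}\, d\theta \\
&= \int q(\theta) \log p(\theta)\, d\theta + \sum_{t=1}^{T} \int q(\theta) \log p(y_t \mid h_t, l_t, \theta; W)\, d\theta \\
&\quad + \sum_{t=1}^{T} \int q(\theta) \log p(l_t \mid h_t; \Gamma)\, d\theta - \int q(\theta) \log q(\theta)\, d\theta \\
&\triangleq \mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)
\end{aligned} \tag{2}$$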
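The inequality in Eq. (2) can be checked numerically on a toy problem where everything is available in closed form. The sketch below is not TopicRNN: it assumes a one-dimensional model with prior $\theta \sim \mathcal{N}(0, 1)$ and likelihood $y \mid \theta \sim \mathcal{N}(\theta, 1)$, so the marginal is $\mathcal{N}(y; 0, 2)$ and the exact posterior is $\mathcal{N}(y/2, 1/2)$. The function names are hypothetical.

```python
import math

def log_marginal(y):
    # log N(y; 0, 2), the exact log marginal likelihood of the toy model.
    return -0.5 * math.log(4.0 * math.pi) - y * y / 4.0

def elbo(y, m, s):
    """Lower bound for q(theta) = N(m, s^2), term by term as in Eq. (2)."""
    prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m * m + s * s)        # E_q[log p(theta)]
    lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s * s)   # E_q[log p(y|theta)]
    entropy = 0.5 * math.log(2 * math.pi) + 0.5 + math.log(s)           # -E_q[log q(theta)]
    return prior + lik + entropy

y = 1.3
gap_bad_q = log_marginal(y) - elbo(y, 0.0, 1.0)                  # q equal to the prior
gap_exact_q = log_marginal(y) - elbo(y, y / 2, math.sqrt(0.5))   # q equal to the posterior
```

The gap is the KL divergence from $q$ to the exact posterior, so it is positive for a poor $q$ and vanishes when $q$ is the posterior.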
3 Approximate posterior
The form of $q(\theta)$ is chosen to be an inference network using a feed-forward neural network. Each expectation in Eq. (2) is approximated with samples from $q(\theta \mid X_c)$, where $X_c$ denotes the term-frequency representation of $y_{1:T}$ excluding stop words. The density of the approximate posterior $q(\theta \mid X_c)$ is specified as follows:
$$q(\theta \mid X_c) = \mathcal{N}\left(\theta;\, \mu(X_c),\, \mathrm{diag}(\sigma^2(X_c))\right), \tag{3}$$
$$\mu(X_c) = W_1\, g(X_c) + a_1, \tag{4}$$
$$\log \sigma(X_c) = W_2\, g(X_c) + a_2, \tag{5}$$
where $g(\cdot)$ denotes the feed-forward neural network and $\sigma(X_c)$ here is the vector of standard deviations, not the sigmoid function of Section 1. Eq. (3) gives the reparameterization of $\theta_k$ as $\theta_k = \mu_k(X_c) + \epsilon_k \sigma_k(X_c)$ for $k = 1, \ldots, K$, where $\epsilon_k$ is a sample from the standard normal distribution $\mathcal{N}(0, 1)$.
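A minimal sketch of Eqs. (3)-(5) and the reparameterization follows. The note does not specify the architecture of $g(\cdot)$, so a single tanh layer with weight matrix `W0` is assumed here; `encode`, `sample_theta` and all numeric values are hypothetical.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def encode(x_c, W0, W1, a1, W2, a2):
    """Return (mu, sigma) of q(theta | X_c) following Eqs. (4) and (5)."""
    g = [math.tanh(z) for z in matvec(W0, x_c)]          # assumed one-layer g
    mu = [m + b for m, b in zip(matvec(W1, g), a1)]      # Eq. (4)
    log_sigma = [s + b for s, b in zip(matvec(W2, g), a2)]  # Eq. (5)
    sigma = [math.exp(s) for s in log_sigma]             # always positive
    return mu, sigma

def sample_theta(mu, sigma, rng):
    # theta_k = mu_k + eps_k * sigma_k with eps_k ~ N(0, 1)
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + e * s for m, e, s in zip(mu, eps, sigma)]

# Toy usage: 4 non-stop vocabulary words, hidden size 2, K = 3 topics.
rng = random.Random(0)
x_c = [2.0, 0.0, 1.0, 3.0]
W0 = [[0.1, -0.2, 0.0, 0.05], [0.0, 0.1, 0.3, -0.1]]
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a1 = [0.0, 0.0, 0.0]
W2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
a2 = [math.log(0.5)] * 3
mu, sigma = encode(x_c, W0, W1, a1, W2, a2)
theta = sample_theta(mu, sigma, rng)
```

Predicting $\log \sigma$ rather than $\sigma$ keeps the standard deviations positive without any constraint on the network output, and writing $\theta$ as a deterministic function of $\epsilon$ lets gradients flow to $\mu$ and $\sigma$.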

4 Monte Carlo integration
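We can now rewrite each term of the lower bound $\mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)$ in Eq. (2) as below, where the $\theta^{(s)}$ denote the samples drawn from the approximate posterior $q(\theta \mid X_c)$.
The first term:
$$\begin{aligned}
\int q(\theta) \log p(\theta)\, d\theta
&\approx \frac{1}{S} \sum_{s=1}^{S} \log p(\theta^{(s)})
= \frac{1}{S} \sum_{s=1}^{S} \sum_{k=1}^{K} \log \left\{ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{\big(\theta_k^{(s)}\big)^2}{2} \right) \right\} \\
&= -\frac{K \log(2\pi)}{2} - \frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{s} \big(\theta_k^{(s)}\big)^2}{S}
\end{aligned} \tag{6}$$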
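As a rough check of Eq. (6), the Monte Carlo estimate can be compared with the exact expectation, which for a diagonal Gaussian $q$ is $-\tfrac{K}{2}\log(2\pi) - \tfrac{1}{2}\sum_k \big(\mu_k^2 + \sigma_k^2\big)$. The sketch below uses hypothetical function names and toy values for $\mu$ and $\sigma$.

```python
import math
import random

def first_term_mc(mu, sigma, S, rng):
    """Monte Carlo estimate of E_q[log p(theta)] following Eq. (6)."""
    K = len(mu)
    total_sq = 0.0
    for _ in range(S):
        theta = [m + rng.gauss(0.0, 1.0) * s for m, s in zip(mu, sigma)]
        total_sq += sum(t * t for t in theta)
    return -0.5 * K * math.log(2 * math.pi) - 0.5 * total_sq / S

def first_term_exact(mu, sigma):
    """Closed form, using E_q[theta_k^2] = mu_k^2 + sigma_k^2."""
    K = len(mu)
    second_moments = sum(m * m + s * s for m, s in zip(mu, sigma))
    return -0.5 * K * math.log(2 * math.pi) - 0.5 * second_moments

rng = random.Random(0)
mu, sigma = [0.5, -1.0], [0.3, 0.8]
estimate = first_term_mc(mu, sigma, 50000, rng)
exact = first_term_exact(mu, sigma)
```

Since this term has a closed form, an implementation may prefer it over sampling; the note's derivation keeps the sampled version so that all terms share the same $\theta^{(s)}$.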
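Each addend of the second term:
$$\begin{aligned}
\int q(\theta) \log p(y_t \mid h_t, l_t, \theta; W)\, d\theta
&\approx \frac{1}{S} \sum_{s=1}^{S} \log \frac{\exp\left(v_{y_t}^\top h_t + (1 - l_t)\, b_{y_t}^\top \theta^{(s)}\right)}{\sum_{j=1}^{C} \exp\left(v_j^\top h_t + (1 - l_t)\, b_j^\top \theta^{(s)}\right)} \\
&= v_{y_t}^\top h_t + (1 - l_t)\, b_{y_t}^\top \frac{\sum_{s=1}^{S} \theta^{(s)}}{S} - \frac{1}{S} \sum_{s=1}^{S} \log \sum_{j=1}^{C} \exp\left(v_j^\top h_t + (1 - l_t)\, b_j^\top \theta^{(s)}\right)
\end{aligned} \tag{7}$$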
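Eq. (7) is a sample-averaged log-softmax, and its last term is usually computed with the log-sum-exp trick. The sketch below is one way to write it, with hypothetical names and toy values; `V[j]` and `B[j]` hold $v_j$ and $b_j$, and `thetas` holds the samples $\theta^{(s)}$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def logsumexp(a):
    # Shift by the maximum so that exp never overflows.
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a))

def second_term_addend(y_t, l_t, h_t, V, B, thetas):
    """Right-hand side of Eq. (7) for one position t."""
    S = len(thetas)
    theta_bar = [sum(col) / S for col in zip(*thetas)]   # sample mean of theta
    linear = dot(V[y_t], h_t) + (1 - l_t) * dot(B[y_t], theta_bar)
    lse = sum(
        logsumexp([dot(V[j], h_t) + (1 - l_t) * dot(B[j], th)
                   for j in range(len(V))])
        for th in thetas
    ) / S
    return linear - lse

# Toy usage: C = 3 words, hidden size 2, K = 2 topics, S = 2 samples.
V = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
B = [[0.5, 0.0], [-0.2, 0.1], [0.0, -0.3]]
h = [1.0, -0.5]
thetas = [[0.3, -0.7], [1.1, 0.2]]
value = second_term_addend(2, 0, h, V, B, thetas)
```

With a single sample the addend is an ordinary log-probability, so exponentiating it over all words sums to one; with $l_t = 1$ the samples drop out entirely.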
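Each addend of the third term, which does not depend on $\theta$ and therefore needs no sampling:
$$\int q(\theta) \log p(l_t \mid h_t; \Gamma)\, d\theta = l_t \log\left(\sigma(\Gamma^\top h_t)\right) + (1 - l_t) \log\left(1 - \sigma(\Gamma^\top h_t)\right) \tag{8}$$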
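Eq. (8) is the Bernoulli log-likelihood of the stop word indicator. A numerically careful sketch, using the identity $\log(1 - \sigma(a)) = \log \sigma(-a)$, might look as follows; the function names are hypothetical.

```python
import math

def log_sigmoid(a):
    # log sigmoid(a) = -log(1 + exp(-a)), written so exp never overflows.
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))

def third_term_addend(l_t, h_t, Gamma):
    """Eq. (8): log-likelihood of the stop word indicator l_t."""
    a = sum(g * h for g, h in zip(Gamma, h_t))   # Gamma^T h_t
    return l_t * log_sigmoid(a) + (1 - l_t) * log_sigmoid(-a)

# Toy usage with a 2-dimensional hidden state.
Gamma = [0.7, -1.2]
h = [0.4, 0.9]
log_p_stop = third_term_addend(1, h, Gamma)
log_p_nonstop = third_term_addend(0, h, Gamma)
```

Computing `math.log(1 - sigmoid(a))` directly would return negative infinity or raise for large `a`, which is why the log-sigmoid form is used.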
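The fourth term:
$$\begin{aligned}
\int q(\theta) \log q(\theta)\, d\theta
&\approx \frac{1}{S} \sum_{s=1}^{S} \sum_{k=1}^{K} \log \left\{ \frac{1}{\sqrt{2\pi \sigma_k^2(X_c)}} \exp\left( -\frac{\big(\theta_k^{(s)} - \mu_k(X_c)\big)^2}{2\sigma_k^2(X_c)} \right) \right\} \\
&= -\frac{K \log(2\pi)}{2} - \sum_{k=1}^{K} \log(\sigma_k(X_c)) - \frac{1}{S} \sum_{s=1}^{S} \sum_{k=1}^{K} \frac{\big(\theta_k^{(s)} - \mu_k(X_c)\big)^2}{2\sigma_k^2(X_c)}
\end{aligned} \tag{9}$$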
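Eq. (9) estimates the negative entropy of $q$. For a diagonal Gaussian the exact value is $-\tfrac{K}{2}\log(2\pi) - \sum_k \log \sigma_k - \tfrac{K}{2}$, which gives a simple sanity check for the sampled version. The sketch below uses hypothetical names and toy values.

```python
import math
import random

def fourth_term_mc(mu, sigma, S, rng):
    """Monte Carlo estimate of E_q[log q(theta)] following Eq. (9)."""
    K = len(mu)
    quad = 0.0
    for _ in range(S):
        for m, s in zip(mu, sigma):
            theta_k = m + rng.gauss(0.0, 1.0) * s
            quad += (theta_k - m) ** 2 / (2.0 * s * s)
    log_sigmas = sum(math.log(s) for s in sigma)
    return -0.5 * K * math.log(2 * math.pi) - log_sigmas - quad / S

def fourth_term_exact(sigma):
    """Negative entropy of a diagonal Gaussian with standard deviations sigma."""
    K = len(sigma)
    log_sigmas = sum(math.log(s) for s in sigma)
    return -0.5 * K * math.log(2 * math.pi) - log_sigmas - 0.5 * K

rng = random.Random(0)
mu, sigma = [0.5, -1.0], [0.3, 0.8]
estimate = fourth_term_mc(mu, sigma, 50000, rng)
exact = fourth_term_exact(sigma)
```

The quadratic part depends only on the noise $\epsilon$, not on $\mu$ or $\sigma$, because $\theta_k^{(s)} - \mu_k = \epsilon_k^{(s)} \sigma_k$.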
5 Objective to be maximized
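Each of the $S$ samples (i.e., $\theta^{(s)}$ for $s = 1, \ldots, S$) is obtained as $\theta^{(s)} = \mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)$ via the reparameterization, where the $\epsilon_k^{(s)}$ are drawn from the standard normal distribution and $\circ$ is element-wise multiplication. Under this substitution the quadratic term of Eq. (9) reduces to $\frac{1}{S} \sum_{s} \sum_{k} \big(\epsilon_k^{(s)}\big)^2 / 2$, which does not depend on any parameter, and the two $K \log(2\pi)/2$ terms cancel. Consequently, the lower bound $\mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)$ to be maximized is obtained as follows:
$$\begin{aligned}
\mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)
&= -\frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{s} \big(\mu_k(X_c) + \epsilon_k^{(s)} \sigma_k(X_c)\big)^2}{S} \\
&\quad + \sum_{t=1}^{T} v_{y_t}^\top h_t + \frac{1}{S} \sum_{s=1}^{S} \sum_{t=1}^{T} (1 - l_t)\, b_{y_t}^\top \left(\mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)\right) \\
&\quad - \sum_{t=1}^{T} \frac{1}{S} \sum_{s=1}^{S} \log \sum_{j=1}^{C} \exp\left( v_j^\top h_t + (1 - l_t)\, b_j^\top \left(\mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)\right) \right) \\
&\quad + \sum_{t=1}^{T} \left\{ l_t \log\left(\sigma(\Gamma^\top h_t)\right) + (1 - l_t) \log\left(1 - \sigma(\Gamma^\top h_t)\right) \right\} \\
&\quad + \sum_{k=1}^{K} \log(\sigma_k(X_c)) + \text{const.}
\end{aligned} \tag{10}$$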
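Putting the pieces together, Eq. (10) can be evaluated as sketched below, with the additive constant dropped. This is a plain-Python illustration under assumptions: the hidden states $h_t$ are taken as given inputs, the name `objective` and its argument layout are hypothetical, and a practical implementation would use an automatic differentiation library to maximize the value.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def logsumexp(a):
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a))

def log_sigmoid(a):
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))

def objective(words, stops, hs, V, B, Gamma, mu, sigma, eps):
    """Eq. (10) without the constant.

    words[t], stops[t], hs[t]: y_t, l_t and h_t for each position t.
    V[j], B[j]: output vectors v_j and b_j; Gamma: stop word vector.
    mu, sigma: parameters of q(theta | X_c); eps[s]: noise vector eps^(s).
    """
    S = len(eps)
    # Reparameterized samples theta^(s) = mu + eps^(s) o sigma.
    thetas = [[m + e * s for m, e, s in zip(mu, e_s, sigma)] for e_s in eps]
    # Prior term and the log(sigma_k) term coming from the entropy of q.
    total = -0.5 * sum(t * t for th in thetas for t in th) / S
    total += sum(math.log(s) for s in sigma)
    for y, l, h in zip(words, stops, hs):
        a = dot(Gamma, h)
        total += dot(V[y], h)                                  # v_{y_t}^T h_t
        total += log_sigmoid(a) if l == 1 else log_sigmoid(-a)  # stop indicator
        for th in thetas:
            logits = [dot(V[j], h) + (1 - l) * dot(B[j], th)
                      for j in range(len(V))]
            total += ((1 - l) * dot(B[y], th) - logsumexp(logits)) / S
    return total

# Degenerate toy check: with all-zero weights every word has probability 1/C
# and every stop indicator has probability 1/2.
C, H, K = 4, 2, 3
zeros_V = [[0.0] * H for _ in range(C)]
zeros_B = [[0.0] * K for _ in range(C)]
value = objective([0, 3, 1], [0, 1, 0], [[0.1, 0.2]] * 3, zeros_V, zeros_B,
                  [0.0] * H, [0.0] * K, [1.0] * K, [[1.0] * K])
```

In the degenerate case above, each of the three positions contributes $-\log 2 - \log C$, and the single noise vector of ones contributes $-K/2$ through the prior term.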
References
[1] Adji Bousso Dieng, Chong Wang, Jianfeng Gao, and John Paisley. TopicRNN: A Recurrent Neural
Network with Long-Range Semantic Dependency. ICLR, 2017.