6. LDA [Blei+ 03]
• Achieves a clustering of word tokens by assigning each word token to one of the K topics.
• z_{di}: the topic to which the i-th word token in document d is assigned
• θ_{dk}: how often is topic k talked about in document d?
  • the topic probability distribution of each document
• φ_{kv}: how often is word v used to talk about topic k?
  • the word probability distribution of each topic
(In the graphical model, z is a discrete variable; θ and φ are continuous variables. A sketch of the generative process follows below.)
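To make the roles of z_{di}, θ_{dk}, and φ_{kv} concrete, here is a minimal NumPy sketch of LDA's generative process; all sizes and the Dirichlet hyperparameters alpha and beta are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, V, N = 3, 4, 10, 8   # documents, topics, vocabulary size, tokens per doc (toy)
alpha, beta = 0.1, 0.01    # Dirichlet hyperparameters (illustrative values)

phi = rng.dirichlet(beta * np.ones(V), size=K)     # phi[k, v]: word dist. of topic k
theta = rng.dirichlet(alpha * np.ones(K), size=D)  # theta[d, k]: topic dist. of doc d

docs = []
for d in range(D):
    z = rng.choice(K, size=N, p=theta[d])                # z[i]: topic of the i-th token
    w = np.array([rng.choice(V, p=phi[k]) for k in z])   # each token drawn from its topic
    docs.append(w)
```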
7. Variational Bayesian (VB) inference
= maximization of the evidence lower bound (ELBO)
• VB tries to approximate the true posterior.
• An approximate posterior is introduced when the ELBO is obtained by applying Jensen's inequality (the full derivation is written out below):
  log p(w) ≥ E_{q(z, Θ)}[log p(w, z, Θ) − log q(z, Θ)] = ELBO
  (log p(w): the evidence; q(z, Θ): the approximate posterior)
• z: discrete hidden variables (topic assignments)
• Θ: continuous hidden variables (multinomial parameters)
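For reference, the Jensen step behind this slide, written out (a standard derivation, not copied from the slides; w denotes the observed words):

```latex
\begin{align}
\log p(w)
  &= \log \sum_{z} \int p(w, z, \Theta)\, d\Theta
   = \log\, \mathbb{E}_{q(z,\Theta)}\!\left[\frac{p(w, z, \Theta)}{q(z, \Theta)}\right] \\
  &\geq \mathbb{E}_{q(z,\Theta)}\!\left[\log p(w, z, \Theta) - \log q(z, \Theta)\right]
   \equiv \mathrm{ELBO}(q)
\end{align}
```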
8. Factorization assumption
• We assume the approximate posterior q(z, Θ) factorizes as q(z) q(Θ) to make the inference tractable.
• Then the ELBO can be written as
  ELBO = E_{q(z) q(Θ)}[log p(w, z, Θ)] − E_{q(z)}[log q(z)] − E_{q(Θ)}[log q(Θ)]
9. Stochastic gradient variational Bayes
(SGVB) [Kingma+ 14]
• A general framework for estimating the evidence lower bound (ELBO) in variational Bayes (VB)
• Only applicable to continuous distributions, i.e., to q(Θ)
10. (SGVB) Monte Carlo integration
• By using Monte Carlo integration, the expectations over q(Θ) in the ELBO can be estimated with L random samples as
  E_{q(Θ)}[f(Θ)] ≈ (1/L) Σ_{l=1}^{L} f(Θ^{(l)}), with Θ^{(l)} ~ q(Θ)  (a toy sketch follows below)
• The discrete part q(z) is estimated in a similar manner to the original VB for LDA [Blei+ 03].
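A toy NumPy sketch of this kind of Monte Carlo estimate; the choice q(Θ) = N(0, 1) and the integrand f are stand-ins for illustration, not the paper's actual ELBO terms.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 1000                             # number of Monte Carlo samples

def f(theta):                        # an arbitrary integrand, for illustration only
    return np.log1p(theta ** 2)

samples = rng.standard_normal(L)     # Theta^(l) ~ q(Theta), here q = N(0, 1)
estimate = f(samples).mean()         # (1/L) * sum_l f(Theta^(l)) ~= E_q[f(Theta)]
print(estimate)
```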
11. (SGVB) Reparameterization
• SGVB can be applied "under certain mild conditions."
• We use logistic normal distributions for approximating the true posteriors of
  • θ_{dk}: the per-document topic probability distributions, and
  • φ_{kv}: the per-topic word probability distributions.
• We can efficiently sample from the logistic normal with reparameterization (see the sketch below).
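A minimal sketch of sampling a per-document topic distribution from a logistic normal by reparameterization: draw standard Gaussian noise, shift and scale it by the variational parameters, and push the result through a softmax. The names mu and log_sigma are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                                    # number of topics (toy size)

mu = rng.standard_normal(K)              # variational mean (illustrative)
log_sigma = np.full(K, -1.0)             # variational log-scale (illustrative)

def sample_theta():
    eps = rng.standard_normal(K)         # all randomness lives in eps
    x = mu + np.exp(log_sigma) * eps     # reparameterized Gaussian sample
    e = np.exp(x - x.max())              # numerically stable softmax
    return e / e.sum()                   # theta_d lies on the probability simplex

theta_d = sample_theta()
```

Because theta_d is a deterministic, differentiable function of (mu, log_sigma) given eps, gradients of a Monte Carlo ELBO estimate can flow back into the variational parameters.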
14. "Stochastic" gradient VB
• The expectation integrals in the ELBO are estimated by the Monte Carlo method.
• The derivatives of the ELBO therefore depend on random samples.
• Randomness is incorporated into the maximization (a toy sketch follows below).
  • SGVB = VB whose gradients are stochastic.
  • (Observation) It seems easier to avoid poor local optima.
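To see "gradients that are stochastic" in isolation, here is a toy NumPy example that maximizes E_eps[-(mu + sigma*eps)^2] over mu by reparameterization; the noise is redrawn at every step, so each gradient is a random estimate. The objective, step size, and initial values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, lr = 3.0, 0.5, 0.05      # toy variational mean, fixed scale, step size

for step in range(200):
    eps = rng.standard_normal()     # fresh noise => a new stochastic gradient
    x = mu + sigma * eps            # reparameterized sample
    grad_mu = -2.0 * x              # d/dmu of -(mu + sigma*eps)^2 at this sample
    mu += lr * grad_mu              # noisy gradient ascent on the expectation

print(mu)                           # ends near 0, the maximizer of the expectation
```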
15. Without randomness
= with zero standard deviation
• A special case of the proposed method is quite similar to CVB0 [Asuncion+ 09] (see the sketch below).
• In other words, our method fits into the context of existing inference methods for LDA.
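To make the zero-standard-deviation case concrete: with sigma = 0 the reparameterized sample no longer depends on the noise, so every "sample" is the same point and the inference becomes deterministic. A sketch reusing the logistic normal sampler from above:

```python
import numpy as np

mu = np.array([0.2, -1.0, 0.5])
sigma = 0.0                                       # zero standard deviation
for seed in range(3):
    eps = np.random.default_rng(seed).standard_normal(3)
    x = mu + sigma * eps                          # identical for every draw of eps
    theta = np.exp(x - x.max())
    theta /= theta.sum()
    print(theta)                                  # always softmax(mu): deterministic
```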
16. Data sets for evaluation
Data set   # docs    # vocabulary words
NYT         99,932    46,263
MOVIE       27,859    62,408
NSF        128,818    21,471
MED        125,490    42,830
21. Not that efficient in time…
• 500 iterations on the NYT data set with K = 200:
  • LNV: 43 hours
  • CGS (collapsed Gibbs sampling): 14 hours
  • VB: 23 hours
• However, parallelization on a GPU works.
  • (an implementation with TensorFlow is in preparation)
22. Conclusion
• We incorporate randomness into variational inference for LDA by applying SGVB to LDA.
• The proposed method gives perplexities comparable to those of existing inference methods for LDA.
23. Future work
• SGVB is a general framework for devising posterior inference for probabilistic models.
• We have already applied SGVB to CTM [Blei+ 05].
  • This will be presented as a poster at APWeb'16.
• SGVB is also applicable to other document models.
  • NVDM [Miao+ 16]: document modeling with an MLP