This document summarizes the lda2vec model, which combines aspects of word2vec and LDA. Word2vec learns word embeddings from local context, while LDA learns document-level topic mixtures. lda2vec predicts words from both the local context and a global document vector, and, like LDA, it represents each document as a sparse mixture over topic vectors so the result stays interpretable.
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
1. A word is worth a thousand vectors (word2vec, lda, and introducing lda2vec)
Christopher Moody
@ Stitch Fix
2. About
@chrisemoody
Caltech Physics
PhD. in astrostats supercomputing
sklearn t-SNE contributor
Data Labs at Stitch Fix
github.com/cemoody
Gaussian Processes t-SNE
chainer
deep learning
Tensor Decomposition
3. Credit
Large swathes of this talk are from
previous presentations by:
• Tomas Mikolov
• David Blei
• Christopher Olah
• Radim Rehurek
• Omer Levy & Yoav Goldberg
• Richard Socher
• Xin Rong
• Tim Hopper
8. word2vec
“The fox jumped over the lazy dog”
Maximize the likelihood of seeing the surrounding words given the word over.
P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
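A minimal sketch (not from the slides) of how those skip-gram terms arise: enumerate every (pivot, context) pair inside a window around each word. The window radius of 3 is an illustrative assumption.

```python
# Enumerate skip-gram (pivot, context) pairs whose probabilities
# P(context | pivot) the model maximizes.
tokens = "the fox jumped over the lazy dog".split()
window = 3  # assumed context radius

pairs = []
for i, pivot in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            pairs.append((pivot, tokens[j]))

# For the pivot "over" this yields exactly the terms on the slide:
# P(the|over), P(fox|over), P(jumped|over), P(the|over), P(lazy|over), P(dog|over)
print([p for p in pairs if p[0] == "over"])
```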
11. word2vec
Twist: we have two vectors for every word; which one is used depends on whether the word is the input or the output.
There is also a context window around every input word.
“The fox jumped over the lazy dog”
P(vOUT | vIN)
12.–25. word2vec
(The same slide repeated as an animation: the window slides along “The fox jumped over the lazy dog”, pairing the input word vIN with each context word vOUT to score P(vOUT | vIN).)
32. word2vec
But we'd like to measure a probability.
objective: softmax(vin · vout ∈ [-1, 1]) → [0, 1]
33. word2vec
But we'd like to measure a probability.
objective: softmax(vin · vout ∈ [-1, 1])
The softmax gives the probability of choosing 1 of N discrete items: a mapping from vector space to a multinomial over words.
35. word2vec
But we'd like to measure a probability.
objective: softmax = exp(vin · vout) / Σ_{k∈V} exp(vin · vk)
with a normalization term over all words k ∈ V
36. word2vec
But we'd like to measure a probability.
objective: softmax = exp(vin · vout) / Σ_{k∈V} exp(vin · vk) = P(vout | vin)
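As a rough illustration of that formula (toy vectors, assumed variable names), the probability can be computed directly in NumPy:

```python
import numpy as np

def p_out_given_in(v_in, out_vectors, out_index):
    """P(v_out | v_in) = exp(v_in . v_out) / sum_{k in V} exp(v_in . v_k)."""
    scores = out_vectors @ v_in              # dot product with every v_k in the vocabulary
    scores -= scores.max()                   # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[out_index]

# Toy usage with a made-up 5-word vocabulary and 3-dim vectors.
rng = np.random.default_rng(0)
out_vectors = rng.normal(size=(5, 3))        # one output vector per word
v_in = rng.normal(size=3)                    # input vector of the pivot word
print(p_out_given_in(v_in, out_vectors, out_index=2))
```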
37. word2vec
Learn by gradient descent on the softmax objective.
For every example we see, update both vectors:
vin := vin + ∂/∂vin log P(vout | vin)
vout := vout + ∂/∂vout log P(vout | vin)
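A hedged sketch of one such update, written as gradient descent on -log P(vout | vin) with the full softmax. Real word2vec implementations replace this with negative sampling or a hierarchical softmax; the learning rate and names below are assumptions.

```python
import numpy as np

def sgd_step(v_in, out_vectors, out_index, lr=0.05):
    """One SGD step on -log softmax for a single (in, out) training pair."""
    scores = out_vectors @ v_in
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    # Gradient of -log softmax: (probs - one_hot(out_index)) w.r.t. the scores.
    d_scores = probs.copy()
    d_scores[out_index] -= 1.0
    grad_in = out_vectors.T @ d_scores       # gradient w.r.t. v_in
    grad_out = np.outer(d_scores, v_in)      # gradient w.r.t. every output vector

    v_in -= lr * grad_in
    out_vectors -= lr * grad_out
    return v_in, out_vectors
```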
79. The goal: use all of this context to learn interpretable topics.
word2vec: P(vOUT | vIN)
@chrisemoody
80. word2vec: P(vOUT | vIN)
LDA: P(vOUT | vDOC)
The goal: use all of this context to learn interpretable topics.
this document is 80% high fashion; this document is 60% style
@chrisemoody
81. word2vec / LDA
The goal: use all of this context to learn interpretable topics.
this zip code is 80% hot climate; this zip code is 60% outdoors wear
@chrisemoody
82. word2vec / LDA
The goal: use all of this context to learn interpretable topics.
this client is 80% sporty; this client is 60% casual wear
@chrisemoody
86. lda2vec
“PS! Thank you for such an awesome top” doc_id=1846
vIN, vOUT, vDOC
Can we predict a word both locally and globally?
P(vOUT | vIN + vDOC)
87. lda2vec
“PS! Thank you for such an awesome top” doc_id=1846
vIN, vOUT, vDOC
Can we predict a word both locally and globally?
*very similar to the Paragraph Vectors / doc2vec
P(vOUT | vIN + vDOC)
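A minimal sketch of the idea (names like word_vectors and doc_vectors are assumptions, not the lda2vec library's API): the context vector that scores vOUT is simply the sum of the pivot word vector and the document vector.

```python
import numpy as np

def context_vector(word_vectors, doc_vectors, word_id, doc_id):
    """The combined lda2vec context: v_IN + v_DOC."""
    return word_vectors[word_id] + doc_vectors[doc_id]

rng = np.random.default_rng(1)
word_vectors = rng.normal(size=(1000, 3))   # toy vocabulary of 1000 words
doc_vectors = rng.normal(size=(2000, 3))    # toy corpus of 2000 documents
v_ctx = context_vector(word_vectors, doc_vectors, word_id=42, doc_id=1846)
# P(v_OUT | v_IN + v_DOC) then reuses the same softmax as plain word2vec,
# with v_ctx in place of v_in.
```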
102. lda2vec
Let's make vDOC sparse
{a, b, c, …} ~ Dirichlet(alpha)
vDOC = a·vreligion + b·vpolitics + …
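An illustrative sketch of that composition: vDOC is a convex combination of a few topic vectors. Here the weights are simply sampled from a Dirichlet to show the sparsity effect of a small alpha; lda2vec itself learns the weights under a Dirichlet prior, and all names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_topics, dim = 3, 300
topic_vectors = rng.normal(size=(n_topics, dim))   # e.g. v_religion, v_politics, ...
weights = rng.dirichlet(alpha=[0.1] * n_topics)    # small alpha -> near-sparse weights
v_doc = weights @ topic_vectors                    # a*v_religion + b*v_politics + ...
print(weights)                                     # most mass on one or two topics
```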
104. word2vec / LDA
lda2vec: P(vOUT | vIN + vDOC)
The goal: use all of this context to learn interpretable topics.
this document is 80% high fashion; this document is 60% style
@chrisemoody
105. word2vec / LDA
lda2vec: P(vOUT | vIN + vDOC + vZIP)
The goal: use all of this context to learn interpretable topics.
@chrisemoody
106. word2vec / LDA
lda2vec: P(vOUT | vIN + vDOC + vZIP)
The goal: use all of this context to learn interpretable topics.
this zip code is 80% hot climate; this zip code is 60% outdoors wear
@chrisemoody
107. word2vec / LDA
lda2vec: P(vOUT | vIN + vDOC + vZIP + vCLIENTS)
The goal: use all of this context to learn interpretable topics.
this client is 80% sporty; this client is 60% casual wear
@chrisemoody
108. word2vec / LDA
lda2vec: P(vOUT | vIN + vDOC + vZIP + vCLIENTS)
P(sold | vCLIENTS)
The goal: use all of this context to learn interpretable topics.
Can also make the topics supervised so that they predict an outcome.
@chrisemoody
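A hypothetical sketch of such a supervised term: a logistic head on the client vector for P(sold | vCLIENTS), trained jointly with the word-prediction loss so the learned topics become predictive of the outcome. The logistic choice and variable names are illustrative assumptions.

```python
import numpy as np

def p_sold(v_client, w, b):
    """Assumed logistic head: P(sold | v_CLIENT) = sigmoid(w . v_client + b)."""
    return 1.0 / (1.0 + np.exp(-(v_client @ w + b)))

rng = np.random.default_rng(3)
v_client, w, b = rng.normal(size=8), rng.normal(size=8), 0.0
print(p_sold(v_client, w, b))
```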
110. “PS! Thank you for such an awesome idea” doc_id=1846
Can we model topics to sentences? lda2lstm
@chrisemoody
111. “PS! Thank you for such an awesome idea” doc_id=1846
Can we represent the internal LSTM states as a Dirichlet mixture?
@chrisemoody
112. “PS! Thank you for such an awesome idea” doc_id=1846
Can we model topics to sentences? lda2lstm
Can we model topics to images? lda2ae (TJ Torres)
@chrisemoody
119. Crazy Approaches
Paragraph Vectors (just extend the context window)
Content dependency (change the window grammatically)
Social word2vec (deepwalk) (sentence is a walk on the graph)
Spotify (sentence is a playlist of song_ids)
Stitch Fix (sentence is a shipment of five items; a small sketch of this framing follows below)
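A small sketch of the Spotify / Stitch Fix framing from the list above: any ordered bag of ids can play the role of a sentence, so an off-the-shelf skip-gram model can embed them. Assumes gensim 4.x; the playlists themselves are made up.

```python
from gensim.models import Word2Vec

# "Sentences" are playlists of song ids rather than strings of English words.
playlists = [
    ["song_17", "song_93", "song_4", "song_62"],
    ["song_4", "song_88", "song_17"],
    ["song_62", "song_93", "song_4"],
]
model = Word2Vec(sentences=playlists, vector_size=16, window=5,
                 min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("song_4", topn=2))
```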
120.
121. CBOW vs. SkipGram
“The fox jumped over the lazy dog”
CBOW: guess the word given the context (many vIN → one vOUT). ~20x faster. (this is the alternative.)
SkipGram: guess the context given the word (one vIN → many vOUT). Better at syntax. (this is the one we went over)
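A minimal sketch of the difference in training pairs; the window size is an illustrative assumption.

```python
# Skip-gram predicts each context word from the pivot; CBOW predicts the
# pivot from its (typically averaged) context.
tokens = "the fox jumped over the lazy dog".split()
window, i = 2, 3                                      # pivot = "over"
context = tokens[i - window:i] + tokens[i + 1:i + 1 + window]

skipgram_pairs = [(tokens[i], c) for c in context]    # (over -> fox), (over -> jumped), ...
cbow_pair = (context, tokens[i])                      # ([fox, jumped, the, lazy] -> over)
print(skipgram_pairs)
print(cbow_pair)
```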
127. What I didn't mention
A lot of text (only if you have a specialized vocabulary)
Cleaning the text
Memory & performance
Traditional databases aren't well-suited
False positives
129. All of the following ideas will change what “words” and “context” represent.
130. paragraph vector
What about summarizing documents?
On the day he took office, President Obama reached out to America's enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that…
131. paragraph vector
On the day he took office, President Obama reached out to America's enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that…
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama's audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
Normal skipgram extends C words before, and C words after the IN word (the nearby OUT words).
132. paragraph vector
(same news excerpt as above)
A document vector simply extends the context to the whole document: the IN word is paired with OUT words anywhere in doc_1347.
139. context dependent
Levy & Goldberg 2014
Also show that SGNS is simply factorizing:
w · c = PMI(w, c) - log k
This is completely amazing!
Intuition: positive associations (canada, snow) are stronger in humans than negative associations (what is the opposite of Canada?)
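A rough sketch of what that factorization target looks like, starting from a word-context co-occurrence count matrix; the counts and the number of negative samples k below are made up.

```python
import numpy as np

def shifted_pmi(counts, k=5):
    """Entries of the matrix SGNS implicitly factorizes: PMI(w, c) - log k."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    return pmi - np.log(k)

counts = np.array([[10.0, 2.0], [1.0, 20.0]])   # toy word-context counts
print(shifted_pmi(counts))
```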
140. deepwalk
Perozzi et al. 2014
word2vec learns word vectors from sentences (“The fox jumped over the lazy dog”), but here “words” are graph vertices and “sentences” are random walks on the graph.
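A small sketch of the deepwalk recipe: generate random walks over a graph and treat each walk as a sentence to feed to word2vec. The toy graph, walk length, and seeds are assumptions for illustration.

```python
import random

# "Words" are vertices; "sentences" are random walks on the graph.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}

def random_walk(graph, start, length=10, seed=0):
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

# Each walk plays the role of a sentence for any skip-gram implementation.
walks = [random_walk(graph, v, seed=i) for i, v in enumerate(graph)]
print(walks)
```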