This was my final project back in 2009, for the Natural Language Processing class in the CS department at the University of Pittsburgh, PA, USA, taught by Professor Rebecca Hwa.
The backup slides contain many details about LDA, hyperparameters, how to calculate the distributions based on MLE, etc.
1. LDA on Social Bookmarking Systems: an experiment on CiteULike
Introduction to Natural Language Processing, CS2731
Professor Rebecca Hwa
University of Pittsburgh
Denis Parra-Santander
December 16th 2009
2. Outline
• Topic Modeling (joke, to check the mood of the audience…)
• LDA: Introduction ("Sorry, I'm nervous…")
• Motivation (smart statement…)
• Definitions (Monte Carlo: a great place to spend your vacations; Dirichlet: [diʀiˈkleː], "Uuuh… Uuuh…")
• Experiments
• Evaluation method
• Results
• END
3. Topic modeling: Evolution
LSA [Deerwester et al. 90]: find "latent" structure or "concepts" in a text corpus
◦ Compare texts using a vector-based representation that is learned from the corpus
◦ Relies on SVD (for dimensionality reduction)
PLSA [Hofmann 99]: extends LSA by adding the idea of mixture decomposition derived from a latent class model
LDA [Blei et al. 2003]: extends PLSA by using a generative model, in particular by adding a Dirichlet prior
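To make the SVD step concrete, here is a minimal sketch (not part of the original slides): it builds a toy term-document count matrix and projects documents into a low-rank "concept" space with truncated SVD, then compares them there. The toy documents and the choice of two components are assumptions for illustration only.

```python
# Minimal LSA sketch: learn a low-rank "concept" space from a toy corpus via
# truncated SVD, then compare documents in that space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "information about catalog pricing changes",
    "hands-on science ideas to try in the kitchen",
    "online catalog shopping cart and checkout",
]
X = CountVectorizer().fit_transform(docs)        # term-document count matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)               # documents as points in the latent space
print(cosine_similarity(doc_vectors))            # pairwise document similarities
```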
4. LDA: Generative Model* (I/II)
Document 22, words: information, about, catalog, pricing, changes, 2009, welcome, looking, hands-on, science, ideas, try, kitchen
• LDA assumes that each word in the document was generated by a distribution of topics over words.
◦ Topic 15: science, experiment, learning, ideas, practice, information
◦ Topic 9: catalog, shopping, buy, internet, checkout, cart
• Paired with an inference mechanism (Gibbs sampling), it learns per-document distributions over topics and per-topic distributions over words.
*Original slide by Daniel Ramage, Stanford University
5. LDA I/II: Graphical Model
Graphical model representations: a categorical distribution Cat with arrows to the words w1, w2, w3, w4, …, wn.
Compact notation: a single node w1 inside a "plate" indexed by n, meaning "generate a word from Cat n times".
*Original slide by Roger Levy, UCSD
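As a tiny illustration of what the plate stands for (my own sketch, with a made-up vocabulary and probabilities), the compact diagram is just a loop: draw n i.i.d. words from one categorical distribution.

```python
# "Generate a word from Cat n times": n i.i.d. draws from one categorical distribution.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["science", "experiment", "catalog", "shopping"]
cat = [0.4, 0.3, 0.2, 0.1]                  # the categorical distribution "Cat"
n = 10                                      # the plate: repeat the draw n times
words = rng.choice(vocab, size=n, p=cat)
print(list(words))
```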
6. LDA II/II: Graphical Model
Plate diagram: D documents, each with Nd words; per-document topic distribution θ(d); per-word topic assignment zi and word wi; T topic-word distributions φ(j); hyperparameters α and β.
θ(d) ∼ Dirichlet(α)   (distribution over topics for each document)
zi ∼ Discrete(θ(d))   (topic assignment for each word)
φ(j) ∼ Dirichlet(β)   (distribution over words for each topic)
wi ∼ Discrete(φ(zi))   (word generated from its assigned topic)
α, β: Dirichlet priors
*Original slide by Roger Levy, UCSD
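A hedged sketch of the generative process described by the plate diagram above, with toy sizes (D, T, W, Nd) and hyperparameters chosen only for illustration:

```python
# LDA generative process, following the distributions listed above.
import numpy as np

rng = np.random.default_rng(42)
D, T, W, Nd = 3, 2, 6, 8                       # documents, topics, vocabulary size, words per doc
alpha, beta = 0.5, 0.1

phi = rng.dirichlet([beta] * W, size=T)        # phi(j) ~ Dirichlet(beta): word dist. per topic
corpus = []
for d in range(D):
    theta = rng.dirichlet([alpha] * T)         # theta(d) ~ Dirichlet(alpha): topic dist. per doc
    z = rng.choice(T, size=Nd, p=theta)        # z_i ~ Discrete(theta(d)): topic for each word
    w = [int(rng.choice(W, p=phi[zi])) for zi in z]   # w_i ~ Discrete(phi(z_i))
    corpus.append(w)
print(corpus)
```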
7. Learning the parameters
Maximum likelihood estimation (EM)
◦ e.g. Hofmann (1999)
Deterministic approximate algorithms
◦ variational EM; Blei, Ng & Jordan (2001; 2003)
◦ expectation propagation; Minka & Lafferty
(2002)
Markov chain Monte Carlo
◦ full Gibbs sampler; Pritchard et al. (2000)
◦ collapsed Gibbs sampler; Griffiths & Steyvers
(2004)
*Original slide by Roger Levy, UCSD
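As one concrete instance of the approaches listed above (variational inference, not the Gibbs-sampling toolkit used later in this project), here is a hedged sketch using scikit-learn's LatentDirichletAllocation on a toy corpus; the documents and topic count are assumptions for illustration.

```python
# Fitting LDA with variational Bayes (scikit-learn) on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "science experiment ideas for hands-on learning",
    "catalog shopping cart checkout pricing",
    "science practice information and ideas",
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))              # per-document topic proportions
print(lda.components_.shape)         # per-topic word weights (topics x vocabulary terms)
```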
8. My Experiments
IdentifyTopics in a collection of documents
from a social bookmarking system
(citeULike) [Ramage et al. 2008]
Objective: Clusterise documents by LDA
QUESTION: If the documents have, in
addition to title and text, USERTAGS… how
can they help/influence/improve topic
identification/clustering?
9. Tools available
Many implementations of LDA based on Gibbs sampling:
LingPipe (Java)
Mallet (Java)
STMT (Scala) – I chose this one
10. The Dataset
Initially:
◦ Corpus: ~45k documents
◦ Definition of 99 topics (queries)
◦ Gold standard: document-topic identification from expert feedback, defining a ground truth
But then, the gold standard and RAM…
◦ Not all documents were relevant
◦ Unable to train the model with 45k, 20k, or 10k documents
And then, the tags: not all documents in the gold standard had associated tags (# > 2)
◦ Finally: training with 1.1k documents
◦ Experiments on 212 documents
13. Perplexity
Perplexity by content type and number of topics:

                 38 topics     52 topics     99 topics
Tags             1860.7642     1880.7974     1270.8032
Title + text     2526.7589     2447.5477     2755.1329

Using the Stanford Topic Modeling Toolbox (STMT); training with ~1.1k documents, 80% for training and 20% held out to calculate perplexity.
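For reference, a hedged sketch of how a perplexity figure like those above is computed from a held-out set: exponentiate the negative average per-word log-likelihood. The helper name and the toy numbers are mine, not STMT output.

```python
# perplexity = exp( - (sum of log-probabilities of held-out tokens) / (number of tokens) )
import math

def perplexity(token_log_probs):
    """token_log_probs: natural-log probability the model assigns to each held-out token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# sanity check: if every token has probability 1/2000, perplexity is exactly 2000
print(perplexity([math.log(1 / 2000)] * 500))
```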
14. F1 (& precision/recall)
F1, with precision and recall in parentheses:

                 38 topics               52 topics               99 topics
Tags             0.139 (0.118/0.167)     0.168 (0.187/0.152)     0.215 (0.267/0.18)
Title + text     0.1252 (0.122/0.128)    0.157 (0.151/0.163)     0.156 (0.198/0.129)
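Each cell above is the harmonic mean of the precision and recall shown in parentheses; a one-line check (reproducing the tags / 99-topics cell) looks like this:

```python
# F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.267, 0.18), 3))   # -> 0.215, the tags / 99-topics cell
```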
15. Conclusions
Results are not the same as in the "motivational" paper, but they are consistent with its conclusions (the dataset is very domain-specific)
Pending: combining tags and documents, in particular MM-LDA
Importance to NLP: extensions of the model have been used to:
◦ learn syntactic and semantic factors that guide word choice
◦ identify authorship
◦ many others
17. "Invent new worlds and watch your word;
The adjective, when it doesn't give life, kills…"
Ars Poetica, Vicente Huidobro
("Inventa nuevos mundos y cuida tu palabra;
El adjetivo, cuando no da vida, mata…")
18. References
Heinrich, G. (2008). Parameter estimation for text analysis. Technical report, University of Leipzig.
Ramage, D., P. Heymann, C. D. Manning, and H. Garcia-Molina (2009). Clustering the tagged web. In WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, New York, NY, USA, pp. 54-63. ACM.
Steyvers, M. and T. Griffiths (2007). Probabilistic Topic Models. Lawrence Erlbaum Associates.
20. LSA: 3 claims (2 shared with LDA)
Semantic information can be derived from a word-document co-occurrence matrix
Dimensionality reduction is an essential part of this derivation
Words and documents can be represented as points in a Euclidean space => different from LDA: semantic properties of words and docs are expressed in terms of probabilistic topics
22. Inverting the generative model
Maximum likelihood estimation (EM)
◦ e.g. Hofmann (1999)
Deterministic approximate algorithms
◦ variational EM; Blei, Ng & Jordan (2001; 2003)
◦ expectation propagation; Minka & Lafferty (2002)
Markov chain Monte Carlo
◦ full Gibbs sampler; Pritchard et al. (2000)
◦ collapsed Gibbs sampler; Griffiths & Steyvers (2004)
23. The collapsed Gibbs sampler
Using conjugacy of Dirichlet and multinomial
distributions, integrate out continuous parameters
Defines a distribution on discrete ensembles z
$$P(\mathbf{w}\mid\mathbf{z}) = \int_{\Delta_W^T} P(\mathbf{w}\mid\mathbf{z},\Phi)\,p(\Phi)\,d\Phi
\qquad
P(\mathbf{z}) = \int_{\Delta_T^D} P(\mathbf{z}\mid\Theta)\,p(\Theta)\,d\Theta$$

$$P(\mathbf{z}\mid\mathbf{w}) = \frac{P(\mathbf{w}\mid\mathbf{z})\,P(\mathbf{z})}{\sum_{\mathbf{z}} P(\mathbf{w}\mid\mathbf{z})\,P(\mathbf{z})}$$

$$P(\mathbf{w}\mid\mathbf{z}) = \left(\frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}}\right)^{T} \prod_{j=1}^{T} \frac{\prod_{w}\Gamma\!\left(n_{j}^{(w)}+\beta\right)}{\Gamma\!\left(n_{j}^{(\cdot)}+W\beta\right)}
\qquad
P(\mathbf{z}) = \left(\frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}}\right)^{D} \prod_{d=1}^{D} \frac{\prod_{j}\Gamma\!\left(n_{d}^{(j)}+\alpha\right)}{\Gamma\!\left(n_{d}^{(\cdot)}+T\alpha\right)}$$
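A hedged sketch of how the two integrated terms above can be evaluated numerically from count matrices (using log-Gamma for stability); the function names and the count-matrix layout are my own conventions, not part of the slides.

```python
# n_jw[j, w]: number of times word w is assigned to topic j
# n_dj[d, j]: number of words in document d assigned to topic j
import numpy as np
from scipy.special import gammaln

def log_p_w_given_z(n_jw, beta):
    T, W = n_jw.shape
    return (T * (gammaln(W * beta) - W * gammaln(beta))
            + np.sum(gammaln(n_jw + beta))
            - np.sum(gammaln(n_jw.sum(axis=1) + W * beta)))

def log_p_z(n_dj, alpha):
    D, T = n_dj.shape
    return (D * (gammaln(T * alpha) - T * gammaln(alpha))
            + np.sum(gammaln(n_dj + alpha))
            - np.sum(gammaln(n_dj.sum(axis=1) + T * alpha)))
```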
24. The collapsed Gibbs sampler
Sample each zi conditioned on z-i
This is nicer than your average Gibbs sampler:
◦ memory: counts can be cached in two sparse matrices
◦ optimization: no special functions, simple arithmetic
◦ the distributions on Φ and Θ are analytic given z and w, and can
later be found for each sample
$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+W\beta}\cdot\frac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}$$
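Below is a compact, hedged implementation of this update on a toy corpus of word-id lists; it is a sketch for illustration, not the STMT sampler used in the experiments, and the toy corpus, hyperparameters, and iteration count are arbitrary.

```python
# A compact collapsed Gibbs sampler implementing the update above on a toy corpus.
import numpy as np

def gibbs_lda(docs, T, W, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_jw = np.zeros((T, W))                    # word-topic counts
    n_dj = np.zeros((D, T))                    # document-topic counts
    z = [rng.integers(T, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):             # initialize counts from the random assignment
        for i, w in enumerate(doc):
            n_jw[z[d][i], w] += 1
            n_dj[d, z[d][i]] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]
                n_jw[j, w] -= 1; n_dj[d, j] -= 1       # remove word i from the counts (z_-i)
                p = ((n_jw[:, w] + beta) / (n_jw.sum(axis=1) + W * beta)
                     * (n_dj[d] + alpha))              # update above, up to a constant in j
                p /= p.sum()
                j = rng.choice(T, p=p)
                z[d][i] = j
                n_jw[j, w] += 1; n_dj[d, j] += 1
    return n_jw, n_dj

docs = [[0, 0, 1, 2], [2, 1, 1, 0], [3, 4, 4, 5], [5, 4, 3, 3]]   # word ids in a toy vocabulary
n_jw, n_dj = gibbs_lda(docs, T=2, W=6)
print(n_jw)   # each topic should tend to concentrate on one half of the vocabulary
```

The document-side denominator of the update is constant across topics, so the sketch drops it before normalizing.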
27. Extension: a model for meetings
Plate diagram: utterances u = 1…U, each with Nu words; per-utterance topic distribution θ(u); a switch su that decides whether to keep the previous utterance's distribution θ(u-1); topic assignments zi and words wi; T topic-word distributions φ(j); priors α and β.
θ(u) | su = 0 ∼ Delta(θ(u-1))
θ(u) | su = 1 ∼ Dirichlet(α)
zi ∼ Discrete(θ(u))
φ(j) ∼ Dirichlet(β)
wi ∼ Discrete(φ(zi))
(Purver, Kording, Griffiths, & Tenenbaum, 2006)
28. Sample of ICSI meeting corpus
(25 meetings)
no it's o_k.
it's it'll work.
well i can do that.
but then i have to end the presentation in the middle so i can go back to open up javabayes.
o_k fine.
here let's see if i can.
alright.
very nice.
is that better.
yeah.
o_k.
uh i'll also get rid of this click to add notes.
o_k. perfect
NEW TOPIC (not supplied to algorithm)
so then the features we decided or we decided we were talked about.
right.
uh the the prosody the discourse verb choice.
you know we had a list of things like to go and to visit and what not.
the landmark-iness of uh.
i knew you'd like that.
30. Comparison with human judgments
Topics recovered are much more coherent than those found
using random segmentation, no segmentation, or an HMM
31. Learning the number of topics
Can use standard Bayes factor methods to
evaluate models of different dimensionality
◦ e.g. importance sampling via MCMC
Alternative: nonparametric Bayes
◦ fixed number of topics per document,
unbounded number of topics per corpus
(Blei, Griffiths, Jordan, & Tenenbaum, 2004)
◦ unbounded number of topics for both (the
hierarchical Dirichlet process)
(Teh, Jordan, Beal, & Blei, 2004)
32. The Author-Topic model
(Rosen-Zvi, Griffiths, Smyth, & Steyvers, 2004)
Plate diagram: D documents with Nd words each; A authors, each with a topic distribution θ(a); T topic-word distributions φ(j); for each word, an author xi and a topic zi; priors α and β.
θ(a) ∼ Dirichlet(α)   (each author has a distribution over topics)
xi ∼ Uniform(A(d))   (the author of each word is chosen uniformly at random from the document's authors)
zi ∼ Discrete(θ(xi))
φ(j) ∼ Dirichlet(β)
wi ∼ Discrete(φ(zi))
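A hedged sketch of that extra author step in the generative process (toy sizes and a made-up author set for one document; illustration only):

```python
# The Author-Topic model's extra step: for each word, pick one of the document's
# authors uniformly at random, then a topic from that author's distribution, then the word.
import numpy as np

rng = np.random.default_rng(1)
A, T, W, Nd = 3, 2, 5, 6                     # authors, topics, vocabulary size, words in the doc
alpha, beta = 0.5, 0.1
theta = rng.dirichlet([alpha] * T, size=A)   # theta(a) ~ Dirichlet(alpha), one per author
phi = rng.dirichlet([beta] * W, size=T)      # phi(j)   ~ Dirichlet(beta), one per topic
authors_of_doc = [0, 2]                      # A(d): the document's author set

words, authors = [], []
for _ in range(Nd):
    x = rng.choice(authors_of_doc)           # x_i ~ Uniform(A(d))
    z = rng.choice(T, p=theta[x])            # z_i ~ Discrete(theta(x_i))
    w = rng.choice(W, p=phi[z])              # w_i ~ Discrete(phi(z_i))
    authors.append(int(x)); words.append(int(w))
print(list(zip(authors, words)))
```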
34. Who wrote what?
A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets
us generalize distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly
related to the input1 space This is done by identifying a class of kernels1 which can be
represented as norm1 based2 distances1 in Hilbert spaces It turns1 out that common kernel1
algorithms such as SVMs1 and kernel1 PCA1 are actually really distance1 based2 algorithms and
can be run2 with that class of kernels1 too As well as providing1 a useful new insight1 into how
these algorithms work the present2 work can form the basis1 for conceiving new algorithms
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes
proposals for characterizing and computing2 preferred2 diagnoses2 assuming that the system2
description2 is augmented with a system2 structure2 a directed2 graph2 explicating the
interconnections between system2 components2 Specifically we first introduce the notion of a
consequence2 which is a syntactically2 unconstrained propositional2 sentence2 that
characterizes all consistency2 based2 diagnoses2 and show2 that standard2 characterizations of
diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a consequence2
Second we propose a new syntactic2 variation on the consequence2 known as negation2 normal
form NNF and discuss its merits compared to standard variations Third we introduce a basic
algorithm2 for computing consequences in NNF given a structured system2 description We
show that if the system2 structure2 does not contain cycles2 then there is always a linear size2
consequence2 in NNF which can be computed in linear time2 For arbitrary1 system2 structures2
we show a precise connection between the complexity2 of computing2 consequences and the
topology of the underlying system2 structure2 Finally we present2 an algorithm2 that
enumerates2 the preferred2 diagnoses2 characterized by a consequence2 The algorithm2 is
shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies
some general conditions
Written by
(1) Scholkopf_B
Written by
(2) Darwiche_A
35. Analysis of PNAS abstracts
Test topic models with a real database
of scientific papers from PNAS
All 28,154 abstracts from 1991-2001
All words occurring in at least five
abstracts, not on “stop” list (20,551)
Total of 3,026,970 tokens in corpus
(Griffiths & Steyvers, 2004)
37. Cold topics vs. hot topics
Top words of the six example topics from the slide (the original slide groups them into cold topics, whose share of the corpus fell over 1991-2001, and hot topics, whose share rose):
Topic 2: SPECIES, GLOBAL, CLIMATE, CO2, WATER, ENVIRONMENTAL, YEARS, MARINE, CARBON, DIVERSITY, OCEAN, EXTINCTION, TERRESTRIAL, COMMUNITY, ABUNDANCE
Topic 134: MICE, DEFICIENT, NORMAL, GENE, NULL, MOUSE, TYPE, HOMOZYGOUS, ROLE, KNOCKOUT, DEVELOPMENT, GENERATED, LACKING, ANIMALS, REDUCED
Topic 179: APOPTOSIS, DEATH, CELL, INDUCED, BCL, CELLS, APOPTOTIC, CASPASE, FAS, SURVIVAL, PROGRAMMED, MEDIATED, INDUCTION, CERAMIDE, EXPRESSION
Topic 37: CDNA, AMINO, SEQUENCE, ACID, PROTEIN, ISOLATED, ENCODING, CLONED, ACIDS, IDENTITY, CLONE, EXPRESSED, ENCODES, RAT, HOMOLOGY
Topic 289: KDA, PROTEIN, PURIFIED, MOLECULAR, MASS, CHROMATOGRAPHY, POLYPEPTIDE, GEL, SDS, BAND, APPARENT, LABELED, IDENTIFIED, FRACTION, DETECTED
Topic 75: ANTIBODY, ANTIBODIES, MONOCLONAL, ANTIGEN, IGG, MAB, SPECIFIC, EPITOPE, HUMAN, MABS, RECOGNIZED, SERA, EPITOPES, DIRECTED, NEUTRALIZING
39. Effects of hyperparameters
α and β control the relative sparsity of Φ and Θ
◦ smaller α, fewer topics per document
◦ smaller β, fewer words per topic
Good assignments z strike a compromise between these two kinds of sparsity
[Plot: log Γ(x) as a function of x]

$$P(\mathbf{w}\mid\mathbf{z}) = \left(\frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}}\right)^{T} \prod_{j=1}^{T} \frac{\prod_{w}\Gamma\!\left(n_{j}^{(w)}+\beta\right)}{\Gamma\!\left(n_{j}^{(\cdot)}+W\beta\right)}
\qquad
P(\mathbf{z}) = \left(\frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}}\right)^{D} \prod_{d=1}^{D} \frac{\prod_{j}\Gamma\!\left(n_{d}^{(j)}+\alpha\right)}{\Gamma\!\left(n_{d}^{(\cdot)}+T\alpha\right)}$$
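A small sketch (my own, not from the slides) of the sparsity effect described above: with smaller α, a Dirichlet draw tends to put most of its mass on a few topics.

```python
# Smaller alpha -> each Dirichlet sample concentrates its mass on fewer topics.
import numpy as np

rng = np.random.default_rng(0)
T = 10
for alpha in (10.0, 1.0, 0.1):
    theta = rng.dirichlet([alpha] * T)
    top2 = np.sort(theta)[-2:].sum()
    print(f"alpha={alpha:>4}: share of mass on the two largest topics = {top2:.2f}")
```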