LDA on Social Bookmarking Systems: an experiment on CiteULike
Introduction to Natural Language Processing, CS2731
Professor Rebecca Hwa
University of Pittsburgh
Denis Parra-Santander
December 16th 2009
1
Outline
2
 Introduction
 Motivation
 Definitions
 Topic Modeling
 LDA
 Experiments
 Evaluation method
 Results
 END
(Asides on the slide: a joke to check the mood of the audience; "Sorry I’m nervous…"; "Smart statement…"; "Monte Carlo: a great place to pass your vacations"; "DIRICHLET: [diʀiˈkleː]"; "Uuuh… Uuuh…")
Topic modeling: Evolution
 LSA [Deerwester et al. 90]: finds “latent” structure or “concepts” in a text corpus:
◦ Compares texts using a vector-based representation that is learned from a corpus.
◦ Relies on SVD for dimensionality reduction (see the sketch after this slide).
 PLSA [Hofmann 99]: extends LSA by adding the idea of mixture decomposition derived from a latent class model.
 LDA [Blei et al. 2003]: extends PLSA into a fully generative model, in particular by adding a Dirichlet prior.
3
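The following is a minimal illustration (not part of the original slides) of the SVD step behind LSA, using a hypothetical toy term-document matrix; only numpy is assumed.

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# (Hypothetical data, just to show the mechanics of truncated SVD.)
X = np.array([
    [2, 0, 1, 0],   # "kernel"
    [1, 0, 2, 0],   # "svm"
    [0, 3, 0, 1],   # "gene"
    [0, 2, 0, 2],   # "protein"
], dtype=float)

k = 2  # number of latent "concepts" to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-k document representation: each document becomes a point in the
# k-dimensional latent space spanned by the top singular vectors.
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T   # shape: (num_docs, k)

# Compare two documents by cosine similarity in the latent space.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(doc_vectors[0], doc_vectors[2]))  # docs 0 and 2 share kernel/svm terms
print(cosine(doc_vectors[0], doc_vectors[1]))  # docs 0 and 1 share no terms
```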
Document 22
LDA : Generative Model* (I/II)
4
Words:
Information
About catalog
pricing changes
2009 welcome
looking hands-on
science ideas try
kitchen
• LDA assumes that each word in the document was generated by first choosing a topic from the document’s distribution over topics, and then choosing the word from that topic’s distribution over words.
Topic 15:
science
experiment
learning
ideas
practice
information
Topic 9:
catalog
shopping
buy
internet
checkout
cart
• Paired with an inference mechanism (e.g. Gibbs sampling), LDA learns the per-document distributions over topics and the per-topic distributions over words.
…
*Original slide by Daniel Ramage, Stanford University
LDA I/II : Graphical Model
5
 Graphical model representations
 Compact notation:
Cat
w1 w2 w3 w4 wn
…
*Original slide by Roger Levy, UCSD
Cat
w1
n
“generate a word from Cat n times”
a “plate”
LDA II/II : Graphical Model
6
Nd D
zi
wi
θ (d)
φ (j)
α
β
θ (d) ∼ Dirichlet(α)
zi ∼ Discrete(θ (d) )
φ(j) ∼ Dirichlet(β)
wi ∼ Discrete(φ(zi) )
T
distribution over topics
for each document
topic assignment
for each word
distribution over words
for each topic
word generated from
assigned topic
Dirichlet priors
*Original slide by Roger Levy, UCSD
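Below is a small sketch (my addition, with hypothetical corpus sizes) of the generative process defined by this graphical model; numpy's dirichlet and choice stand in for the Dirichlet and Discrete draws.

```python
import numpy as np

rng = np.random.default_rng(0)

T, W, D = 5, 1000, 20        # topics, vocabulary size, documents (hypothetical)
alpha, beta = 0.1, 0.01      # Dirichlet hyperparameters
Nd = 50                      # words per document (fixed here for simplicity)

# phi(j) ~ Dirichlet(beta): one distribution over words per topic
phi = rng.dirichlet(np.full(W, beta), size=T)

docs = []
for d in range(D):
    # theta(d) ~ Dirichlet(alpha): this document's distribution over topics
    theta = rng.dirichlet(np.full(T, alpha))
    words = []
    for _ in range(Nd):
        z = rng.choice(T, p=theta)       # z_i ~ Discrete(theta(d))
        w = rng.choice(W, p=phi[z])      # w_i ~ Discrete(phi(z_i))
        words.append(w)
    docs.append(words)
```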
Learning the parameters
7
 Maximum likelihood estimation (EM)
◦ e.g. Hofmann (1999)
 Deterministic approximate algorithms
◦ variational EM; Blei, Ng & Jordan (2001; 2003)
◦ expectation propagation; Minka & Lafferty
(2002)
 Markov chain Monte Carlo
◦ full Gibbs sampler; Pritchard et al. (2000)
◦ collapsed Gibbs sampler; Griffiths & Steyvers
(2004)
*Original slide by Roger Levy, UCSD
My Experiments
 Identify topics in a collection of documents from a social bookmarking system (CiteULike) [Ramage et al. 2009]
 Objective: cluster documents using LDA
 QUESTION: if the documents have, in addition to title and text, USER TAGS… how can they help/influence/improve topic identification/clustering?
8
Tools available
 Many implementations of LDA based on
Gibbs sampling:
 LingPipe (Java)
 Mallet (Java)
 STMT (Scala) – I chose this one
9
The Dataset
 Initially
◦ Corpus: ~45k documents
◦ Definition of 99 topics (queries)
◦ Gold standard: document-topic assignments identified from expert feedback, defining a ground truth
 But then, the gold standard and RAM…
◦ Not all documents were relevant
◦ Unable to train the model with 45k, 20k or 10k documents
 And then, the tags: not all the documents in the gold standard had associated tags (# > 2)
◦ Finally: training with 1.1k documents
◦ Experiments on 212 documents
10
Evaluation: Pair-wise precision / recall
11
*Original slide by Daniel Ramage, Stanford University
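As a concrete reading of this evaluation (my own sketch, not Ramage's code): treat every pair of documents placed in the same cluster as a predicted positive and every pair sharing a gold-standard topic as a gold positive, then compute precision, recall and F1 over pairs.

```python
from itertools import combinations

def pairwise_prf(pred_clusters, gold_classes):
    """Pair-wise precision, recall and F1 between two labelings.
    pred_clusters / gold_classes: dicts mapping doc id -> label."""
    docs = sorted(pred_clusters)
    same_pred = {(a, b) for a, b in combinations(docs, 2)
                 if pred_clusters[a] == pred_clusters[b]}
    same_gold = {(a, b) for a, b in combinations(docs, 2)
                 if gold_classes[a] == gold_classes[b]}
    tp = len(same_pred & same_gold)
    precision = tp / len(same_pred) if same_pred else 0.0
    recall = tp / len(same_gold) if same_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: 5 documents, 2 predicted clusters vs. 2 gold classes.
pred = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
gold = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y"}
print(pairwise_prf(pred, gold))
```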
… results …
12
Perplexity
13
Rows: content used for training (Tags vs. Title + text). Columns: number of topics.

               38           52           99
Tags           1860.7642    1880.7974    1270.8032
Title + text   2526.7589    2447.5477    2755.1329

Using Stanford Topic Modeling Toolbox (STMT).
Training with ~1.1k documents: 80% for training, 20% to calculate perplexity.
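For reference (my addition, not on the slide): perplexity here is the standard exponentiated negative average log-likelihood per held-out token, so lower numbers are better. A minimal sketch of the computation, with hypothetical numbers:

```python
import math

def perplexity(heldout_log_likelihood, num_tokens):
    """perplexity = exp(-log p(held-out words) / number of held-out tokens);
    lower values mean the model predicts the held-out text better."""
    return math.exp(-heldout_log_likelihood / num_tokens)

# Hypothetical numbers: total log-likelihood of -151000 over 20000 held-out tokens.
print(perplexity(-151000.0, 20000))   # ~ 1.9e3
```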
F1 (& precision/recall)
 F1, with precision and recall in parentheses
14
               38                     52                     99
Tags           0.139 (0.118/0.167)    0.168 (0.187/0.152)    0.215 (0.267/0.18)
Title + text   0.1252 (0.122/0.128)   0.157 (0.151/0.163)    0.156 (0.198/0.129)

(Columns: number of topics.)
Conclusions
 Results are not the same as in the “motivational” paper, though they are consistent with its conclusions (the dataset is very domain-specific)
 Pending: combining tags and documents, in particular with MM-LDA
 Importance to NLP: extensions of the model have been used to:
◦ learn syntactic and semantic factors that guide word choice
◦ identify authorship
◦ many others
15
… and to finish …
Thanks!
And…
16
“Invent new worlds and watch your
word;
The adjective, when it doesn’t give life,
kills…”
Ars Poetica
Vicente Huidobro
“Inventa nuevos mundos y cuida tu palabra;
El adjetivo, cuando no da vida, mata…”
17
References
Heinrich, G. (2008). Parameter Estimation for Text Analysis. Technical report, University of Leipzig.
Ramage, D., Heymann, P., Manning, C. D., and Garcia-Molina, H. (2009). Clustering the Tagged Web. In WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, New York, NY, USA, pp. 54-63. ACM.
Steyvers, M. and Griffiths, T. (2007). Probabilistic Topic Models. Lawrence Erlbaum Associates.
18
Backup Slides
19
LSA: 3 claims (2 match with LDA)
 Semantic information can be derived from a word-document co-occurrence matrix
 Dimensionality reduction is an essential part of this derivation
 Words and documents can be represented as points in a Euclidean space => different from LDA, where the semantic properties of words and docs are expressed in terms of probabilistic topics
20
21
Parameter estimation and Gibbs
Sampling (3 Slides)
Inverting the generative model
 Maximum likelihood estimation (EM)
◦ e.g. Hofmann (1999)
 Deterministic approximate algorithms
◦ variational EM; Blei, Ng & Jordan (2001; 2003)
◦ expectation propagation; Minka & Lafferty (2002)
 Markov chain Monte Carlo
◦ full Gibbs sampler; Pritchard et al. (2000)
◦ collapsed Gibbs sampler; Griffiths & Steyvers (2004)
The collapsed Gibbs sampler
 Using conjugacy of Dirichlet and multinomial
distributions, integrate out continuous parameters
 Defines a distribution on discrete ensembles z
$$P(\mathbf{w}\mid\mathbf{z}) = \int_{\Delta_W^T} P(\mathbf{w}\mid\mathbf{z},\Phi)\,p(\Phi)\,d\Phi \qquad P(\mathbf{z}) = \int_{\Delta_T^D} P(\mathbf{z}\mid\Theta)\,p(\Theta)\,d\Theta$$

$$P(\mathbf{z}\mid\mathbf{w}) = \frac{P(\mathbf{w}\mid\mathbf{z})\,P(\mathbf{z})}{\sum_{\mathbf{z}'} P(\mathbf{w}\mid\mathbf{z}')\,P(\mathbf{z}')}$$

$$P(\mathbf{w}\mid\mathbf{z}) = \left(\frac{\Gamma(W\beta)}{\Gamma(\beta)^{W}}\right)^{\!T}\prod_{j=1}^{T}\frac{\prod_{w}\Gamma(n_{w}^{(j)}+\beta)}{\Gamma(n_{\cdot}^{(j)}+W\beta)} \qquad P(\mathbf{z}) = \left(\frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}}\right)^{\!D}\prod_{d=1}^{D}\frac{\prod_{j}\Gamma(n_{j}^{(d)}+\alpha)}{\Gamma(n_{\cdot}^{(d)}+T\alpha)}$$

where n_w^(j) is the number of times word w is assigned to topic j, n_j^(d) is the number of words in document d assigned to topic j, and a dot denotes summing over that index.
The collapsed Gibbs sampler
 Sample each zi conditioned on z-i
 This is nicer than your average Gibbs sampler:
◦ memory: counts can be cached in two sparse matrices
◦ optimization: no special functions, simple arithmetic
◦ the distributions on Φ and Θ are analytic given z and w, and can
later be found for each sample
$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+W\beta}\cdot\frac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}$$

where the counts n_{-i} exclude the current assignment z_i.
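A compact sketch of the collapsed Gibbs update above (my own code, hypothetical data); the document-side denominator is constant in j and cancels after normalization, so it is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def collapsed_gibbs(docs, W, T, alpha, beta, iters=200):
    """docs: list of lists of word ids. Returns topic assignments and count matrices."""
    D = len(docs)
    n_wt = np.zeros((W, T))            # word-topic counts
    n_dt = np.zeros((D, T))            # document-topic counts
    n_t = np.zeros(T)                  # total words per topic
    z = [np.array([rng.integers(T) for _ in doc]) for doc in docs]

    # initialize counts from the random assignment
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the current assignment (the "-i" counts)
                n_wt[w, t] -= 1; n_dt[d, t] -= 1; n_t[t] -= 1
                # P(z_i = j | z_-i, w): simple arithmetic, no special functions
                # (the per-document denominator is constant in j, so it cancels)
                p = (n_wt[w] + beta) / (n_t + W * beta) * (n_dt[d] + alpha)
                t = rng.choice(T, p=p / p.sum())
                z[d][i] = t
                n_wt[w, t] += 1; n_dt[d, t] += 1; n_t[t] += 1
    return z, n_wt, n_dt
```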
Gibbs Sampling from PTM paper
25
26
Extensions and Applications
Nu U
zi
wi
θ(u)
φ (j)
α
β
θ(u)|su=0 ∼ Delta(θ(u-1))
θ(u)|su=1 ∼ Dirichlet(α)
zi ∼ Discrete(θ (u) )
φ(j) ∼ Dirichlet(β)
wi ∼ Discrete(φ(zi) )
T
Extension: a model for meetings
su
θ(u-1)
…
(Purver, Kording, Griffiths, & Tenenbaum, 2006)
Sample of ICSI meeting corpus
(25 meetings)
 no it's o_k.
 it's it'll work.
 well i can do that.
 but then i have to end the presentation in the middle so i can go back to open up javabayes.
 o_k fine.
 here let's see if i can.
 alright.
 very nice.
 is that better.
 yeah.
 o_k.
 uh i'll also get rid of this click to add notes.
 o_k. perfect
 NEW TOPIC (not supplied to algorithm)
 so then the features we decided or we decided we were talked about.
 right.
 uh the the prosody the discourse verb choice.
 you know we had a list of things like to go and to visit and what not.
 the landmark-iness of uh.
 i knew you'd like that.
Topic segmentation applied to meetings
Inferred
Segmentation
Inferred Topics
Comparison with human judgments
Topics recovered are much more coherent than those found
using random segmentation, no segmentation, or an HMM
Learning the number of topics
 Can use standard Bayes factor methods to
evaluate models of different dimensionality
◦ e.g. importance sampling via MCMC
 Alternative: nonparametric Bayes
◦ fixed number of topics per document,
unbounded number of topics per corpus
(Blei, Griffiths, Jordan, & Tenenbaum, 2004)
◦ unbounded number of topics for both (the
hierarchical Dirichlet process)
(Teh, Jordan, Beal, & Blei, 2004)
The Author-Topic model
(Rosen-Zvi, Griffiths,Smyth, & Steyvers, 2004)
Nd D
zi
wi
θ (a)
φ (j)
α
β
θ (a) ∼ Dirichlet(α)
zi ∼ Discrete(θ (xi) )
φ(j) ∼ Dirichlet(β)
wi ∼ Discrete(φ(zi) )
T
xi
A
xi ∼ Uniform(A(d) )
each author has a
distribution over topics
the author of each word is
chosen uniformly at random
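A short sketch (mine, with hypothetical sizes) of the Author-Topic generative process just defined: each word picks one of the document's authors uniformly at random, then a topic from that author's topic distribution, then the word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)

T, W, A = 5, 500, 10          # topics, vocabulary size, authors (hypothetical)
alpha, beta = 0.1, 0.01

phi = rng.dirichlet(np.full(W, beta), size=T)     # phi(j) ~ Dirichlet(beta)
theta = rng.dirichlet(np.full(T, alpha), size=A)  # theta(a) ~ Dirichlet(alpha): per-author topics

def generate_document(author_ids, n_words=30):
    """Author-Topic generative process for one document with author set A(d)."""
    words = []
    for _ in range(n_words):
        x = rng.choice(author_ids)            # x_i ~ Uniform(A(d)): pick an author
        z = rng.choice(T, p=theta[x])         # z_i ~ Discrete(theta(x_i))
        w = rng.choice(W, p=phi[z])           # w_i ~ Discrete(phi(z_i))
        words.append(w)
    return words

doc = generate_document(author_ids=[2, 7])    # a paper co-authored by authors 2 and 7
```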
Four example topics from NIPS (columns, left to right: TOPIC 19, TOPIC 24, TOPIC 29, TOPIC 87)
WORD PROB. WORD PROB. WORD PROB. WORD PROB.
LIKELIHOOD 0.0539 RECOGNITION 0.0400 REINFORCEMENT 0.0411 KERNEL 0.0683
MIXTURE 0.0509 CHARACTER 0.0336 POLICY 0.0371 SUPPORT 0.0377
EM 0.0470 CHARACTERS 0.0250 ACTION 0.0332 VECTOR 0.0257
DENSITY 0.0398 TANGENT 0.0241 OPTIMAL 0.0208 KERNELS 0.0217
GAUSSIAN 0.0349 HANDWRITTEN 0.0169 ACTIONS 0.0208 SET 0.0205
ESTIMATION 0.0314 DIGITS 0.0159 FUNCTION 0.0178 SVM 0.0204
LOG 0.0263 IMAGE 0.0157 REWARD 0.0165 SPACE 0.0188
MAXIMUM 0.0254 DISTANCE 0.0153 SUTTON 0.0164 MACHINES 0.0168
PARAMETERS 0.0209 DIGIT 0.0149 AGENT 0.0136 REGRESSION 0.0155
ESTIMATE 0.0204 HAND 0.0126 DECISION 0.0118 MARGIN 0.0151
AUTHOR PROB. AUTHOR PROB. AUTHOR PROB. AUTHOR PROB.
Tresp_V 0.0333 Simard_P 0.0694 Singh_S 0.1412 Smola_A 0.1033
Singer_Y 0.0281 Martin_G 0.0394 Barto_A 0.0471 Scholkopf_B 0.0730
Jebara_T 0.0207 LeCun_Y 0.0359 Sutton_R 0.0430 Burges_C 0.0489
Ghahramani_Z 0.0196 Denker_J 0.0278 Dayan_P 0.0324 Vapnik_V 0.0431
Ueda_N 0.0170 Henderson_D 0.0256 Parr_R 0.0314 Chapelle_O 0.0210
Jordan_M 0.0150 Revow_M 0.0229 Dietterich_T 0.0231 Cristianini_N 0.0185
Roweis_S 0.0123 Platt_J 0.0226 Tsitsiklis_J 0.0194 Ratsch_G 0.0172
Schuster_M 0.0104 Keeler_J 0.0192 Randlov_J 0.0167 Laskov_P 0.0169
Xu_L 0.0098 Rashid_M 0.0182 Bradtke_S 0.0161 Tipping_M 0.0153
Saul_L 0.0094 Sackinger_E 0.0132 Schwartz_A 0.0142 Sollich_P 0.0141
TOPIC 19 TOPIC 24 TOPIC 29 TOPIC 87
Who wrote what? (the trailing 1 or 2 on each word shows which of the two authors below the model assigned it to)
A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets
us generalize distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly
related to the input1 spaceThis is done by identifying a class of kernels1 which can be
represented as norm1 based2 distances1 in Hilbert spaces It turns1 out that common kernel1
algorithms such as SVMs1 and kernel1 PCA1 are actually really distance1 based2 algorithms and
can be run2 with that class of kernels1 too As well as providing1 a useful new insight1 into how
these algorithms work the present2 work can form the basis1 for conceiving new algorithms
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes
proposals for characterizing and computing2 preferred2 diagnoses2 assuming that the system2
description2 is augmented with a system2 structure2 a directed2 graph2 explicating the
interconnections between system2 components2 Specifically we first introduce the notion of a
consequence2 which is a syntactically2 unconstrained propositional2 sentence2 that
characterizes all consistency2 based2 diagnoses2 and show2 that standard2 characterizations of
diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a consequence2
Second we propose a new syntactic2 variation on the consequence2 known as negation2 normal
form NNF and discuss its merits compared to standard variationsThird we introduce a basic
algorithm2 for computing consequences in NNF given a structured system2 description We
show that if the system2 structure2 does not contain cycles2 then there is always a linear size2
consequence2 in NNF which can be computed in linear time2 For arbitrary1 system2 structures2
we show a precise connection between the complexity2 of computing2 consequences and the
topology of the underlying system2 structure2 Finally we present2 an algorithm2 that
enumerates2 the preferred2 diagnoses2 characterized by a consequence2 The algorithm2 is
shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies
some general conditions
Written by
(1) Scholkopf_B
Written by
(2) Darwiche_A
Analysis of PNAS abstracts
 Test topic models with a real database
of scientific papers from PNAS
 All 28,154 abstracts from 1991-2001
 All words occurring in at least five
abstracts, not on “stop” list (20,551)
 Total of 3,026,970 tokens in corpus
(Griffiths & Steyvers, 2004)
A selection of topics (top words from six PNAS topics):
◦ FORCE, SURFACE, MOLECULES, SOLUTION, SURFACES, MICROSCOPY, WATER, FORCES, PARTICLES, STRENGTH, POLYMER, IONIC, ATOMIC, AQUEOUS, MOLECULAR, PROPERTIES, LIQUID, SOLUTIONS, BEADS, MECHANICAL
◦ HIV, VIRUS, INFECTED, IMMUNODEFICIENCY, CD4, INFECTION, HUMAN, VIRAL, TAT, GP120, REPLICATION, TYPE, ENVELOPE, AIDS, REV, BLOOD, CCR5, INDIVIDUALS, ENV, PERIPHERAL
◦ MUSCLE, CARDIAC, HEART, SKELETAL, MYOCYTES, VENTRICULAR, MUSCLES, SMOOTH, HYPERTROPHY, DYSTROPHIN, HEARTS, CONTRACTION, FIBERS, FUNCTION, TISSUE, RAT, MYOCARDIAL, ISOLATED, MYOD, FAILURE
◦ STRUCTURE, ANGSTROM, CRYSTAL, RESIDUES, STRUCTURES, STRUCTURAL, RESOLUTION, HELIX, THREE, HELICES, DETERMINED, RAY, CONFORMATION, HELICAL, HYDROPHOBIC, SIDE, DIMENSIONAL, INTERACTIONS, MOLECULE, SURFACE
◦ NEURONS, BRAIN, CORTEX, CORTICAL, OLFACTORY, NUCLEUS, NEURONAL, LAYER, RAT, NUCLEI, CEREBELLUM, CEREBELLAR, LATERAL, CEREBRAL, LAYERS, GRANULE, LABELED, HIPPOCAMPUS, AREAS, THALAMIC
◦ TUMOR, CANCER, TUMORS, HUMAN, CELLS, BREAST, MELANOMA, GROWTH, CARCINOMA, PROSTATE, NORMAL, CELL, METASTATIC, MALIGNANT, LUNG, CANCERS, MICE, NUDE, PRIMARY, OVARIAN
Cold topics / Hot topics (top words for the topic numbers shown in the figure):
◦ Topic 2: SPECIES, GLOBAL, CLIMATE, CO2, WATER, ENVIRONMENTAL, YEARS, MARINE, CARBON, DIVERSITY, OCEAN, EXTINCTION, TERRESTRIAL, COMMUNITY, ABUNDANCE
◦ Topic 134: MICE, DEFICIENT, NORMAL, GENE, NULL, MOUSE, TYPE, HOMOZYGOUS, ROLE, KNOCKOUT, DEVELOPMENT, GENERATED, LACKING, ANIMALS, REDUCED
◦ Topic 179: APOPTOSIS, DEATH, CELL, INDUCED, BCL, CELLS, APOPTOTIC, CASPASE, FAS, SURVIVAL, PROGRAMMED, MEDIATED, INDUCTION, CERAMIDE, EXPRESSION
◦ Topic 37: CDNA, AMINO, SEQUENCE, ACID, PROTEIN, ISOLATED, ENCODING, CLONED, ACIDS, IDENTITY, CLONE, EXPRESSED, ENCODES, RAT, HOMOLOGY
◦ Topic 289: KDA, PROTEIN, PURIFIED, MOLECULAR, MASS, CHROMATOGRAPHY, POLYPEPTIDE, GEL, SDS, BAND, APPARENT, LABELED, IDENTIFIED, FRACTION, DETECTED
◦ Topic 75: ANTIBODY, ANTIBODIES, MONOCLONAL, ANTIGEN, IGG, MAB, SPECIFIC, EPITOPE, HUMAN, MABS, RECOGNIZED, SERA, EPITOPES, DIRECTED, NEUTRALIZING
38
The effect of Alpha and beta as
hyperparameters
Effects of hyperparameters
 α and β control the relative sparsity of Φ and Θ
◦ smaller α, fewer topics per document
◦ smaller β, fewer words per topic
 Good assignments z strike a compromise between these two kinds of sparsity
[Inset plot: log Γ(x) as a function of x]
(The slide also repeats the closed-form expressions for P(w | z) and P(z) given on the collapsed Gibbs sampler slide.)
Varying α
decreasing α increases sparsity
Varying β
decreasing β increases sparsity ?
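A tiny numeric illustration (my own, with hypothetical settings) of the sparsity effect in the two plots above: as a symmetric α shrinks, samples θ ∼ Dirichlet(α) concentrate their mass on fewer topics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demo: draw per-document topic distributions theta ~ Dirichlet(alpha)
# for a symmetric alpha and count how many topics carry non-negligible mass.
T = 50
for alpha in (10.0, 1.0, 0.1, 0.01):
    theta = rng.dirichlet(np.full(T, alpha), size=1000)
    effective_topics = (theta > 0.01).sum(axis=1).mean()
    print(f"alpha={alpha:>5}: on average {effective_topics:.1f} of {T} topics above 1% mass")
# Smaller alpha -> sparser theta -> fewer topics per document.
# The same holds for beta and the per-topic word distributions phi.
```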
Multi-Multinomial LDA (MM-LDA)
42
Ramage 2009 results
43