Concurrent Inference of Topic Models
and Distributed Vector Representations
Debakar Shamanta¹, Sheikh Motahar Naim¹, Parang Saraf², Naren Ramakrishnan², and M. Shahriar Hossain²
¹ Dept of CS, University of Texas at El Paso, El Paso, TX 79968
² Dept of CS, Virginia Tech, Arlington, VA 22203
Presented By:
Parang Saraf
Background - I
•  A document collection comprises different kinds of
elements
–  Some elements are given, e.g. words, documents,
labels
–  Some are hidden (latent), e.g. topics
•  These elements can be represented with local
or distributed features (neural networks)
Background - II
•  Local vs. Distributed Representations
– Local Representations
•  Each neuron represents exactly one entity
•  Ex: PKDD in Porto
•  Representations (concatenation of vocabulary and
color vectors):
–  Slots: [ PKDD, in, Porto, Red, Blue, Green ], each 0 or 1
–  PKDD : [ 1 0 0 1 0 0 ]
–  in   : [ 0 1 0 0 1 0 ]
–  Porto: [ 0 0 1 0 0 1 ]
Background - III
•  Local vs. Distributed Representations
– Distributed Representations
•  Each neuron can encode more than one piece of information
•  Ex: PKDD in Porto
•  Representations (concatenation of 2-bit vocabulary
and 2-bit color vectors):
–  Slots: [ 0/1, 0/1, 0/1, 0/1 ]
–  PKDD : [ 0 1 0 1 ]
–  in   : [ 1 0 1 0 ]
–  Porto: [ 1 1 1 1 ]
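The two encodings from the Background slides can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the pairing of words with colors (PKDD/Red, in/Blue, Porto/Green) and the choice of binary codes starting at 01 are read off the example vectors rather than stated in the slides.

```python
VOCAB = ["PKDD", "in", "Porto"]
COLORS = ["Red", "Blue", "Green"]

def one_hot(index, size):
    """Local code: one slot per entity, exactly one slot set to 1."""
    return [1 if i == index else 0 for i in range(size)]

def binary_code(index, bits):
    """Distributed code: write index + 1 in binary, so every bit is shared by several entities."""
    value = index + 1  # assumption: code 00 is left unused, as in the slide's example
    return [(value >> shift) & 1 for shift in reversed(range(bits))]

def local_repr(word, color):
    # Concatenate a 3-slot vocabulary vector and a 3-slot color vector.
    return one_hot(VOCAB.index(word), len(VOCAB)) + one_hot(COLORS.index(color), len(COLORS))

def distributed_repr(word, color, bits=2):
    # Concatenate a 2-bit vocabulary code and a 2-bit color code.
    return binary_code(VOCAB.index(word), bits) + binary_code(COLORS.index(color), bits)

for word, color in zip(VOCAB, COLORS):
    print(word, local_repr(word, color), distributed_repr(word, color))
```

The local code needs six slots for three words and three colors, while the distributed code covers the same entities with four, because each slot takes part in representing several entities.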
Background - IV
•  Distributed representations generalize better
–  Each feature captures facts from the entire dataset
Ref: Hinton, Geoffrey E. "Distributed representations." (1984).
Problem Statement - I
•  So far, the literature provides distributed
representations only for given (labeled)
elements.
– But what about inferred entities like topics?
•  Distributed representations for topics are
difficult to find since the topics are not
readily available
•  We present a mechanism to generate
distributed representations of both given
and latent elements
Problem Statement - II
•  But why do we need distributed
representations for both given and inferred
elements?
– So that we can represent them in the same
space
– This allows comparison and other types of
analysis across element types
Word2Vec / Doc2Vec
•  Jump on the bandwagon
Word2Vec / Doc2Vec
•  Tomas Mikolov et al. at Google released a
“shallow” neural-network model
that generates ‘better’ distributed word
representations by trading model
complexity for efficiency
– Requires learning from a larger dataset
– Pre-trained on about 100 billion words from the Google
News dataset
– Gensim provides a Python implementation
– You can train it on your own data
Word2Vec / Doc2Vec Insights
•  Works only with given entities, not with inferred ones
Ref: Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
Proposed Solution
•  We can do all of this, PLUS
•  Generate similarly meaningful
representations for inferred entities
– For example, ‘topics’
– In the same space as words, documents,
labels, etc.
Proposed Solution
•  In this paper we propose a framework that
1.  Determines the topics of each document using a neural
network
2.  Simultaneously computes distributed representations
of topics in the same space as documents and words
3.  Generates the distributed vectors using a smaller
number of dimensions than the actual text feature
space.
Proposed Framework
[Figure: proposed framework, with two modules. Topic Generation Module: forward propagation through weights W1 and W2 to topic outputs 1..k. Distributed Vector Generation: doc vectors, topic vectors, and word vectors, updated using backpropagation (BP).]
Evaluation Strategies
•  Q1: Can our framework establish relationships
between distributed representations of topics and
documents?
•  Q2: Are the generated topic vectors expressive
enough to capture similarity between topics and to
distinguish differences between them?
•  Q3: How do our topic modeling results compare
with the results produced by other topic modeling
algorithms?
•  Q4: Do the generated topics bring documents with
similar domain-specific themes together?
•  Q5: How does the runtime of the proposed
framework scale?
Datasets
Evaluation Question 1
•  Question: Can our framework establish
relationships between distributed
representations of topics and documents?
•  Topic–document relationships should be
more similar for two documents of the same
topic than for documents from
different topics
Evaluation Question 1
•  Given a topic vector T_i of topic t_i and a set of document
vectors D_{t_j} assigned to topic t_j, we compute
alignment using the following formula (shown as an image on the original slide):
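The alignment formula itself is not reproduced in this transcript, so the sketch below is a stand-in rather than the authors' definition. It assumes, purely for illustration, that alignment is the mean cosine similarity between a topic vector and a set of document vectors; the function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def alignment(topic_vector, doc_vectors):
    """ASSUMED stand-in measure: mean cosine similarity of one topic vector to a document set."""
    if not doc_vectors:
        raise ValueError("need at least one document vector")
    return sum(cosine(topic_vector, d) for d in doc_vectors) / len(doc_vectors)

# Toy data: T_i is a topic vector, D_ti holds documents assigned to that topic,
# D_tj holds documents assigned to a different topic.
T_i = [1.0, 0.0]
D_ti = [[0.9, 0.1], [0.8, 0.2]]
D_tj = [[0.1, 0.9], [0.0, 1.0]]

print("own topic:  ", alignment(T_i, D_ti))
print("other topic:", alignment(T_i, D_tj))
```

Under this stand-in, a useful topic vector should score higher against documents of its own topic than against documents of other topics, which is the behaviour Question 1 asks about.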
Evaluation Question 2
•  Are the generated topic vectors expressive
enough to capture similarity between
topics and to distinguish differences
between them?
– Take the generated topic vectors and run
hierarchical clustering on them.
•  Similar topics should appear close together
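The check described above can be sketched with a small agglomerative clustering routine. The slides do not say which linkage or distance was used, so average linkage with Euclidean distance is an assumption here, and the topic names and vectors are invented toy data.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(vectors):
    """Merge the two closest clusters until one remains; return merges in order."""
    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        merges.append((a, b, average_linkage(a, b, vectors)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical topic vectors: two sports-like topics and two biology-like topics.
topic_vectors = {
    "sports": [1.0, 0.1],
    "football": [0.9, 0.2],
    "genetics": [0.1, 1.0],
    "proteins": [0.3, 0.9],
}

for left, right, distance in agglomerate(topic_vectors):
    print(left, "+", right, "at distance", round(distance, 3))
```

If the topic vectors are expressive, related topics merge early at small distances and unrelated groups merge only near the top of the hierarchy.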
Evaluation Question 3
•  How do our topic modeling results
compare with the results produced by
other topic modeling algorithms?
– Compare with LDA, and
– NTM: closely resembles our work, but works
with pre-computed word vectors
Ref: Cao, Ziqiang, et al. "A Novel Neural Topic Model and Its Supervised Extension." Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
Evaluation Question 3
•  Metrics for evaluating
clustering results when ground truth labels
are available:
– Adjusted Rand Index (ARI)
•  Measures the pairwise agreement between two topic
assignments, corrected for chance
•  Higher values are better
– Normalized Mutual Information (NMI)
•  Measures the information shared between two topic
assignments, normalized to [0, 1]
•  Higher values are better
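Both metrics can be computed directly from the contingency counts of two label assignments. The sketch below is a plain-Python illustration, not the evaluation code used for the slides; for NMI it assumes normalization by the arithmetic mean of the two entropies, which is one of several common conventions.

```python
import math
from collections import Counter

def comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

def adjusted_rand_index(labels_a, labels_b):
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    sum_cells = sum(comb2(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb2(c) for c in Counter(labels_a).values())
    sum_b = sum(comb2(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb2(n)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(labels_a, labels_b):
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mi = 0.0
    for (a, b), c in Counter(zip(labels_a, labels_b)).items():
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    # Assumption: normalize by the arithmetic mean of the two entropies.
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    return mi / denom if denom > 0 else 1.0

truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]), normalized_mutual_information(truth, [1, 1, 0, 0]))
print(adjusted_rand_index(truth, [0, 1, 0, 1]), normalized_mutual_information(truth, [0, 1, 0, 1]))
```

Both scores ignore the actual label names, so a topic assignment that matches the ground truth up to renaming still gets the top score.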
Evaluation Question 3
•  Metrics for evaluating
clustering results when ground truth labels
are not available:
– Dunn Index (DI)
•  Measures the separation between groups of
vectors relative to how spread out the groups are
•  Higher values are better
– Average Silhouette Coefficient (ASC)
•  Measures both cohesion and separation of groups
•  Higher values are better
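Both label-free metrics operate on the vectors and their assigned groups. The following is a minimal sketch under stated assumptions: Euclidean distance, the Dunn Index taken as the smallest between-group distance divided by the largest within-group diameter, at least two groups, and at least one group with two distinct members. The toy vectors are invented.

```python
import math
from collections import defaultdict

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def group_by_label(vectors, labels):
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return groups

def dunn_index(vectors, labels):
    """Smallest between-group distance divided by the largest within-group diameter."""
    groups = list(group_by_label(vectors, labels).values())
    min_separation = min(
        euclidean(u, v)
        for i, g in enumerate(groups) for h in groups[i + 1:]
        for u in g for v in h
    )
    max_diameter = max(euclidean(u, v) for g in groups for u in g for v in g)
    return min_separation / max_diameter

def average_silhouette(vectors, labels):
    """Mean of (b - a) / max(a, b) over all vectors."""
    groups = group_by_label(vectors, labels)
    scores = []
    for vec, label in zip(vectors, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton groups
            continue
        # a: mean distance to the other members of the same group (cohesion).
        a = sum(euclidean(vec, other) for other in own) / (len(own) - 1)
        # b: mean distance to the nearest other group (separation).
        b = min(
            sum(euclidean(vec, other) for other in members) / len(members)
            for other_label, members in groups.items() if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

doc_vectors = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
print(dunn_index(doc_vectors, [0, 0, 1, 1]), average_silhouette(doc_vectors, [0, 0, 1, 1]))
print(dunn_index(doc_vectors, [0, 1, 0, 1]), average_silhouette(doc_vectors, [0, 1, 0, 1]))
```

The first assignment groups nearby vectors and scores well on both metrics; the second mixes distant vectors and scores poorly.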
Evaluation Question 3
NTM is missing: both the Dunn Index and the Average Silhouette Coefficient require
document vectors, but NTM does not use document vectors; it uses
only pre-computed word vectors
Evaluation Question 4
•  Do the generated topics bring documents
with similar domain-specific themes
together?
•  Use the PubMed dataset, which comes with
MeSH terms
– Two documents on the same
topic are expected to have more MeSH terms in common
than documents on different topics
Evaluation Question 4
Pick the top n MeSH terms for two documents:
1.  Same-topic documents: the number of common MeSH terms increases with larger n
2.  Different-topic documents: overlapping terms are absent more often at
smaller n
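The top-n comparison can be sketched as a simple set intersection. The slides do not say how MeSH terms are ranked within a document, so the sketch assumes each document already comes with a ranked term list; the function name and the three example documents are made up for illustration.

```python
def common_top_terms(terms_a, terms_b, n):
    """Count the MeSH terms shared by the top-n lists of two documents."""
    return len(set(terms_a[:n]) & set(terms_b[:n]))

# Hypothetical ranked MeSH term lists (illustrative only, not taken from the dataset).
doc_a = ["Neoplasms", "Mutation", "Humans", "Genes, p53"]       # topic: cancer genetics
doc_b = ["Mutation", "Neoplasms", "Genes, p53", "Humans"]       # same topic as doc_a
doc_c = ["Diabetes Mellitus", "Insulin", "Humans", "Blood Glucose"]  # different topic

same_topic = [common_top_terms(doc_a, doc_b, n) for n in range(1, 5)]
diff_topic = [common_top_terms(doc_a, doc_c, n) for n in range(1, 5)]
print("same topic:     ", same_topic)
print("different topic:", diff_topic)
```

In this toy data the same-topic pair accumulates shared terms as n grows, while the different-topic pair shares nothing at small n and only a generic term later.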
Evaluation Question 5
•  How does the runtime of the proposed
framework scale with:
– the size of the distributed representations
– the number of documents
– the number of topics
Evaluation Question 5
Linear Increase
In Summary
•  The framework generates distributed
representations for both given and
inferred entities
•  Generating representations in the same
space for both given and hidden
entities is crucial:
– It opens the door to many different types of
analysis
Take It for a Spin!
•  Data and source code are
available here:
http://dal.cs.utep.edu/projects/tvec/
Thank You
Presented By:
Parang Saraf
@parangsaraf
See you at the Poster Session in the Evening

 
Safeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity ModelingSafeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity Modeling
 
Safeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist NetworksSafeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist Networks
 
Forex-Foreteller: Currency Trend Modeling using News Articles
Forex-Foreteller: Currency Trend Modeling using News ArticlesForex-Foreteller: Currency Trend Modeling using News Articles
Forex-Foreteller: Currency Trend Modeling using News Articles
 
Epidemiological Modeling of News and Rumors on Twitter
Epidemiological Modeling of News and Rumors on TwitterEpidemiological Modeling of News and Rumors on Twitter
Epidemiological Modeling of News and Rumors on Twitter
 

Slides: Concurrent Inference of Topic Models and Distributed Vector Representations

  • 1. Concurrent Inference of Topic Models and Distributed Vector Representations. Debakar Shamanta¹, Sheikh Motahar Naim¹, Parang Saraf², Naren Ramakrishnan², and M. Shahriar Hossain². ¹Dept. of CS, University of Texas at El Paso, El Paso, TX 79968; ²Dept. of CS, Virginia Tech, Arlington, VA 22203. Presented by: Parang Saraf
  • 2. Background - I • A document collection comprises different elements – Some elements are given, such as words, documents, and labels – Some are hidden (latent), e.g. topics • These elements can be represented with local or distributed features (neural networks)
  • 3. Background - II • Local vs. Distributed Representations – Local Representations • Each neuron represents exactly one entity • Ex: PKDD in Porto • Representations (concatenation of vocabulary and color vectors): – Layout: [ 0/1, 0/1, 0/1, 0/1, 0/1, 0/1 ], with positions for PKDD, in, Porto, Red, Blue, Green – PKDD: [ 1 0 0 1 0 0 ] – in: [ 0 1 0 0 1 0 ] – Porto: [ 0 0 1 0 0 1 ]
  • 4. Background - III • Local vs. Distributed Representations – Distributed Representations • Each neuron represents one or more pieces of information • Ex: PKDD in Porto • Representations (concatenation of 2-bit vocabulary and color vectors): – Layout: [ 0/1, 0/1, 0/1, 0/1 ] – PKDD: [ 0 1 0 1 ] – in: [ 1 0 1 0 ] – Porto: [ 1 1 1 1 ]
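
The toy example on slides 3 and 4 can be reproduced in a few lines. This is a minimal sketch, not code from the paper: the helper names (`one_hot`, `binary_code`, `local_vector`, `distributed_vector`) are hypothetical, and the binary coding that starts at 1 is an assumption chosen because it reproduces the vectors shown on the slides.

```python
# Vocabulary and colour pairing taken from the slides' toy example.
WORDS = ["PKDD", "in", "Porto"]
COLORS = ["Red", "Blue", "Green"]


def one_hot(index, size):
    """Local code: exactly one unit is active per entity."""
    return [1 if i == index else 0 for i in range(size)]


def binary_code(index, bits):
    """Distributed code: each entity is spread over several shared units.

    Assumption: codes start at 1 so that no entity maps to all zeros.
    """
    value = index + 1
    return [(value >> shift) & 1 for shift in reversed(range(bits))]


def local_vector(i):
    # Concatenate a one-hot word vector and a one-hot colour vector (6 units).
    return one_hot(i, len(WORDS)) + one_hot(i, len(COLORS))


def distributed_vector(i, bits=2):
    # Concatenate a 2-bit word code and a 2-bit colour code (4 units).
    return binary_code(i, bits) + binary_code(i, bits)


for i, word in enumerate(WORDS):
    print(word, local_vector(i), distributed_vector(i))
```

The local code needs six units for three words and three colours, while the distributed code needs four, because every unit takes part in representing more than one entity.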
  • 5. Background - IV • Distributed representations have better generalization capabilities – Each feature captures facts from the entire dataset. Ref: Hinton, Geoffrey E. "Distributed representations." (1984).
  • 6. Problem Statement - I • So far, the literature has achieved distributed representations for labeled elements. – But what about inferred entities like topics? • Distributed representations for topics are difficult to find because topics are not readily available • We present a mechanism to generate distributed representations of both given and latent elements
  • 7. Problem Statement - II • But why do we need distributed representations for both given and inferred entities? – So that we can represent them in the same space – This allows for comparison and other types of analysis
  • 8. Word2Vec / Doc2Vec • Jump on the bandwagon
  • 9. Word2Vec / Doc2Vec • Tomas Mikolov et al. at Google released a “shallow” neural-network-based model that generates ‘better’ distributed word representations by trading model complexity for efficiency – Requires learning from a larger dataset – Trained on 100 billion words from the Google News dataset – Gensim has a Python version of the code – You can train on your own data
  • 10. Word2Vec / Doc2Vec Insights. Ref: Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
  • 11. Word2Vec / Doc2Vec Insights. Ref: Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013). Works only with given entities and not with inferred ones
  • 12. Proposed Solution • We can do all of this PLUS • Generate similarly meaningful representations for inferred entities – For example, ‘topics’ – In the same space as words, documents, labels, etc.
  • 13. Proposed Solution • In this paper we propose a framework that 1. Determines the topics of each document using a neural network 2. Simultaneously computes distributed representations of topics in the same space as documents and words 3. Generates the distributed vectors using a smaller number of dimensions than the actual text feature space
  • 15. Evaluation Strategies • Q1: Can our framework establish relationships between distributed representations of topics and documents? • Q2: Are the generated topic vectors expressive enough to capture similarity between topics and to distinguish differences between them? • Q3: How do our topic modeling results compare with the results produced by other topic modeling algorithms? • Q4: Do the generated topics bring documents with similar domain-specific themes together? • Q5: How does the runtime of the proposed framework scale?
  • 17. Evaluation Question 1 • Question: Can our framework establish relationships between distributed representations of topics and documents? • Topic–document relationships should be more similar for two documents of the same topic than for documents from different topics
  • 18. Evaluation Question 1 • Given a topic vector T_i of topic t_i, and a set of document vectors D_{t_j} that are assigned a topic t_j, we compute alignment using the following formula: [formula not captured in the transcript]
  • 20. Evaluation Question 2 • Are the generated topic vectors expressive enough to capture similarity between topics and to distinguish differences between them? – Take the generated topic vectors and run hierarchical clustering on them. • Similar topics should appear close together
  • 22. Evaluation Question 3 • How do our topic modeling results compare with the results produced by other topic modeling algorithms? – Compare with LDA, and – NTM: closely resembles our work; works with pre-computed vectors. Ref: Cao, Ziqiang, et al. "A Novel Neural Topic Model and Its Supervised Extension." Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
  • 23. Evaluation Question 3 • Metrics used to evaluate clustering results when ground truth labels are available: – Adjusted Rand Index (ARI) • Estimates the agreement between two topic assignments • Higher values are better – Normalized Mutual Information (NMI) • Estimates the agreement between two topic assignments • Higher values are better
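
Both metrics on slide 23 can be computed directly from two label lists. The following is a standard-library sketch for illustration, not the evaluation code used in the paper. The function names are hypothetical, and normalising NMI by the arithmetic mean of the two entropies is an assumption (several normalisations are in use).

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting agreement between two assignments, corrected for chance."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_joint - expected) / (maximum - expected)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the mean entropy (assumed normalisation)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mutual = 0.0
    for (a, b), c in Counter(zip(labels_a, labels_b)).items():
        mutual += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    return mutual / mean_entropy if mean_entropy > 0 else 1.0


truth = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 2]  # same grouping, different topic ids
print(adjusted_rand_index(truth, predicted))
print(normalized_mutual_information(truth, predicted))
```

Both scores depend only on how documents are grouped, not on the topic ids themselves, which is why they suit comparing topic assignments from different algorithms.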
  • 25. Evaluation Question 3 • Metrics used to evaluate clustering results when ground truth labels are not available: – Dunn Index (DI) • Measures the separation between groups of vectors • Higher values are better – Average Silhouette Coefficient (ASC) • Measures both cohesion and separation of groups • Higher values are better
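
The two label-free metrics on slide 25 can be sketched as follows. This is an illustrative standard-library version, not the authors' implementation. It assumes Euclidean distance, at least two groups, and at least one group containing two distinct vectors; the function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def group_by_label(vectors, labels):
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        groups[label].append(vector)
    return groups


def dunn_index(vectors, labels):
    """Smallest between-group distance divided by the largest group diameter."""
    groups = list(group_by_label(vectors, labels).values())
    separation = min(
        dist(p, q)
        for i, g in enumerate(groups)
        for h in groups[i + 1:]
        for p in g
        for q in h
    )
    diameter = max(dist(p, q) for g in groups for p in g for q in g)
    return separation / diameter


def average_silhouette(vectors, labels):
    """Mean of (b - a) / max(a, b) over all vectors."""
    groups = group_by_label(vectors, labels)
    scores = []
    for vector, label in zip(vectors, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton groups
            continue
        # a: mean distance to the other members of the same group
        a = sum(dist(vector, p) for p in own) / (len(own) - 1)
        # b: mean distance to the nearest other group
        b = min(
            sum(dist(vector, p) for p in g) / len(g)
            for key, g in groups.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D document vectors forming two well-separated groups.
docs = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(dunn_index(docs, [0, 0, 1, 1]), average_silhouette(docs, [0, 0, 1, 1]))
```

Both functions take document vectors as input, which is consistent with the slides' note that these metrics cannot be reported for a method that produces no document vectors.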
  • 26. Evaluation Question 3. NTM is missing: both the Dunn Index and the Average Silhouette Coefficient require document vectors, but NTM does not use any document vectors; it uses only pre-computed word vectors
  • 27. Evaluation Question 4 • Do the generated topics bring documents with similar domain-specific themes together? • Use the PubMed dataset, which comes with MeSH terms – Two documents on the same topic are expected to share more MeSH terms than documents on different topics
  • 28. Evaluation Question 4. Pick the top n MeSH terms for two documents: 1. Documents on similar topics: the number of common MeSH terms increases with larger n 2. Documents on different topics: overlapping terms are more often absent with smaller n
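
The top-n comparison on slide 28 amounts to a set intersection over truncated term lists. The sketch below assumes each document's MeSH terms are already ranked (the transcript does not say how the top n terms are chosen). The function name and the three term lists are hypothetical illustrations, not data from the paper.

```python
def common_terms(terms_a, terms_b, n):
    """Count terms shared by the top-n entries of two ranked MeSH term lists."""
    return len(set(terms_a[:n]) & set(terms_b[:n]))


# Hypothetical ranked term lists; doc_a and doc_b stand in for two documents
# on the same topic, doc_c for a document on a different topic.
doc_a = ["Humans", "Neoplasms", "Mutation", "Genomics"]
doc_b = ["Neoplasms", "Humans", "Genomics", "Prognosis"]
doc_c = ["Diet", "Exercise", "Obesity", "Humans"]

for n in range(1, 5):
    print(n, common_terms(doc_a, doc_b, n), common_terms(doc_a, doc_c, n))
```

Plotting the two counts against n for many document pairs gives the kind of same-topic versus different-topic comparison the slide describes.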
  • 29. Evaluation Question 5 • How does the runtime of the proposed framework scale with – the size of the distributed representations – an increasing number of documents – an increasing number of topics
  • 31. In Summary • The framework generates distributed representations for both given and inferred entities • Generating representations in the same hyperspace for both given and hidden entities is crucial: – It opens the door to different types of analysis
  • 32. Take it for a Spin! • Data and source code are available here: http://dal.cs.utep.edu/projects/tvec/
  • 33. Thank You Presented By: Parang Saraf @parangsaraf See you at the Poster Session in the Evening