SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
LDAvis:
A method for visualizing and interpreting topics
Carson Sievert Iowa State University
Kenneth E. Shirley AT&T Labs Research
Illvi2014
• LDAvis, a web-based interactive visualization of topics
estimated using Latent Dirichlet Allocation that is built using
a combination of R and D3.
• We introduce LDAvis that attempts to answer a few basic
questions about a fitted topic model:
I . I N T R O D U C T I O N
A. What is the meaning of each topic?,
B. How prevalent is each topic?,
C. How do the topics relate to each other?
I . I N T R O D U C T I O N
I I . R E L A T E D W O R K
LDA Turbo-
Topics
liftPMI
I I . R E L A T E D W O R K
LDA
Latent Dirichlet allocation (LDA)
is an example of a topic model and
was first presented as a graphical
model for topic discovery by
David Blei, Andrew Ng, and
Michael I. Jordan in 2003.
I I . R E L A T E D W O R K
Turbo-
Topics
Blei and Lafferty (2009) developed
“Turbo Topics”, a method of
identifying n-grams within LDA
inferred topics。
the resulting output is still simply a
ranked list containing a mixture of
terms and n-grams。
I I . R E L A T E D W O R K
Turbo-
Topics
I I . R E L A T E D W O R K
Pointwise Mutual Information(PMI) :
PMI
Independence : P(x , y) = P(x) × P(y)
PMI = log1 = 0
I I . R E L A T E D W O R K
lift
Lift:
Φ kw denote the probability of term
w ∈ {1, ..., V } for topic k ∈ {1, ..., K},
where V denotes the number of
terms in the vocabulary, and let pw
denote the marginal probability of
term w in the corpus.
I I . T O P I C M O D E L V I S U A L I Z A T I O N
• A number of visualization systems for topic models have been
developed in recent years.
• But, the visualization elements are limited to barcharts or
word clouds of term probabilities for each topic, pie charts of
topic probabilities for each document, and/or various
barcharts or scatterplots related to document metadata.
I I . T O P I C M O D E L V I S U A L I Z A T I O N
• Chuang et al. (2012b) develop such a tool, called “Termite”,
which visualizes the set of topic term distributions estimated
in LDA using a matrix layout.
• http://vis.stanford.edu/papers/termite
I I . T O P I C M O D E L V I S U A L I Z A T I O N
• 在代表主題內容的關鍵字選擇上,Termite定義了saliency 變數: saliency(w)
= P(w)× distinctiveness(w)。
• w 代表一個關鍵字; P(w)代表w的頻率;distinctiveness(w)代表w在主題間
的差異性,包含w的主題越多,distinctiveness(w)值越低。
• 在關鍵詞排列上,Termite 把經常連續出現的詞,如social和networks 排在
一起以便增強語義讓使用者較好理解。
I I I . R E L E V A N C E O F T E R M S T O T O P I C S
• Here we define relevance, our method for ranking terms
within topics, and we describe the results of a user study to
learn an optimal tuning parameter in the computation of
relevance.
• We define the relevance of term w to topic k given a weight
parameter λ (where 0 ≦ λ ≦ 1).
I I I . R E L E V A N C E O F T E R M S T O T O P I C S
• “λ” determines the weight given to the probability of term w
under topic k relative to its lift.
• Setting “λ” = 1 results in the familiar ranking of terms in
decreasing order of their topic-specific probability, and
setting “λ”= 0 ranks terms solely by their lift.
I I I . R E L E V A N C E O F T E R M S T O T O P I C S
• We fit a 50-topic model to the 20 Newsgroups data and each
term in the vocabulary (which has size V = 22, 524) for a
given topic.
• Figure shows this plot for Topic 29, which documents posted
to the “Motorcycles” Newsgroup, but also from documents
posted to the “Automobiles” Newsgroup and the “Electronics”
Newsgroup.
I I I . U S E R S T U D Y

13, 695 documents
20 Newsgroups
I I I . U S E R S T U D Y
• Some of the LDA-inferred topics occurred almost exclusively
(> 90% of occurrences) from a single Newsgroup, such as
Topic 38. which was came from the documents posted to the
“Medicine” (or “sci.med”) Newsgroup.
• Other topics occurred in a wide variety of Newsgroups. One
would expect these “spread-out” topics to be harder to
interpret than the “pure” topics like Topic 38.
I I I . U S E R S T U D Y
• we recruited 29 subjects among our colleagues (research
scientists at AT&T Labs with moderate familiarity with text
mining techniques and topic models).
• each subject completed an online experiment consisting of 50
tasks k (for k ∈ {1, ..., 50}).
• Task k was to read a list of five terms, ranked from 1-5 in
order of relevance to topic k, where “λ” ∈ (0, 1) was randomly
sampled to compute relevance.
I I I . U S E R S T U D Y
• we expected the proportion of correct responses to be roughly
1/3 no matter the value of λ used to compute relevance.
• In fact, seven of the topics were correctly identified by all 29
users, and one topic was incorrectly by all users.
• we estimated a topic specific intercept term to control the
difficulty of the topic (not just due to its tokens variety, but
also to account for the inherent familiarity of each topic to
our subject.)
I I I . U S E R S T U D Y
• The estimated effects of λ and λ² was statistically significant
(χ² p-value = 0.018).
• there was roughly a 67% baseline probability of correct
identification. As Figure 3 shows, for these topics, the
“optimal” value of λ was about 0.6.
• λ:0 ≒ 53 % and λ:1 ≒ 63 %
• We view this as evidence that where λ < 1 can really improve
topic interpretability.
I I I . U S E R S T U D Y
I V . T H E L D A V I S S Y S T E M
Installation: pip install pyldavis
I V . T H E L D A V I S S Y S T E M
http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/
master/notebooks/pyLDAvis_overview.ipynb
I V . T H E L D A V I S S Y S T E M
http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/ma
ster/LDA-practice-1.html#topic=0&lambda=1&term=
http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/ma
ster/python-lldavis_test1-24-2.html
http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/mast
er/notebooks/Gensim%20Newsgroup.ipynb
.用 gensim 處理文本:
.自由中國 ldavis 圖表:
.自由中國 l-ldavis 圖表:
I V . T H E L D A V I S S Y S T E M
• the areas of the circles are proportional to the relative
prevalences of the topics in the corpus
• The default for computing inter-topic distances is Jensen-
Shannon divergence.
• The default for scaling the set of inter-topic distances defaults
to Principal Components.
I V . T H E L D A V I S S Y S T E M
I V . T H E L D A V I S S Y S T E M
I V . T H E L D A V I S S Y S T E M
V . D I S C U S S I O N
A. What is the meaning of each topic?,
B. How prevalent is each topic?,
C. How do the topics relate to each other?
V . D I S C U S S I O N
• For future work, we anticipate performing a larger user study to
further understand how to facilitate topic interpretation in fitted
LDA models
• In addition to relevance. The need to visualize correlations
between topics can provide insight into what is happening on the
document level without actually displaying entire documents.
• Last, we seek a solution to the problem of visualizing a large
number of topics (say, from 100 - 500 topics) in a compact way.

Weitere ähnliche Inhalte

Was ist angesagt?

Computing with Directed Labeled Graphs
Computing with Directed Labeled GraphsComputing with Directed Labeled Graphs
Computing with Directed Labeled GraphsMarko Rodriguez
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationMarco Righini
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationQuentin Pleplé
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and ApplicationsLiwei Ren任力偉
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning Jeff Z. Pan
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalNik Spirin
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksLeonardo Di Donato
 
Transfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine LearningTransfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine LearningSebastian Ruder
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentationSoojung Hong
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introductionYueshen Xu
 
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Lukas Galke
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphJoint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphFedorNikolaev
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Johann Petrak
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...Jeff Z. Pan
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Isabelle Augenstein
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesFrancesco Osborne
 

Was ist angesagt? (20)

Computing with Directed Labeled Graphs
Computing with Directed Labeled GraphsComputing with Directed Labeled Graphs
Computing with Directed Labeled Graphs
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet Allocation
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and Applications
 
Author Topic Model
Author Topic ModelAuthor Topic Model
Author Topic Model
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Transfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine LearningTransfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine Learning
 
Semantics reloaded
Semantics reloadedSemantics reloaded
Semantics reloaded
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphJoint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic Technologies
 

Ähnlich wie LDAvis

NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationEugene Nho
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Jonathon Hare
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research ObjectsDavid De Roure
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015rusbase
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisJonathan Stray
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptxGowrySailaja
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Anubhav Jain
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modellingcsandit
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGcscpconf
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaopenseesdays
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingSunny Kr
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Semantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalabilitySemantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalabilitynvitucci
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chiBarbara Starr
 
A Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentA Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentFaculty of Computer Science
 

Ähnlich wie LDAvis (20)

NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
A-Study_TopicModeling
A-Study_TopicModelingA-Study_TopicModeling
A-Study_TopicModeling
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Lecture1.pptx
Lecture1.pptxLecture1.pptx
Lecture1.pptx
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptx
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modelling
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linking
 
Wi presentation
Wi presentationWi presentation
Wi presentation
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Semantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalabilitySemantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalability
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chi
 
A Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentA Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual Entailment
 
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
 

Kürzlich hochgeladen

Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...David Celestin
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfSkillCertProExams
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.thamaeteboho94
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...ZurliaSoop
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...amilabibi1
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalFabian de Rijk
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxlionnarsimharajumjf
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfMahamudul Hasan
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityHung Le
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIINhPhngng3
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Baileyhlharris
 

Kürzlich hochgeladen (17)

Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 

LDAvis

  • 1. LDAvis: A method for visualizing and interpreting topics Carson Sievert Iowa State University Kenneth E. Shirley AT&T Labs Research Illvi2014
  • 2. • LDAvis, a web-based interactive visualization of topics estimated using Latent Dirichlet Allocation that is built using a combination of R and D3. • We introduce LDAvis that attempts to answer a few basic questions about a fitted topic model: I . I N T R O D U C T I O N
  • 3. A. What is the meaning of each topic?, B. How prevalent is each topic?, C. How do the topics relate to each other? I . I N T R O D U C T I O N
  • 4. I I . R E L A T E D W O R K LDA Turbo- Topics liftPMI
  • 5. I I . R E L A T E D W O R K LDA Latent Dirichlet allocation (LDA) is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael I. Jordan in 2003.
  • 6. I I . R E L A T E D W O R K Turbo- Topics Blei and Lafferty (2009) developed “Turbo Topics”, a method of identifying n-grams within LDA inferred topics。 the resulting output is still simply a ranked list containing a mixture of terms and n-grams。
  • 7. I I . R E L A T E D W O R K Turbo- Topics
  • 8. I I . R E L A T E D W O R K Pointwise Mutual Information(PMI) : PMI Independence : P(x , y) = P(x) × P(y) PMI = log1 = 0
  • 9. I I . R E L A T E D W O R K lift Lift: Φ kw denote the probability of term w ∈ {1, ..., V } for topic k ∈ {1, ..., K}, where V denotes the number of terms in the vocabulary, and let pw denote the marginal probability of term w in the corpus.
  • 10. I I . T O P I C M O D E L V I S U A L I Z A T I O N • A number of visualization systems for topic models have been developed in recent years. • But, the visualization elements are limited to barcharts or word clouds of term probabilities for each topic, pie charts of topic probabilities for each document, and/or various barcharts or scatterplots related to document metadata.
  • 11. I I . T O P I C M O D E L V I S U A L I Z A T I O N • Chuang et al. (2012b) develop such a tool, called “Termite”, which visualizes the set of topic term distributions estimated in LDA using a matrix layout. • http://vis.stanford.edu/papers/termite
  • 12. I I . T O P I C M O D E L V I S U A L I Z A T I O N • 在代表主題內容的關鍵字選擇上,Termite定義了saliency 變數: saliency(w) = P(w)× distinctiveness(w)。 • w 代表一個關鍵字; P(w)代表w的頻率;distinctiveness(w)代表w在主題間 的差異性,包含w的主題越多,distinctiveness(w)值越低。 • 在關鍵詞排列上,Termite 把經常連續出現的詞,如social和networks 排在 一起以便增強語義讓使用者較好理解。
  • 13. I I I . R E L E V A N C E O F T E R M S T O T O P I C S • Here we define relevance, our method for ranking terms within topics, and we describe the results of a user study to learn an optimal tuning parameter in the computation of relevance. • We define the relevance of term w to topic k given a weight parameter λ (where 0 ≦ λ ≦ 1).
  • 14. I I I . R E L E V A N C E O F T E R M S T O T O P I C S • “λ” determines the weight given to the probability of term w under topic k relative to its lift. • Setting “λ” = 1 results in the familiar ranking of terms in decreasing order of their topic-specific probability, and setting “λ”= 0 ranks terms solely by their lift.
  • 15. I I I . R E L E V A N C E O F T E R M S T O T O P I C S • We fit a 50-topic model to the 20 Newsgroups data and each term in the vocabulary (which has size V = 22, 524) for a given topic. • Figure shows this plot for Topic 29, which documents posted to the “Motorcycles” Newsgroup, but also from documents posted to the “Automobiles” Newsgroup and the “Electronics” Newsgroup.
  • 16.
  • 17. I I I . U S E R S T U D Y  13, 695 documents 20 Newsgroups
  • 18. I I I . U S E R S T U D Y • Some of the LDA-inferred topics occurred almost exclusively (> 90% of occurrences) from a single Newsgroup, such as Topic 38. which was came from the documents posted to the “Medicine” (or “sci.med”) Newsgroup. • Other topics occurred in a wide variety of Newsgroups. One would expect these “spread-out” topics to be harder to interpret than the “pure” topics like Topic 38.
  • 19. I I I . U S E R S T U D Y • we recruited 29 subjects among our colleagues (research scientists at AT&T Labs with moderate familiarity with text mining techniques and topic models). • each subject completed an online experiment consisting of 50 tasks k (for k ∈ {1, ..., 50}). • Task k was to read a list of five terms, ranked from 1-5 in order of relevance to topic k, where “λ” ∈ (0, 1) was randomly sampled to compute relevance.
  • 20. I I I . U S E R S T U D Y • we expected the proportion of correct responses to be roughly 1/3 no matter the value of λ used to compute relevance. • In fact, seven of the topics were correctly identified by all 29 users, and one topic was incorrectly by all users. • we estimated a topic specific intercept term to control the difficulty of the topic (not just due to its tokens variety, but also to account for the inherent familiarity of each topic to our subject.)
  • 21. I I I . U S E R S T U D Y • The estimated effects of λ and λ² was statistically significant (χ² p-value = 0.018). • there was roughly a 67% baseline probability of correct identification. As Figure 3 shows, for these topics, the “optimal” value of λ was about 0.6. • λ:0 ≒ 53 % and λ:1 ≒ 63 % • We view this as evidence that where λ < 1 can really improve topic interpretability.
  • 22. I I I . U S E R S T U D Y
  • 23. I V . T H E L D A V I S S Y S T E M Installation: pip install pyldavis
  • 24. I V . T H E L D A V I S S Y S T E M http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/ master/notebooks/pyLDAvis_overview.ipynb
  • 25. I V . T H E L D A V I S S Y S T E M http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/ma ster/LDA-practice-1.html#topic=0&lambda=1&term= http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/ma ster/python-lldavis_test1-24-2.html http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/mast er/notebooks/Gensim%20Newsgroup.ipynb .用 gensim 處理文本: .自由中國 ldavis 圖表: .自由中國 l-ldavis 圖表:
  • 26. I V . T H E L D A V I S S Y S T E M • the areas of the circles are proportional to the relative prevalences of the topics in the corpus • The default for computing inter-topic distances is Jensen- Shannon divergence. • The default for scaling the set of inter-topic distances defaults to Principal Components.
  • 27. I V . T H E L D A V I S S Y S T E M
  • 28. I V . T H E L D A V I S S Y S T E M
  • 29. I V . T H E L D A V I S S Y S T E M
  • 30. V . D I S C U S S I O N A. What is the meaning of each topic?, B. How prevalent is each topic?, C. How do the topics relate to each other?
  • 31. V . D I S C U S S I O N • For future work, we anticipate performing a larger user study to further understand how to facilitate topic interpretation in fitted LDA models • In addition to relevance. The need to visualize correlations between topics can provide insight into what is happening on the document level without actually displaying entire documents. • Last, we seek a solution to the problem of visualizing a large number of topics (say, from 100 - 500 topics) in a compact way.