©2013 LinkedIn Corporation. All Rights Reserved.
Latent Dirichlet Allocation (LDA)
- for ML-IR Discussion Group
Prepared by Wayne Tai Lee, Satpreet Singh
Latent Dirichlet Allocation:
A Bayesian Unsupervised Learning Model
Roadmap
• Unsupervised learning
• Bayesian Statistics
• Mixture Models
• LDA – theory and intuition
• LDA – practice and applications
Unsupervised Learning
Learning patterns with no labels
• Clustering is a form of “Unsupervised learning”
• Classification is known as supervised learning
• Validation is difficult
How would you cluster?
Now try these ones!
Documents from Wikipedia
Bayesian Statistics
A framework to update your beliefs
• Probabilities as beliefs
• Update your beliefs as data are observed
• Requires a model that describes how the data are generated
Example: Evaluating Candidates
Candidate potential
Schooling
Experience
Interview
Internship
How to update?!
Model for candidates
Model for data generation
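One way to make "How to update?!" concrete is a conjugate Beta-Binomial update. The sketch below is an illustration, not the model from the slides: it assumes candidate potential is a single success probability, treats each evaluation signal as a pass/fail observation, and uses a hypothetical `BetaBelief` class with made-up counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about an unknown success probability (hypothetical helper)."""
    a: float
    b: float

    def mean(self) -> float:
        # Expected success probability under the current belief.
        return self.a / (self.a + self.b)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: observed counts are added to the prior pseudo-counts.
        return BetaBelief(self.a + successes, self.b + failures)


# "Model for candidates": a uniform prior, i.e. no opinion yet.
prior = BetaBelief(1.0, 1.0)
# "Model for data generation": each signal is an independent pass/fail outcome.
# Made-up data: the candidate passes 3 of 4 evaluation signals.
posterior = prior.update(successes=3, failures=1)
print(prior.mean(), posterior.mean())
```

The prior plays the role of the model for candidates and the pass/fail likelihood plays the role of the model for data generation; richer signals would need a richer likelihood.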
Mixture Models
A popular statistical model
• An easy way to build hierarchical relationships
Mixture models visualized
Candidate Quality: High / Low
Marginal Distribution of Candidate Performance: ignore quality
Distribution of Candidate Performance: Mixture Weights
[Figure: distribution of candidate performance, annotated with question marks]
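The candidate mixture can be sketched in a few lines. Everything numeric here is an assumption for illustration: the slides do not give the mixture weights or the shape of each group's distribution, so the sketch uses made-up weights and Gaussian components.

```python
import math
import random

# Hypothetical mixture weights: 30% high-quality, 70% low-quality candidates.
WEIGHTS = {"high": 0.3, "low": 0.7}
# Assumed Gaussian performance per group: (mean, standard deviation).
COMPONENTS = {"high": (80.0, 5.0), "low": (55.0, 10.0)}


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marginal_density(x):
    # "Ignore quality": sum the group densities, weighted by the mixture weights.
    return sum(WEIGHTS[g] * normal_pdf(x, *COMPONENTS[g]) for g in WEIGHTS)


def sample_candidate(rng):
    # Stage 1: draw the latent quality group from the mixture weights.
    group = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    # Stage 2: draw performance from that group's distribution.
    return group, rng.gauss(*COMPONENTS[group])


rng = random.Random(0)
print(sample_candidate(rng), marginal_density(70.0))
```

In practice only the performance is observed; the group label and the weights are the unknowns to be inferred.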
How are words in a document generated?
One possibility: each word can come from a different topic (bag of words: ignore order)
Each word comes from a mixture over topics, built from:
• Mixture weight for topic k
• Multinomial distribution over ALL words, based on topic k
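The equation these two labels annotate did not survive extraction. A mixture of this kind is usually written as below; the symbols $\theta_k$ (mixture weight for topic $k$) and $\beta_k$ (word probabilities of topic $k$) are assumed names, not recovered from the slide.

```latex
p(w) \;=\; \sum_{k=1}^{K} \theta_k \,\mathrm{Mult}(w \mid \beta_k),
\qquad \theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1 .
```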
Just a mixture model
[Figure: a word generated from one of Topic 1, …, Topic K, with example topics Leadership, Big Data, Machine Learning]
1) Pick a topic
2) Pick a word
The chosen topic: Z
So we really want to know
1) Z (cluster for the word)
2) the mixture weights over topics (document composition)
3) the word distribution of each topic (key words)
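To see what "knowing Z" means, suppose for a moment that the mixture weights and the per-topic word distributions were already known. Bayes' rule then gives the probability of each topic for an observed word. The numbers and the three-word vocabulary below are invented for illustration; in practice all three quantities are unknown and must be inferred together.

```python
# Hypothetical mixture weights over the three example topics.
TOPIC_WEIGHTS = {"Leadership": 0.5, "Big Data": 0.3, "Machine Learning": 0.2}

# Hypothetical per-topic word probabilities over a toy vocabulary.
TOPIC_WORDS = {
    "Leadership": {"team": 0.7, "data": 0.2, "model": 0.1},
    "Big Data": {"team": 0.1, "data": 0.7, "model": 0.2},
    "Machine Learning": {"team": 0.1, "data": 0.3, "model": 0.6},
}


def topic_posterior(word):
    # Bayes' rule: P(Z = k | word) is proportional to weight_k * P(word | topic k).
    joint = {k: TOPIC_WEIGHTS[k] * TOPIC_WORDS[k][word] for k in TOPIC_WEIGHTS}
    total = sum(joint.values())
    return {k: p / total for k, p in joint.items()}


print(topic_posterior("model"))
```

Note that the most likely topic for a word depends on both ingredients: a topic with a small weight can still win if the word is very typical of it.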
Review!
[Figure: graphical model with nodes Z and W]
[Figure: plate diagram with nodes Z_{d,n} and W_{d,n}; plates over k = 1, …, K; n = 1, …, N_d; d = 1, …, D]
K: number of topics
N_d: number of words in document d
D: number of documents
Bayesian: But what about the distributions for the topic weights and the per-topic word distributions??
The two Dirichlet prior parameters control the “sparsity” of the weights for the multinomials.
Implications: a priori we assume
- Topics have few key words
- Documents only have a small subset of topics
Dirichlet Distribution with Different Sparsity Parameters
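The plot for this slide did not survive, but the effect can be reproduced with a small simulation. The sketch below draws from a symmetric Dirichlet using only the standard library (normalized Gamma draws) and reports the average size of the largest weight; the concentration values, dimension and function names are choices made here for illustration.

```python
import random


def sample_symmetric_dirichlet(rng, concentration, dim):
    # A Dirichlet draw is a vector of independent Gamma draws, normalized to sum to 1.
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]


def average_largest_weight(concentration, dim=5, num_samples=500, seed=0):
    # A large average maximum means most of the mass sits on one entry (sparse).
    rng = random.Random(seed)
    samples = (
        sample_symmetric_dirichlet(rng, concentration, dim)
        for _ in range(num_samples)
    )
    return sum(max(s) for s in samples) / num_samples


for concentration in (0.1, 1.0, 10.0):
    print(concentration, round(average_largest_weight(concentration), 3))
```

Small concentration values push each draw toward a single dominant entry, which is the "few key words per topic, few topics per document" assumption; large values push draws toward uniform weights.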
Latent Dirichlet Allocation!!!
[Figure: plate diagram with nodes Z_{d,n} and W_{d,n}; k = 1, …, K; n = 1, …, N_d]
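The full generative model behind the diagram is commonly written as follows. The Greek symbols are the conventional ones from the LDA literature and are an assumption here, since the slide's own symbols were lost: $\theta_d$ for the topic weights of document $d$, $\beta_k$ for the word distribution of topic $k$, and $\alpha$, $\eta$ for the Dirichlet parameters.

```latex
\begin{aligned}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
Z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(1, \theta_d), & n &= 1,\dots,N_d \\
W_{d,n} \mid Z_{d,n}, \beta &\sim \mathrm{Multinomial}(1, \beta_{Z_{d,n}})
\end{aligned}
```

Only the words $W_{d,n}$ are observed; everything else is latent.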
How do we fit this model?
Want the posterior:
Worst part of Bayesian analysis... personally speaking
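The posterior formula on this slide was not recovered. In the conventional LDA notation (assumed here: $\theta$ for document topic weights, $\beta$ for topic word distributions, $Z$ for topic assignments, $W$ for observed words, $\alpha$ and $\eta$ for the Dirichlet parameters) it has the form:

```latex
p(\theta, \beta, Z \mid W, \alpha, \eta)
  \;=\; \frac{p(\theta, \beta, Z, W \mid \alpha, \eta)}{p(W \mid \alpha, \eta)} .
```

The numerator is easy to evaluate from the model. The denominator requires integrating over $\theta$ and $\beta$ and summing over every possible assignment $Z$, which is why the posterior has to be approximated.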
Two main ways to get the posterior:
- Sampling methods
  - Asymptotically correct
  - Time consuming
  - Lots of black magic in sampling tricks
- Variational methods (practical solution!)
  - An approximation with no guarantees
  - Faster
  - Need math skills
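As an example of the sampling route, here is a minimal sketch of collapsed Gibbs sampling, one common sampler for LDA. The slides do not say which sampler the authors had in mind, so this is a swapped-in illustration. Documents are assumed to be lists of integer word ids, and `alpha` and `eta` are assumed names for the two Dirichlet parameters.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, eta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for LDA on docs given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                               # total words per topic
    assignments = []

    # Start from random topic assignments and fill the count tables.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment of this word from the counts.
                z = assignments[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return assignments, doc_topic, topic_word


toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2]]
assignments, doc_topic, topic_word = gibbs_lda(toy_docs, num_topics=2, vocab_size=4)
print(doc_topic)
```

Every sweep touches every word once, which is where the "time consuming" remark comes from. The returned counts can be normalized (after adding `alpha` or `eta`) to estimate document compositions and topic key words.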
Variational Bayes (specifically mean-field variational Bayes):
What’s crazy?
- Assumes all the latent variables are independent (in the approximating distribution)
What’s not crazy?
- Finds the “best” model within this crazy class.
  - Best under KL divergence
Empirically, it has shown promising results!
For “sufficient” details:
“Explaining Variational Approximations” by Ormerod and Wand
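Written out for LDA, the mean-field idea looks roughly like this. The notation is the conventional one and is assumed here ($\theta_d$ document topic weights, $\beta_k$ topic word distributions, $Z_{d,n}$ topic assignments, $W$ observed words); the exact factorization varies between implementations.

```latex
\begin{aligned}
q(\theta, \beta, Z) &= \prod_{k=1}^{K} q(\beta_k)\;
                       \prod_{d=1}^{D} q(\theta_d)\;
                       \prod_{d=1}^{D}\prod_{n=1}^{N_d} q(Z_{d,n}), \\
q^{*} &= \arg\min_{q \in \mathcal{Q}}\;
         \mathrm{KL}\!\left( q(\theta, \beta, Z) \,\big\|\, p(\theta, \beta, Z \mid W) \right).
\end{aligned}
```

The first line is the independence assumption; the second is the sense in which the fitted $q^{*}$ is the best member of that family $\mathcal{Q}$.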
LDA Take Home
- An intuitively appealing Bayesian unsupervised learning model
- Training is difficult
  - Lots of packages exist, main issue is scalability
- Validation is difficult
  - Usually cast into a supervised learning framework
- Presentation is difficult
  - Visualization for the Bayesian model is hard.

LDA Beginner's Tutorial


Editor's notes

  1. Take home: validation is difficult... there is no true answer here.
  2. Clustering documents is difficult because the same words recur across many documents. Documents may also resemble one another on different topics, so we might want a clustering that allows a document to belong to more than one cluster.
  3. Two-stage process
  4. Example: the word “professional” is probably used more often in the topic of professional networks than in the topic of social networks.