SlideShare ist ein Scribd-Unternehmen logo
1 von 19
LDA and it’s applications
AI HACKERS
What is LDA?
 LDA stands for latent dirichlet allocation
 It is basically of distribution of words in topic k (let’s say 50) with probability of
topic k occurring in document d (let’s say 5000)
 Mechanism - It uses special kind of distribution called Dirichlet Distribution which
is nothing but multi—variate generalization of Beta distribution of probability
density function
LDA in layman terms
Sentence 1: I spend the evening watching football
Sentence 2: I ate nachos and guacamole.
Sentence 3: I spend the evening watching football while eating nachos and guacamole.
LDA might say something like:
Sentence A is 100% about Topic 1
Sentence B is 100% Topic 2
Sentence C is 65% is Topic 1, 35% Topic 2
But also tells that
Topic 1 is about football (50%), evening (50%),
topic 2 is about nachos (50%), guacamole (50)%
Bayesian Network Example
LDA is Bayesian Network of Probability
Density function
LDA history
Andrew NgDavid Blei Michael I Jordan
A simple LDA
https://ai.stanford.edu/~ang/papers/nips01-lda.pdf
Packages used in python
 sudo pip install nltk
 sudo pip install genism
 sudo pip intall stop-words
Stop words
 Stop words are commonly occurring words which doesn’t contribute to topic
modelling.
 the, and, or
 However, sometimes, removing stop words affect topic modelling
 For e.g., Thor The Ragnarok is a single topic but we use stop words mechanism, then it
will be removed.
Porter’s Stemmer algorithm
 A common NLP technique to reduce topically similar words to their root. For e.g., “stemming,” “stemmer,”
“stemmed,” all have similar meanings; stemming reduces those terms to “stem.”
 Important for topic modeling, which would otherwise view those terms as separate entities and reduce
their importance in the model.
 It's a bunch of rules for reducing a word:
 sses -> es
 ies -> i
 ational -> ate
 tional -> tion
 s -> ∅
 when conflicts, the longest rule wins
 Bad idea unless you customize it.
Porter’s Stemmer algorithm -Flowchart
Arabic Stemming Process
Simple Stemming Process
Lemmatization
 It goes one step further than stemming.
 It obtains grammatically correct words and distinguishes words by their word
sense with the use of a vocabulary (e.g., type can mean write or category).
 It is a much more difficult and expensive process than stemming.
Lemmatization - Example
Bag of Words
Word2Vec
CBOW v/s SKIP-GRAM
https://arxiv.org/pdf/1301.3781.pdf
LDA 2 VEC –
what really happens?
https://arxiv.org/pdf/1605.02019.pdf
LDA2VEC model adds in skipgrams.
A word predicts another word in the same window,
as in word2vec, but also has the notion of a context vector
which only changes at the document level as in LDA.
Lda2Vec – Pytorch code
 Source: https://github.com/TropComplique/lda2vec-pytorch
 Go to 20newsgroups/.
 Run get_windows.ipynb to prepare data.
 Run python train.py for training.
 Run explore_trained_model.ipynb.
 To use this on your data you need to edit get_windows.ipynb. Also there are
hyperparameters in 20newsgroups/train.py, utils/training.py, utils/lda2vec_loss.py.
Thank ou

Weitere Àhnliche Inhalte

Was ist angesagt?

[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ
[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ
[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알ꞰJungHyun Hong
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Universitat PolitĂšcnica de Catalunya
 
Point Processing
Point ProcessingPoint Processing
Point ProcessingGowrishankar S
 
Problem Solving
Problem Solving Problem Solving
Problem Solving Amar Jukuntla
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2butest
 
Text Mining mit Python und PowerBI
Text Mining mit Python und PowerBIText Mining mit Python und PowerBI
Text Mining mit Python und PowerBIJens Albrecht
 
Introduction_to_DEEP_LEARNING.ppt
Introduction_to_DEEP_LEARNING.pptIntroduction_to_DEEP_LEARNING.ppt
Introduction_to_DEEP_LEARNING.pptSwatiMahale4
 
Ai ê·žêčŒìŽê±°
Ai ê·žêčŒìŽê±°Ai ê·žêčŒìŽê±°
Ai ê·žêčŒìŽê±°ë„형 임
 
Introduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksBennoG1
 
Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Amr Rashed
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMDivya Gera
 
Deep learning.pptx
Deep learning.pptxDeep learning.pptx
Deep learning.pptxMdMahfoozAlam5
 
First order logic
First order logicFirst order logic
First order logicRushdi Shams
 
Domain adaptation
Domain adaptationDomain adaptation
Domain adaptationTomoya Koike
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANShyam Krishna Khadka
 

Was ist angesagt? (20)

[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ
[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ
[GomGuard] 뉎런부터 YOLO êčŒì§€ - ë”„ëŸŹë‹ 전반에 대한 읎알Ʞ
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
 
Point Processing
Point ProcessingPoint Processing
Point Processing
 
Problem Solving
Problem Solving Problem Solving
Problem Solving
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
Text Mining mit Python und PowerBI
Text Mining mit Python und PowerBIText Mining mit Python und PowerBI
Text Mining mit Python und PowerBI
 
Introduction_to_DEEP_LEARNING.ppt
Introduction_to_DEEP_LEARNING.pptIntroduction_to_DEEP_LEARNING.ppt
Introduction_to_DEEP_LEARNING.ppt
 
Ai ê·žêčŒìŽê±°
Ai ê·žêčŒìŽê±°Ai ê·žêčŒìŽê±°
Ai ê·žêčŒìŽê±°
 
Introduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial Networks
 
Deep learning
Deep learningDeep learning
Deep learning
 
Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Deep learning tutorial 9/2019
Deep learning tutorial 9/2019
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
Generative adversarial text to image synthesis
Generative adversarial text to image synthesisGenerative adversarial text to image synthesis
Generative adversarial text to image synthesis
 
Html graphics
Html graphicsHtml graphics
Html graphics
 
Deep learning.pptx
Deep learning.pptxDeep learning.pptx
Deep learning.pptx
 
Artificial Intelligence Algorithms
Artificial Intelligence AlgorithmsArtificial Intelligence Algorithms
Artificial Intelligence Algorithms
 
First order logic
First order logicFirst order logic
First order logic
 
Domain adaptation
Domain adaptationDomain adaptation
Domain adaptation
 
AI-09 Logic in AI
AI-09 Logic in AIAI-09 Logic in AI
AI-09 Logic in AI
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGAN
 

Ähnlich wie Lda and it's applications

Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchDawn Anderson MSc DigM
 
Introduction to word embeddings with Python
Introduction to word embeddings with PythonIntroduction to word embeddings with Python
Introduction to word embeddings with PythonPavel Kalaidin
 
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with PythonDF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with PythonMoscowDataFest
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics Ibutest
 
Vectorization In NLP.pptx
Vectorization In NLP.pptxVectorization In NLP.pptx
Vectorization In NLP.pptxChode Amarnath
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsUniversity of Minnesota, Duluth
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2Viral Gupta
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)Sihan Chen
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1Tobias Wunner
 
Tricks in natural language processing
Tricks in natural language processingTricks in natural language processing
Tricks in natural language processingBabu Priyavrat
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentationRama Bastola
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentationRama Bastola
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentationRama Bastola
 
information retrieval --> dictionary.ppt
information retrieval --> dictionary.pptinformation retrieval --> dictionary.ppt
information retrieval --> dictionary.pptssusere3b1a2
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 

Ähnlich wie Lda and it's applications (20)

Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic search
 
Introduction to word embeddings with Python
Introduction to word embeddings with PythonIntroduction to word embeddings with Python
Introduction to word embeddings with Python
 
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with PythonDF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
 
Class14
Class14Class14
Class14
 
Icon 2007 Pedersen
Icon 2007 PedersenIcon 2007 Pedersen
Icon 2007 Pedersen
 
Vectorization In NLP.pptx
Vectorization In NLP.pptxVectorization In NLP.pptx
Vectorization In NLP.pptx
 
Ir 03
Ir   03Ir   03
Ir 03
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and Concepts
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1Enriching the semantic web tutorial session 1
Enriching the semantic web tutorial session 1
 
Tricks in natural language processing
Tricks in natural language processingTricks in natural language processing
Tricks in natural language processing
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentation
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentation
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentation
 
information retrieval --> dictionary.ppt
information retrieval --> dictionary.pptinformation retrieval --> dictionary.ppt
information retrieval --> dictionary.ppt
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 

Mehr von Babu Priyavrat

Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning TechniquesBabu Priyavrat
 
NLP using Deep learning
NLP using Deep learningNLP using Deep learning
NLP using Deep learningBabu Priyavrat
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlowBabu Priyavrat
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in RBabu Priyavrat
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learningBabu Priyavrat
 

Mehr von Babu Priyavrat (7)

5G and Drones
5G and Drones 5G and Drones
5G and Drones
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning Techniques
 
NLP using Deep learning
NLP using Deep learningNLP using Deep learning
NLP using Deep learning
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
 
Neural network
Neural networkNeural network
Neural network
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in R
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
 

KĂŒrzlich hochgeladen

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptDr. Soumendra Kumar Patra
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectBoston Institute of Analytics
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort ServiceDelhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

KĂŒrzlich hochgeladen (20)

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
âž„đŸ” 7737669865 đŸ”â–» Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >àŒ’8448380779 Escort Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Lda and it's applications

  • 1. LDA and it’s applications AI HACKERS
  • 2. What is LDA?  LDA stands for latent dirichlet allocation  It is basically of distribution of words in topic k (let’s say 50) with probability of topic k occurring in document d (let’s say 5000)  Mechanism - It uses special kind of distribution called Dirichlet Distribution which is nothing but multi—variate generalization of Beta distribution of probability density function
  • 3. LDA in layman terms Sentence 1: I spend the evening watching football Sentence 2: I ate nachos and guacamole. Sentence 3: I spend the evening watching football while eating nachos and guacamole. LDA might say something like: Sentence A is 100% about Topic 1 Sentence B is 100% Topic 2 Sentence C is 65% is Topic 1, 35% Topic 2 But also tells that Topic 1 is about football (50%), evening (50%), topic 2 is about nachos (50%), guacamole (50)%
  • 5. LDA is Bayesian Network of Probability Density function
  • 6. LDA history Andrew NgDavid Blei Michael I Jordan
  • 8. Packages used in python  sudo pip install nltk  sudo pip install genism  sudo pip intall stop-words
  • 9. Stop words  Stop words are commonly occurring words which doesn’t contribute to topic modelling.  the, and, or  However, sometimes, removing stop words affect topic modelling  For e.g., Thor The Ragnarok is a single topic but we use stop words mechanism, then it will be removed.
  • 10. Porter’s Stemmer algorithm  A common NLP technique to reduce topically similar words to their root. For e.g., “stemming,” “stemmer,” “stemmed,” all have similar meanings; stemming reduces those terms to “stem.”  Important for topic modeling, which would otherwise view those terms as separate entities and reduce their importance in the model.  It's a bunch of rules for reducing a word:  sses -> es  ies -> i  ational -> ate  tional -> tion  s -> ∅  when conflicts, the longest rule wins  Bad idea unless you customize it.
  • 11. Porter’s Stemmer algorithm -Flowchart Arabic Stemming Process Simple Stemming Process
  • 12. Lemmatization  It goes one step further than stemming.  It obtains grammatically correct words and distinguishes words by their word sense with the use of a vocabulary (e.g., type can mean write or category).  It is a much more difficult and expensive process than stemming.
  • 17. LDA 2 VEC – what really happens? https://arxiv.org/pdf/1605.02019.pdf LDA2VEC model adds in skipgrams. A word predicts another word in the same window, as in word2vec, but also has the notion of a context vector which only changes at the document level as in LDA.
  • 18. Lda2Vec – Pytorch code  Source: https://github.com/TropComplique/lda2vec-pytorch  Go to 20newsgroups/.  Run get_windows.ipynb to prepare data.  Run python train.py for training.  Run explore_trained_model.ipynb.  To use this on your data you need to edit get_windows.ipynb. Also there are hyperparameters in 20newsgroups/train.py, utils/training.py, utils/lda2vec_loss.py.