Context Based Search

                     By
       Shatabdi Kundu (2010EET2553)
        Computer Technology, M.Tech
                  IIT Delhi
       Email ID:shatabdikundu@live.com

                 Project Guide:
           Prof.Santanu Chaudhury
      Electrical Engineering Department
                   IIT Delhi
        Email ID:santanuc@ee.iitd.ac.in


Shatabdi Kundu :: 2010EET2553   Prof.Santanu Chaudhury   09 MAY 2011   1of 16
Outline



      Introduction to Topic Models: Probabilistic Modelling
      Latent Dirichlet Allocation
      Topic Discovery using Wordnet
      Work Done
      Results
      Conclusion and Future Work
      References




Probabilistic Modelling



       Treat data as observations that arise from a generative
       probabilistic process that includes hidden variables
           For documents, the hidden variables reflect the thematic
           structure of the collection
       Infer the hidden structure using posterior inference
           What are the topics that describe this collection?
       Situate new data into the estimated model
           How does this query or new document fit into the estimated
           topic structure?




Intuition behind LDA




Generative Process




      Cast these intuitions into a generative probabilistic process
      Each document is a random mixture of corpus-wide topics
      Each word is drawn from one of those topics
Graphical Models




      Nodes are random variables
      Edges denote possible dependence
      Observed variables are shaded
      Plates denote replicated structure
Graphical Models




      Structure of the graph defines the pattern of conditional
      dependence between the ensemble of random variables.
      E.g., this graph corresponds to
      $$p(y, x_1, \ldots, x_N) = p(y) \prod_{n=1}^{N} p(x_n \mid y) \tag{1}$$
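      As a minimal sketch of how such a factorised joint is evaluated, the
      snippet below multiplies p(y) by one conditional per observed x_n. The
      class labels, words, and probability tables are hypothetical toy values
      chosen for illustration; they are not taken from the slides.

```python
from math import prod


def joint_probability(y, xs, p_y, p_x_given_y):
    """Return p(y, x_1..x_N) = p(y) * prod_n p(x_n | y)."""
    return p_y[y] * prod(p_x_given_y[y][x] for x in xs)


# Hypothetical toy distributions (assumed for illustration only).
p_y = {"sports": 0.5, "politics": 0.5}
p_x_given_y = {
    "sports": {"ball": 0.75, "vote": 0.25},
    "politics": {"ball": 0.25, "vote": 0.75},
}

# p(sports) * p(ball|sports) * p(ball|sports) * p(vote|sports)
joint = joint_probability("sports", ["ball", "ball", "vote"], p_y, p_x_given_y)
print(joint)
```

      Because each x_n depends only on y, the cost of evaluating the joint
      grows linearly with N.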

Latent Dirichlet Allocation




     1   Draw each topic β_k ∼ Dir(η), for k ∈ {1, ..., K}
     2   For each document d:
           1   Draw topic proportions θ_d ∼ Dir(α)
           2   For each word n:
                 1   Draw Z_{d,n} ∼ Mult(θ_d)
                 2   Draw W_{d,n} ∼ Mult(β_{Z_{d,n}})
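     The generative steps above can be simulated with the standard library
     alone. This is a sketch under stated assumptions: symmetric Dirichlet
     priors, a fixed document length, and a made-up vocabulary. The function
     names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`) are
     hypothetical helpers, not part of any LDA library.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Sample from a symmetric Dirichlet via normalised Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index from a categorical (single-trial multinomial)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, num_topics, num_docs, doc_len, eta, alpha, seed=0):
    rng = random.Random(seed)
    # Step 1: draw each topic beta_k ~ Dir(eta) over the vocabulary.
    beta = [sample_dirichlet(eta, len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Step 2.1: draw topic proportions theta_d ~ Dir(alpha).
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # Z_{d,n} ~ Mult(theta_d)
            w = sample_categorical(beta[z], rng)  # W_{d,n} ~ Mult(beta_{Z_{d,n}})
            words.append(vocab[w])
        docs.append(words)
    return beta, docs


# Hypothetical vocabulary and hyperparameters, chosen only for the demo.
vocab = ["tree", "maple", "river", "bank", "loan", "money"]
beta, docs = generate_corpus(vocab, num_topics=2, num_docs=3, doc_len=8,
                             eta=0.5, alpha=0.5)
for doc in docs:
    print(" ".join(doc))
```

     Inference runs this process in reverse: given only the words, it
     estimates the topics, proportions, and assignments that plausibly
     produced them.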
Latent Dirichlet Allocation




       From a collection of documents, infer
           Per-word topic assignments Z_{d,n}
           Per-document topic proportions θ_d
           Per-corpus topic distributions β_k
       Use posterior expectations to perform the task at hand, e.g.
       information retrieval, document similarity, etc.
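       One way to use the inferred topic proportions for document similarity
       is to compare them with a distance between probability vectors. The
       sketch below uses the Hellinger distance, which is a choice made here
       for illustration and is not stated in the slides; the θ values are
       made-up examples rather than results from the project.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


def most_similar(query, theta):
    """Return the document whose topic proportions are closest to the query's."""
    others = (doc for doc in theta if doc != query)
    return min(others, key=lambda doc: hellinger(theta[query], theta[doc]))


# Hypothetical posterior mean topic proportions for three documents.
theta = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.6, 0.3, 0.1],
    "doc_c": [0.1, 0.1, 0.8],
}

print(most_similar("doc_a", theta))
```

       A new query can be handled the same way once its own topic proportions
       have been inferred under the fitted model.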

Topic Discovery using Wordnet




   Lexical relations used for finding the latent topics
        synsets (synonym sets) as basic units
        hyponymy
            a semantic "is a kind of" relation between word meanings
            E.g. {maple} is a hyponym of {tree}
        hypernymy
            the inverse of hyponymy
            E.g. {tree} is a hypernym of {maple}
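   The two relations can be illustrated with a tiny hand-written hierarchy.
   This is a sketch only: the `HYPERNYM` table is a hypothetical fragment, not
   WordNet data, and it assumes each entry has a single hypernym, whereas a
   real WordNet synset can have several.

```python
# Hypothetical hand-written fragment of a hypernym hierarchy (not WordNet data).
HYPERNYM = {
    "maple": "tree",
    "oak": "tree",
    "tree": "plant",
    "rose": "flower",
    "flower": "plant",
    "plant": "organism",
}


def hypernym_chain(word):
    """Return the word followed by its successive hypernyms up to the root."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_hyponym(word, candidate):
    """True if `word` is a (possibly indirect) hyponym of `candidate`."""
    return candidate in hypernym_chain(word)[1:]


def lowest_common_hypernym(words):
    """Return the most specific node shared by all hypernym chains, or None."""
    chains = [hypernym_chain(w) for w in words]
    for node in chains[0]:
        if all(node in chain for chain in chains[1:]):
            return node
    return None


print(hypernym_chain("maple"))
print(is_hyponym("maple", "tree"))
print(lowest_common_hypernym(["maple", "oak", "rose"]))
```

   Taking the lowest shared ancestor of a group of words is one simple way to
   turn a word list into a candidate label.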
Work Done


     I took a collection of 10 documents with a total of around
     28K words.
     I removed stop words and rare words, along with
     punctuation marks and numbers.
     I then fitted a 7-topic LDA model to this corpus.
     This gave 7 topics, each represented by its 5 most
     probable words.
     I then used the lexical relations of WordNet to identify the
     hidden topics, using the common parents (hypernyms) of all
     the words in each topic.
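     The cleaning step described above might look roughly like the sketch
     below. The stop-word list, the `min_count` threshold for "rare" words,
     and the sample sentences are all assumptions made for illustration; the
     slides do not say which list or threshold was used.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}


def preprocess(documents, min_count=2):
    """Lower-case, drop punctuation, numbers, stop words and rare words."""
    # Keeping alphabetic runs only removes punctuation marks and numbers.
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    tokenised = [[t for t in doc if t not in STOP_WORDS] for doc in tokenised]
    # A word is treated as rare if it occurs fewer than min_count times overall.
    counts = Counter(t for doc in tokenised for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenised]


sample = [
    "The maple is a tree, planted in 2011.",
    "A tree and the oak tree!",
]
cleaned = preprocess(sample)
print(cleaned)
```

     The resulting token lists are the kind of input a topic model is
     typically trained on.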



Results after training LDA model




      The LDA model only groups related words into topics; it
      does not name the topics.
      The topic names are discovered using WordNet.
Results after applying to Wordnet




      The above result gives us the hidden topic names of the words
      that comprised the documents.
      This kind of model can be used for identifying topics when
      given only a word.
Conclusion and Future Work




      Next, we will work on topic-based (context-based) search
      using this model.
      In particular, we will handle the geo-intent of queries and
      decide which topic each query belongs to, for better retrieval of
      information.




References


      D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation.
      Journal of Machine Learning Research, 3:993-1022, January
      2003.
      Jun Fu Cai, Wee Sun Lee, Yee Whye Teh. NUS-ML:
      Improving Word Sense Disambiguation Using Topic Features.
      SemEval (2007).
      David M. Blei, Jon D. McAuliffe. Supervised Topic Models.
      NIPS (2007).
      WordNet. http://www.shiffman.net/teaching/a2z/wordnet




Thank You




