Constructing Query Models from Elaborate Query Formulations: A Few Examples Go a Long Way
Krisztian Balog (kbalog@science.uva.nl), Wouter Weerkamp (weerkamp@science.uva.nl), Maarten de Rijke (mdr@science.uva.nl)
ISLA, University of Amsterdam
Presented by Tanvi Motwani
Aim
Along with the query, the system takes sample documents as input. Sample documents are additional information provided by the user: a small number of “key references” (pages that a good overview page on the topic should link to).
The aim is to increase “aspect recall” by uncovering aspects of the information need that are not captured by the query but are captured by the sample documents.
Overview
 Retrieval Model: Query Likelihood, Document Modeling, Query Modeling
 Experimental Setup
 Query Representation
 Baseline Parameters
 Experimental Evaluation
[Figure: the query “What is a Rainforest?” (Q) is matched against five documents, scored by P(D|Q): D1 = 0.32, D2 = 0.26, D3 = 0.19, D4 = 0.12, D5 = 0.09.]
Query Likelihood
Apply Bayes’ rule, ignore P(Q) (it is the same for every document given a query), assume independence of the query terms, take the log, and express the result using query and document models.
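The equations on this slide did not survive extraction. The steps it names match the standard query-likelihood derivation, sketched here in common language-modeling notation. The symbols θ_Q and θ_D (query and document models), n(t, Q) (the count of term t in Q), and the name "score" are assumptions and are not taken from the slide.

```latex
\begin{align*}
P(D \mid Q) &= \frac{P(Q \mid D)\, P(D)}{P(Q)} && \text{(Bayes' rule)} \\
&\propto P(Q \mid D)\, P(D) && \text{(ignoring } P(Q)\text{)} \\
&= P(D) \prod_{t \in Q} P(t \mid \theta_D)^{n(t,Q)} && \text{(independent query terms)} \\
\log P(D \mid Q) &\stackrel{\text{rank}}{=} \log P(D) + \sum_{t \in Q} n(t,Q)\, \log P(t \mid \theta_D) && \text{(taking the log)}
\end{align*}
% Replacing the raw counts by a query model gives the scoring function:
\[
\text{score}(D, Q) = \log P(D) + \sum_{t} P(t \mid \theta_Q)\, \log P(t \mid \theta_D)
\]
```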
Relevance Model
[Figure: the query “What is a Rainforest?” (Q) and the documents.]
Underlying Relevance Model
The query and the relevant documents are random samples from an underlying relevance model R. Documents are ranked by their similarity to the query model: the Kullback-Leibler divergence between the query and document models can be used to provide a ranking of documents.
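Ranking by Kullback-Leibler divergence can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names and the toy models are hypothetical, and the document models are assumed to be already smoothed so that every query term has non-zero probability.

```python
import math

def kl_divergence(query_model, doc_model):
    """KL(theta_Q || theta_D), summed over terms with non-zero query probability."""
    return sum(
        p_q * math.log(p_q / doc_model[t])
        for t, p_q in query_model.items()
        if p_q > 0
    )

def rank_by_kl(query_model, doc_models):
    """Return document ids ordered from smallest to largest divergence."""
    return sorted(doc_models, key=lambda d: kl_divergence(query_model, doc_models[d]))

# Hypothetical, already-smoothed models over a three-term vocabulary.
query_model = {"rain": 0.5, "forest": 0.5}
doc_models = {
    "d1": {"rain": 0.4, "forest": 0.4, "wild": 0.2},
    "d2": {"rain": 0.1, "forest": 0.1, "wild": 0.8},
}
ranking = rank_by_kl(query_model, doc_models)
```

A smaller divergence means the document model is closer to the query model, so "d1" is ranked ahead of "d2" here.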
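Document Modeling
The maximum likelihood (ML) estimate of P(t|D) assigns zero probability to every term that does not occur in the document. A document that does not contain “Rain” has P(“Rain”|D) = 0, so the ML estimate must be smoothed.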
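The zero-probability problem and its fix can be illustrated with a short sketch. The slide does not say which smoothing method is used; the code assumes linear interpolation with a collection model (Jelinek-Mercer smoothing), and the function names, the weight `lam`, and the toy data are hypothetical.

```python
from collections import Counter

def ml_estimate(term, tokens):
    """Maximum likelihood estimate: n(t, D) / |D|."""
    return Counter(tokens)[term] / len(tokens)

def smoothed_estimate(term, doc_tokens, collection_tokens, lam=0.5):
    """Assumed Jelinek-Mercer smoothing: (1 - lam) * P(t|D) + lam * P(t|C)."""
    return (1 - lam) * ml_estimate(term, doc_tokens) + lam * ml_estimate(term, collection_tokens)

# Toy document without the word "rain", and a toy collection that contains it.
doc = ["forest", "wild", "life", "forest"]
collection = doc + ["rain", "forest", "rain", "tree"]

p_ml = ml_estimate("rain", doc)                                  # zero: "rain" is absent
p_smoothed = smoothed_estimate("rain", doc, collection, lam=0.5)  # non-zero after smoothing
```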
Query Modeling
P(t|Q) is extremely sparse, so query expansion is necessary. A document may not contain the words “Rain” and “Forest” and still contain related words such as “Wild Life”. Expanding the query brings in different “aspects” of the topic.
Experimental Setup
 370,715 documents
 Size of 4.2 gigabytes
 50 topics
 Judgments made on a 3-point scale:
  2: highly relevant “key reference”
  1: candidate key page
  0: not a “key reference”
Parameter Estimation
 Maximizing Average Precision (MAX_AP)
 Maximizing Query Log Likelihood (MAX_QLL)
 Best Empirical Estimate (EMP_BEST)
Evaluation
 MAX_QLL performs slightly better than MAX_AP.
Query Representation
 This prevents the topic from shifting away from the original user information need.
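The bullet that this sentence refers to was lost in extraction. One common way to keep an expanded query anchored to the original is to interpolate the two term distributions, and the sketch below assumes that reading. The function name, the weight `lam`, and the toy distributions are all hypothetical.

```python
def interpolate_query_models(original, expanded, lam=0.5):
    """Mix two term distributions: lam * original + (1 - lam) * expanded."""
    terms = set(original) | set(expanded)
    return {
        t: lam * original.get(t, 0.0) + (1 - lam) * expanded.get(t, 0.0)
        for t in terms
    }

# Toy distributions: the expanded model lacks "rain", the original lacks "wild" and "life".
original = {"rain": 0.5, "forest": 0.5}
expanded = {"forest": 0.4, "wild": 0.3, "life": 0.3}
mixed = interpolate_query_models(original, expanded, lam=0.5)
```

With `lam` above zero, every original query term keeps some probability mass even when the expanded model omits it.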
Feedback Using Relevance Models
The probability of a term t is the joint probability of observing t together with the query terms q1, q2, …, qk, divided by the joint probability of the query terms.
 RM2: the query terms q1, q2, …, qk are sampled dependent on t but independently of each other.
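The formulas on this slide were lost in extraction. In Lavrenko and Croft's relevance models, which the names RM1 and RM2 refer to, the estimates are usually written as below. M denotes the set of feedback documents; the notation \hat{Q} for the expanded query is an assumption.

```latex
\begin{align*}
P(t \mid \hat{Q}) &\approx \frac{P(t, q_1, \ldots, q_k)}{P(q_1, \ldots, q_k)}
  = \frac{P(t, q_1, \ldots, q_k)}{\sum_{t'} P(t', q_1, \ldots, q_k)} \\
\text{RM1:}\quad P(t, q_1, \ldots, q_k) &= \sum_{D \in M} P(D)\, P(t \mid D) \prod_{i=1}^{k} P(q_i \mid D) \\
\text{RM2:}\quad P(t, q_1, \ldots, q_k) &= P(t) \prod_{i=1}^{k} \sum_{D \in M} P(q_i \mid D)\, P(D \mid t)
\end{align*}
```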
RM2
Given the term “wild”, we first pick a document from the set M with probability P(D|t) and then sample the query words from that document.
Assume that M consists of just one document, that P(“wild”) = 0.2 and P(D|“wild”) = 0.7, and that the document contains 200 words, of which 10 are “rain” and 20 are “forest”. Then
P(“wild”, “rain”, “forest”) = 0.2 * 0.7 * 20/200 * 10/200 = 0.0007
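The arithmetic of this example can be checked with a few lines of code. The variable names are hypothetical. The first value reproduces the slide's calculation as written (one document draw, then both query terms). The second shows, for comparison, the product-of-sums form of RM2 usually given in the literature, in which P(D|t) appears once per query term.

```python
p_t = 0.2            # P("wild")
p_d_given_t = 0.7    # P(D | "wild")
doc_length = 200     # denominator used on the slide
counts = {"rain": 10, "forest": 20}

# ML estimates P(q|D) for the two query terms.
p_q_given_d = {q: n / doc_length for q, n in counts.items()}

# The slide's arithmetic: one document draw, then both query terms.
slide_value = p_t * p_d_given_t * p_q_given_d["forest"] * p_q_given_d["rain"]

# Product-of-sums reading with a single document in M:
# each query term contributes P(q|D) * P(D|t).
product_of_sums_value = p_t
for q in counts:
    product_of_sums_value *= p_q_given_d[q] * p_d_given_t
```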
Query Representation
 Query Model from Sample Documents
 Feedback Using Relevance Models
 Relevance Models from Sample Documents
Relevance Models from Sample Documents
 For RM1, assume P(D) = 1/|S|.
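A minimal sketch of RM1 estimated from sample documents with the uniform prior P(D) = 1/|S|. The function name and toy data are hypothetical, and the code uses unsmoothed maximum likelihood estimates for brevity; a real system would presumably smooth P(q|D) so that a single missing query term does not zero out a document.

```python
from collections import Counter
from math import prod

def rm1_from_samples(query_terms, sample_docs):
    """RM1 over sample documents with a uniform prior P(D) = 1/|S|."""
    prior = 1.0 / len(sample_docs)
    joint = Counter()
    for tokens in sample_docs:
        counts = Counter(tokens)
        length = len(tokens)
        # Query likelihood of this document under unsmoothed ML estimates.
        q_lik = prod(counts[q] / length for q in query_terms)
        for t, n in counts.items():
            joint[t] += prior * (n / length) * q_lik
    total = sum(joint.values())
    # Normalize the joint probabilities so they sum to one.
    return {t: p / total for t, p in joint.items()} if total > 0 else {}

sample_docs = [
    ["rain", "forest", "wild", "life"],
    ["rain", "rain", "forest", "tree"],
]
model = rm1_from_samples(["rain", "forest"], sample_docs)
```

Terms such as "tree" and "wild" receive probability mass even though they are not query terms, which is how the sample documents contribute new aspects.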
Query Model from Sample Documents
Given the sample document set S:
 Select a document D from S with probability P(D|S).
 From this document, generate term t with probability P(t|D).
 Sum over all sample documents to obtain P(t|S).
The top K terms with the highest probability P(t|S) are taken and used to formulate the expanded query.
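The generative steps above amount to P(t|S) = Σ_D P(t|D) · P(D|S), followed by a top-K cut. A minimal sketch, assuming a maximum likelihood estimate for P(t|D) and a uniform P(D|S) when no weights are supplied; the function name, parameters, and toy documents are hypothetical.

```python
from collections import Counter

def query_model_from_samples(sample_docs, doc_weights=None, top_k=3):
    """P(t|S) = sum over D in S of P(t|D) * P(D|S); return the top-k terms."""
    if doc_weights is None:
        # Assumption: uniform P(D|S) when no weights are given.
        doc_weights = [1.0 / len(sample_docs)] * len(sample_docs)
    p_t_given_s = Counter()
    for tokens, p_d in zip(sample_docs, doc_weights):
        counts = Counter(tokens)
        for t, n in counts.items():
            p_t_given_s[t] += (n / len(tokens)) * p_d  # ML estimate of P(t|D)
    return p_t_given_s.most_common(top_k)

sample_docs = [
    ["rain", "forest", "wild", "life"],
    ["forest", "forest", "tree", "canopy"],
]
top_terms = query_model_from_samples(sample_docs, top_k=2)
```

Passing non-uniform `doc_weights` changes how strongly each sample document influences the expanded query.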
Query Model from Sample Documents
 Smoothed estimate of a term (EX-QM-SM)
 Ranking function proposed by Ponte and Croft for unsupervised query expansion (EX-QM-EXP)
 Query-biased:
 Inverse query-biased:
Expanded Query Models

Aspect recall is obtained from the sample documents. Are we not then dependent on the “goodness” of the sample documents, that is, on the number of different aspects they cover, for obtaining high aspect recall?
In theory there is a slight increase in MAP compared to BFB-RM2 (around 0.07). Will this make any difference to the end user's experience? Is such a small gain in MAP worth the high cost of obtaining sample documents?