This paper introduces and compares several methods for sampling query expansion terms, using both query-independent and query-dependent techniques. The methods take the query and sample documents as input; the sample documents supply aspects of the topic that the query alone does not capture, with the goal of improving aspect recall. Several query modeling, document modeling, and ranking techniques are evaluated on a test collection. Results show that combining the expanded query with the original query performs best, and that the quality of the sample documents also affects performance.
Constructing Query Models from Elaborate Query Formulations: A Few Examples Go a Long Way
1. Constructing Query Models from Elaborate Query Formulations: A Few Examples Go a Long Way. Krisztian Balog (kbalog@science.uva.nl), Wouter Weerkamp (weerkamp@science.uva.nl), Maarten de Rijke (mdr@science.uva.nl), ISLA, University of Amsterdam. Presented by Tanvi Motwani.
3. Along with the query, it takes sample documents as input. Sample documents are additional information provided by the user: a small number of "key references" (pages that should be linked to by a good overview page of the topic).
5. Overview: Retrieval Model, Experimental Setup, Query Representation, Baseline, Parameters, Experimental Evaluation.
6. Overview: Retrieval Model (Query Likelihood, Document Modeling, Query Modeling), Experimental Setup, Query Representation, Baseline, Parameters, Experimental Evaluation.
8. Example: for the query Q = "What is a Rainforest?", documents are ranked by P(D|Q): P(D1|Q) = 0.32, P(D2|Q) = 0.26, P(D3|Q) = 0.19, P(D4|Q) = 0.12, P(D5|Q) = 0.09.
9. Query Likelihood. By Bayes' rule, $P(D|Q) = P(Q|D)P(D)/P(Q)$. Ignoring the document-independent $P(Q)$ and assuming independence of query terms, $P(D|Q) \propto P(D) \prod_{t \in Q} P(t|D)$. Taking the log gives $\log P(D|Q) \propto \log P(D) + \sum_{t \in Q} \log P(t|D)$, which is computed using query and document models: $\mathrm{Score}(Q,D) = \sum_{t} P(t|\theta_Q) \log P(t|\theta_D)$.
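To make the derivation concrete, here is a minimal Python sketch of query-likelihood scoring. The function name, toy interface, and the Jelinek-Mercer weight are illustrative assumptions, not the paper's code; smoothing itself is the subject of slide 13.

```python
import math
from collections import Counter

def log_query_likelihood(query_terms, doc_tokens, collection_tokens, lam=0.1):
    """Compute log P(Q|D) = sum_t log P(t|D), smoothing the ML estimate
    n(t,D)/|D| with the collection model so unseen terms do not give log 0.
    Assumes every query term occurs at least once in the collection."""
    doc_tf, col_tf = Counter(doc_tokens), Counter(collection_tokens)
    score = 0.0
    for t in query_terms:
        p_doc = doc_tf[t] / len(doc_tokens)
        p_col = col_tf[t] / len(collection_tokens)
        score += math.log((1 - lam) * p_doc + lam * p_col)
    return score

# Documents are then ranked by descending score, as in the example on slide 8:
# ranking = sorted(docs, key=lambda d: log_query_likelihood(q, d, coll), reverse=True)
```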
11. Underlying Relevance Model. The query and the relevant documents are random samples from an underlying relevance model R. Documents are ranked by their similarity to the query model: the Kullback-Leibler divergence between the query and document models, $KL(\theta_Q \| \theta_D) = \sum_t P(t|\theta_Q) \log \frac{P(t|\theta_Q)}{P(t|\theta_D)}$, can be used to provide a ranking of documents.
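Because the query-model entropy term of the KL divergence is identical for every document, ranking by negative KL divergence reduces to ranking by cross-entropy. A minimal sketch, assuming the document model has already been smoothed (names are mine):

```python
import math

def neg_kl_score(query_model, doc_model):
    """Rank-equivalent to -KL(theta_Q || theta_D): drop the document-independent
    entropy of theta_Q and keep the cross-entropy sum_t P(t|theta_Q) log P(t|theta_D).
    Assumes doc_model assigns nonzero probability to every query-model term."""
    return sum(p * math.log(doc_model[t]) for t, p in query_model.items())
```

With a maximum-likelihood query model this reduces to the query-likelihood score on slide 9, which is why KL ranking generalizes it.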
12. Overview: Retrieval Model (Query Likelihood, Document Modeling, Query Modeling), Experimental Setup, Query Representation, Baseline, Parameters, Experimental Evaluation.
13. Document Modeling. The maximum-likelihood estimate is $P(t|D) = n(t,D)/|D|$. The example document does not contain the word "rain", so its ML estimate gives $P(\text{"rain"}|D) = 0$; thus smoothing against a collection model is required.
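A short sketch of the two estimates for the "rain" example; Jelinek-Mercer interpolation is one standard smoothing choice and is assumed here purely for illustration.

```python
from collections import Counter

def p_ml(term, doc_tokens):
    """Maximum-likelihood estimate: P(t|D) = n(t,D) / |D|."""
    return Counter(doc_tokens)[term] / len(doc_tokens)

def p_smoothed(term, doc_tokens, collection_model, lam=0.1):
    """Jelinek-Mercer smoothing: interpolate the ML estimate with a collection
    model so a term absent from D, like "rain" above, keeps nonzero probability."""
    return (1 - lam) * p_ml(term, doc_tokens) + lam * collection_model.get(term, 0.0)
```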
14. Query Modeling. $P(t|Q)$ is extremely sparse, and thus query expansion is necessary. The example document does not contain the words "rain" and "forest" but does contain related words such as "wild life". Expanding the query brings in different "aspects" of the topic.
15. Overview: Retrieval Model, Experimental Setup, Query Representation, Baseline, Parameters, Experimental Evaluation.
20. Judgments are made on a 3-point scale: 2 = highly relevant "key reference"; 1 = candidate key page; 0 = not a "key reference".
21. Overview: Retrieval Model, Experimental Setup, Query Representation, Baseline, Parameters (Maximizing Average Precision (MAX_AP), Maximizing Query Log-Likelihood (MAX_QLL), Best Empirical Estimate (EMP_BEST)), Experimental Evaluation.
22. Parameter Estimation: Maximizing Average Precision (MAX_AP), Maximizing Query Log-Likelihood (MAX_QLL), Best Empirical Estimate (EMP_BEST).
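As an illustration of the MAX_AP strategy, here is a minimal grid search over a single free parameter. The helpers `rank_fn` and `topics` are hypothetical stand-ins, not the paper's code; the AP definition is the standard non-interpolated one.

```python
def average_precision(ranked_ids, relevant_ids):
    """Standard (non-interpolated) average precision for one topic."""
    hits, ap = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            ap += hits / rank
    return ap / max(len(relevant_ids), 1)

def max_ap(candidate_params, rank_fn, topics):
    """MAX_AP: pick the parameter value whose rankings maximize mean AP.
    rank_fn(param, query) -> ranked doc ids; topics: query -> set of relevant ids."""
    def mean_ap(param):
        return sum(average_precision(rank_fn(param, q), rel)
                   for q, rel in topics.items()) / len(topics)
    return max(candidate_params, key=mean_ap)
```

MAX_QLL would swap the objective for the log-likelihood of held-out queries, keeping the same sweep.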
29. RM2: given the term "wild", we first pick a document from the set M with probability P(D|t) and then sample the query words from it; under RM2, a document is drawn anew for each query word. Assume P(D|"wild") = 0.7, the document is 200 words long with 10 occurrences of "rain" and 20 of "forest", P("wild") = 0.2, and M consists of just this document. Then P("wild", "rain", "forest") = 0.2 × (0.7 × 10/200) × (0.7 × 20/200) ≈ 0.00049.
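A sketch of the RM2 joint probability applied to the worked example above; note that because a document is sampled independently for each query term, the P(D|"wild") factor appears once per term. Helper names are mine.

```python
def rm2_joint(p_w, query_terms, p_doc_given_w, doc_models):
    """RM2: P(w, q1..qk) = P(w) * prod_i sum_D P(q_i|D) P(D|w).
    A document is drawn anew for every query term."""
    prob = p_w
    for q in query_terms:
        prob *= sum(p_d * doc_models[d].get(q, 0.0)
                    for d, p_d in p_doc_given_w.items())
    return prob

# The slide's example: M holds one 200-word document with 10 "rain" and
# 20 "forest" occurrences; P(D|"wild") = 0.7 and P("wild") = 0.2.
doc_models = {"D": {"rain": 10 / 200, "forest": 20 / 200}}
print(rm2_joint(0.2, ["rain", "forest"], {"D": 0.7}, doc_models))
# 0.2 * (0.7 * 10/200) * (0.7 * 20/200) = 0.00049
```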
30. Overview: Retrieval Model, Experimental Setup, Query Representation (Query Model from Sample Documents, Feedback Using Relevance Models, Relevance Models from Sample Documents), Baseline, Parameters, Experimental Evaluation.
33. Query Model from Sample Documents. Given the sample document set S, select a document D from S with probability P(D|S) and generate term t from it with probability P(t|D); summing over all sample documents gives $P(t|S) = \sum_{D \in S} P(t|D)\,P(D|S)$. The top K terms with the highest probability P(t|S) are used to formulate the expanded query.
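A minimal sketch of building the expanded query from sample documents, assuming a uniform P(D|S) = 1/|S| and ML document models (the function names are illustrative):

```python
from collections import Counter

def term_dist_from_samples(sample_docs):
    """P(t|S) = sum_D P(t|D) P(D|S), with uniform P(D|S) = 1/|S| and
    ML document models P(t|D) = n(t,D)/|D|."""
    p_t_s = Counter()
    for tokens in sample_docs:
        tf = Counter(tokens)
        for t, n in tf.items():
            p_t_s[t] += (n / len(tokens)) / len(sample_docs)
    return p_t_s

def expanded_query(sample_docs, k=10):
    """Formulate the expanded query from the top-K terms under P(t|S)."""
    return [t for t, _ in term_dist_from_samples(sample_docs).most_common(k)]
```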
47. Since aspect recall is obtained from the sample documents, aren't we dependent on the "goodness" of the sample documents, i.e., how many different aspects they cover, for obtaining high aspect recall?
48. There is only a slight increase in MAP compared to BFB-RM2 (around 0.07); for an end user, will it make any noticeable difference in experience? Is such a small gain in MAP worth the high cost of obtaining sample documents?