A Proposal for Evaluating Answer Distillation from Web Data

2. Answer distillation task
Given query and passage containing answer, summarize answer for presentation, on-screen or read-out.
Answers are query-biased summaries:
• Single entities or phrases (e.g., “Rome” for the query “Italy capital”)
• Multi-sentence (e.g., for the query “how to get a passport”)
• Need not be spans from passage
• Might combine multiple passages

3. Query: commissioner of the nba
Source passages:
Answer? Adam Silver

5. Datasets let algorithms shine
http://www.kdnuggets.com/2016/05/datasets-over-algorithms.html

6. Existing QA Datasets
Data
• Questions are not search queries
• Questions are well-formed & curated
• Single entity / phrase answers
• Multiple-choice answers
• E.g., TREC-QA, MCTest, CBT, WikiQA, SQuAD
Metric
• Designed for matching short responses, or
• Poorly correlate with human judgments, or
• Human in the loop (non-repeatable)
• E.g., P/R, BLEU, METEOR

7. Our proposal
Data
• Sample queries from Bing logs
• Editorially curated reference answers
• Many reference answers per query
Metric
• Phrasing Aware (pa-) metrics
• Modified versions of BLEU / METEOR

8. Towards variance reduction
• Use single reference passage set to reduce variance from conflicting information at source
• Get many reference answers to model the natural variance in answer phrasing
• Extend existing metrics to take better advantage of the large number of available reference answers
Query
law for ages for children allowed to sit in front seat
Passages
The law requires all children traveling in the front or rear seat of any car, van or goods vehicle must use the correct child car seat until they are either 135cm in height or 12 years old (whichever they reach first). After this they must use an adult seat belt. There are very few exceptions.
Distilled answers
• Children under the age of 12 and less than 135cm tall need a child car seat when traveling in the front or the rear seat of a car.
• Children of any age can travel in the front or the rear seat of a car. They need a child seat if under the age of 12.
• Children under the age of 12 need a child seat, unless more than 135cm tall.
• …
• A child seat is necessary for children under 12. Otherwise an adult seat belt must be worn.

9. Generating the dataset
Sample queries
• Randomly sample from Bing logs
• Remove PII
• Remove navigational, transactional queries
• Remove queries with no deterministic answers (e.g., “holiday recipes”)
Retrieve candidate passages
• Retrieve top-N candidate passages per query
• Typically retrieved from many different documents
Select minimal passage set
• Editors select the minimal but sufficient passage set
• If multiple passages are selected then information across passages should not conflict
Curate reference answers
• Editors curate minimal but complete answer for each query
• Answers can be single entity or phrase, or multi-sentence passage
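To make the output of these steps concrete, here is a minimal sketch of what one dataset entry might look like. The class name `DistillationExample`, its fields, and the "at least two reference answers" check are illustrative assumptions, not the proposal's actual schema; the slides only say that each query gets a minimal passage set and many reference answers.

```python
from dataclasses import dataclass, field


@dataclass
class DistillationExample:
    """Hypothetical record for one query (schema is an assumption)."""
    query: str
    passages: list[str]  # editor-selected minimal passage set
    reference_answers: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return a list of basic completeness issues (empty if none)."""
        issues = []
        if not self.query.strip():
            issues.append("empty query")
        if not self.passages:
            issues.append("no passages selected")
        # Assumed threshold: the proposal asks for "many" references.
        if len(self.reference_answers) < 2:
            issues.append("fewer than two reference answers")
        return issues


example = DistillationExample(
    query="law for ages for children allowed to sit in front seat",
    passages=["The law requires all children traveling in the front or rear seat ..."],
    reference_answers=[
        "Children under the age of 12 need a child seat, unless more than 135cm tall.",
        "A child seat is necessary for children under 12. Otherwise an adult seat belt must be worn.",
    ],
)
print(example.problems())
```

A check like this cannot detect conflicting information across passages; that judgment stays with the editors.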

10. Phrasing Aware Metrics
• Score candidate answer based on average similarity with all available reference answers
• Each reference answer is importance weighted based on agreement with other reference answers
• Metrics like BLEU (or METEOR) can be used as similarity metric
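One possible reading of this scoring scheme is sketched below. It is not the proposal's actual pa-BLEU or pa-METEOR: the similarity function `unigram_f1` is a simple stand-in for BLEU or METEOR, and the choice to weight each reference by its mean similarity to the other references (normalized to sum to 1) is an assumption about what "agreement" means. All function names are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1; a simple stand-in for BLEU or METEOR."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def reference_weights(references, sim=unigram_f1):
    """Weight each reference by its mean similarity to the other references."""
    n = len(references)
    if n == 0:
        raise ValueError("at least one reference answer is required")
    if n == 1:
        return [1.0]
    agreement = [
        sum(sim(references[i], references[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    total = sum(agreement)
    if total == 0:
        return [1.0 / n] * n  # no agreement at all: fall back to uniform weights
    return [a / total for a in agreement]


def pa_score(candidate, references, sim=unigram_f1):
    """Importance-weighted average similarity of candidate to all references."""
    weights = reference_weights(references, sim)
    return sum(w * sim(candidate, r) for w, r in zip(weights, references))


refs = ["adam silver", "adam silver", "david stern"]
print(reference_weights(refs))
print(pa_score("adam silver", refs))
```

In this toy case the outlier reference gets zero weight because it shares no tokens with the others, so a candidate matching the majority phrasing is not penalized for disagreeing with it.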


12. Request For Comments
• We want to make the proposed Answer Distillation dataset and corresponding metrics publicly available for academic research
• We need YOUR feedback to build the right evaluation framework
https://gitter.im/ProjectDistillery/Distillery

Editor’s notes

I’m a PM: it’s my job to figure out what users want. Our analysis mirrors Google’s: users want answers directly on the SERP, as short as possible. In emerging interfaces (voice, mobile) this is even more critical. Problem: the ideal, concise answer often doesn’t appear on the Web. Solution: start with passages from the Web and distill the concise answer automatically.
Motivate with a particularly challenging example. Imagine this isn’t in our knowledge base.
Conclusion: this problem combines machine comprehension with language synthesis.
This is a hard problem, and one we really care about. We want to accelerate research in this space. It is quite possible that the algorithms that can solve this problem have already been invented. Deep learning with memory and attention seems particularly promising. But without appropriate data, there’s little hope of applying them properly.
Working around imperfect data and imperfect metrics.


More from Bhaskar Mitra
