A Proposal for
Evaluating
Answer Distillation
from Web Data
Bhaskar Mitra, Grady Simon, Jianfeng
Gao, Nick Craswell & Li Deng
Answer
distillation task
Given query and passage
containing answer, summarize
answer for presentation, on-
screen or read-out.
Answers are query-biased
summaries:
• Single entities or phrases (e.g., “Rome”
for the query “Italy capital”)
• Multi-sentence (e.g., for the query
“how to get a passport”)
• Need not be spans from passage
• Might combine multiple passages
Query: commissioner of the nba
Source passages:
Answer? Adam Silver
http://www.kdnuggets.com/2016/05/datasets-over-algorithms.html
Datasets let algorithms shine
Existing QA Datasets
Data
E.g., TREC-QA, MCTest, CBT, WikiQA, SQuAD
• Questions are not search queries
• Questions are well-formed & curated
• Single entity / phrase answers
• Multiple-choice answers
Metric
E.g., P/R, BLEU, METEOR
• Designed for matching short responses, or
• Poorly correlate with human judgments, or
• Human in the loop (non-repeatable)
Our proposal
Data
• Sample queries from Bing logs
• Editorially curated reference answers
• Many reference answers per query
Metric
• Phrasing Aware (pa-) metrics
• Modified versions of BLEU / METEOR
Towards variance
reduction
Use single reference passage set to
reduce variance from conflicting
information at source
Get many reference answers to
model the natural variance in answer
phrasing
Extend existing metrics to take better
advantage of the large number of
available reference answers
Query
law for ages for children allowed to sit in front seat
Passages
The law requires all children traveling in the front or
rear seat of any car, van or goods vehicle must use the
correct child car seat until they are either 135cm in
height or 12 years old (which ever they reach first).
After this they must use an adult seat belt. There are
very few exceptions.
Distilled answers
• Children under the age of 12 and less than 135cm tall
need a child car seat when traveling in the front or the
rear seat of a car.
• Children of any age can travel in the front or the rear
seat of a car. They need a child seat if under the age
of 12.
• Children under the age of 12 need a child seat, unless
more than 135cm tall.
• …
• A child seat is necessary for children under 12.
Otherwise an adult seat belt must be worn.
Generating the dataset
Sample queries
• Randomly sample
from Bing logs
• Remove PII
• Remove navigational,
transactional queries
• Remove queries with
no deterministic
answers (E.g.,
“holiday recipes”)
Retrieve candidate
passages
• Retrieve top-N
candidate passages
per query
• Typically retrieved
from many different
documents
Select minimal
passage set
• Editors select the
minimal but
sufficient passage set
• If multiple passages
are selected then
information across
passages should not
conflict
Curate reference
answers
• Editors curate
minimal but
complete answer for
each query
• Answers can be
single entity or
phrase, or multi-
sentence passage
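
The output of the four steps above can be pictured as one record per query. The following is only a sketch under assumptions: the class name `DistillationExample`, its field names, and the `is_complete` check are hypothetical and are not taken from the proposal, which does not specify a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DistillationExample:
    """Hypothetical record for one query in the proposed dataset."""

    query: str                      # sampled and filtered search query
    passages: List[str]             # minimal but sufficient passage set chosen by editors
    reference_answers: List[str] = field(default_factory=list)  # many answers per query

    def is_complete(self) -> bool:
        # A usable record needs a query, at least one passage,
        # and at least one editorially curated reference answer.
        return (
            bool(self.query.strip())
            and len(self.passages) >= 1
            and len(self.reference_answers) >= 1
        )


example = DistillationExample(
    query="law for ages for children allowed to sit in front seat",
    passages=["The law requires all children traveling in the front or rear seat ..."],
    reference_answers=[
        "Children under the age of 12 need a child seat, unless more than 135cm tall.",
        "A child seat is necessary for children under 12. "
        "Otherwise an adult seat belt must be worn.",
    ],
)
```

Keeping the reference answers as a list reflects the proposal's emphasis on many reference answers per query.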
Phrasing
Aware Metrics
Score candidate answer based on
average similarity with all
available reference answers
Each reference answer is
importance weighted based on
agreement with other reference
answers
Metrics like BLEU (or METEOR)
can be used as similarity metric
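
One way to read the three points above is sketched below. The slides do not give a formula, so everything here is an assumption: the function names (`unigram_f1`, `reference_weights`, `pa_score`) are hypothetical, each reference is weighted by its mean similarity to the other references, and a simple unigram F1 stands in for BLEU or METEOR to keep the sketch self-contained.

```python
from collections import Counter
from typing import Callable, List

Similarity = Callable[[str, str], float]


def unigram_f1(candidate: str, reference: str) -> float:
    """Stand-in similarity (assumption): unigram-overlap F1, not BLEU or METEOR."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def reference_weights(references: List[str], sim: Similarity = unigram_f1) -> List[float]:
    """Weight each reference by its mean similarity to the other references."""
    if len(references) == 1:
        return [1.0]
    raw = []
    for i, ref in enumerate(references):
        others = [sim(ref, other) for j, other in enumerate(references) if j != i]
        raw.append(sum(others) / len(others))
    total = sum(raw)
    if total == 0:
        # No agreement at all: fall back to uniform weights.
        return [1.0 / len(references)] * len(references)
    return [w / total for w in raw]


def pa_score(candidate: str, references: List[str], sim: Similarity = unigram_f1) -> float:
    """Importance-weighted average similarity of the candidate to all references."""
    if not references:
        raise ValueError("at least one reference answer is required")
    weights = reference_weights(references, sim)
    return sum(w * sim(candidate, ref) for w, ref in zip(weights, references))


refs = ["adam silver", "adam silver", "david stern"]
weights = reference_weights(refs)
```

With these references, the outlier "david stern" agrees with no other reference and receives zero weight, so a candidate that matches the majority phrasing is not penalized for it. A sentence-level BLEU or METEOR implementation could be passed as `sim` instead.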
Request For
Comments
We want to make the proposed Answer
Distillation dataset and corresponding metrics
publicly available for academic research
We need YOUR feedback to build the right
evaluation framework
https://gitter.im/ProjectDistillery/Distillery



Editor's notes

  1. I’m a PM: it’s my job to figure out what users want. Our analysis mirrors Google’s: they want answers directly on the SERP, as short as possible. In emerging interfaces (voice, mobile) this is even more critical. Problem: the ideal, concise answer often doesn’t appear on the Web. Solution: start with passages from the Web and distill the concise answer automatically.
  2. Motivate with a particularly challenging example. Imagine this isn’t in our knowledge base.
  3. Conclusion: this problem combines machine comprehension with language synthesis.
  4. This is a hard problem, and it’s a problem we really care about. We want to accelerate research in this space. It is quite possible that the algorithms that can solve this problem have already been invented. Deep learning with memory and attention seems particularly promising. But without appropriate data, there’s little hope of applying them properly.
  5. Working around imperfect data and imperfect metrics.