Weronika Łajewska and Krisztian Balog
University of Stavanger, Norway
Towards Filling the Gap in
Conversational Search: From Passage
Retrieval to Conversational Response
Generation
CIKM’23, Birmingham
This study
● Problem setting: Conversational response generation
○ It extends beyond passage retrieval + summarization
● Goal: snippet-level annotations of relevant passages, to enable
1. the training of response generation models that are able to ground answers
in actual statements
2. the automatic evaluation of the generated responses in terms of
completeness
● Main contributions:
1. Crowdsourcing task design and protocol to collect high-quality annotations
2. A dataset of 1.8k query-passage pairs annotated from the TREC 2020 and
2022 Conversational Assistance track
CAsT-snippets sample
The seemingly straightforward task of highlighting relevant
snippets turns out to be not that simple.
Preliminary study
A comparison of different task designs, platforms, and worker pools
● Task designs: paragraph-based vs. sentence-based annotation
● Platforms and workers:
○ Amazon MTurk (regular vs. master workers)
○ Prolific
○ Expert annotators (PhD students)
Evaluation measures
Traditional measures of inter-annotator
agreement are insufficient
● Fleiss’ Kappa and Krippendorff’s Alpha are
measures for categorical annotations that
rely on a binary notion of agreement
● Here: we need to measure the degree to
which snippets selected by different workers
overlap
○ Inter-annotator agreement: Jaccard
similarity (also a less strict variant,
k-Jaccard)
○ Similarity against expert annotators:
“ROUGE-like” variant of precision and recall
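A minimal sketch of the token-level agreement measure, under stated assumptions: each annotator's highlights are represented as a set of token positions in the passage, strict Jaccard counts a token as agreed only if every annotator selected it, and the less strict variant counts a token as agreed if at least k annotators selected it. The function name `jaccard_k` and this reading of the k-variant are illustrative assumptions, not the authors' reference implementation.

```python
from collections import Counter
from typing import List, Optional, Set


def jaccard_k(selections: List[Set[int]], k: Optional[int] = None) -> float:
    """Token-level agreement among annotators for one query-passage pair.

    selections: one set of selected token positions per annotator.
    k: minimum number of annotators that must select a token for it to
       count as agreed; None means all annotators (strict Jaccard).
    """
    if not selections:
        raise ValueError("at least one annotator is required")
    n = len(selections)
    k = n if k is None else k
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of annotators")

    # How many annotators selected each token position.
    counts = Counter(tok for sel in selections for tok in sel)
    union_size = len(counts)
    if union_size == 0:
        # Assumption: if nobody highlighted anything, treat it as full agreement.
        return 1.0
    agreed = sum(1 for c in counts.values() if c >= k)
    return agreed / union_size


# Three hypothetical annotators highlighting token positions in one passage.
workers = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
strict = jaccard_k(workers)        # tokens selected by all three
relaxed = jaccard_k(workers, k=2)  # tokens selected by at least two
```

Lowering k can only keep or enlarge the agreed set while the union stays fixed, so the relaxed score is never below the strict one.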
Results
Inter-annotator agreement

Task variant     Annotators            Jaccard   Jaccard_k
                                                 k = 4   k = 3   k = 2
Paragraph-based  MTurk regular (n=5)   0.02      0.08    0.21    0.48
                 MTurk master (n=5)    0.18      0.35    0.53    0.73
                 Prolific (n=5)        0.14      0.27    0.44    0.65
                 Expert (m=3)          0.25      -       -       0.54
Sentence-based   MTurk regular (n=3)   0.35      -       -       0.71
                 MTurk master (n=3)    0.47      -       -       0.76

Comparison to expert annotations

Task variant     Annotators      F1
Paragraph-based  MTurk regular   0.36
                 MTurk master    0.54
                 Prolific        0.50
Sentence-based   MTurk regular   0.31
                 MTurk master    0.41
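The comparison to expert annotations relies on a "ROUGE-like" precision and recall. The sketch below shows one plausible reading, assuming unigram overlap with clipped counts between the tokens a crowd worker highlighted and the tokens an expert highlighted. The whitespace tokenization, lowercasing, and the function name `rouge_like_prf` are assumptions for illustration; the exact definition used in the study may differ.

```python
from collections import Counter
from typing import List, Tuple


def rouge_like_prf(worker_snippets: List[str],
                   expert_snippets: List[str]) -> Tuple[float, float, float]:
    """Unigram-overlap precision, recall, and F1 of worker vs. expert snippets."""
    # Bag of lowercased whitespace tokens over all snippets of each annotator.
    worker = Counter(tok for s in worker_snippets for tok in s.lower().split())
    expert = Counter(tok for s in expert_snippets for tok in s.lower().split())

    # Multiset intersection clips each token count to the smaller of the two.
    overlap = sum((worker & expert).values())

    precision = overlap / sum(worker.values()) if worker else 0.0
    recall = overlap / sum(expert.values()) if expert else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the worker highlights only part of the expert's span.
p, r, f1 = rouge_like_prf(["the cat sat"], ["the cat sat on the mat"])
```

In this example every highlighted token is also in the expert span (precision 1.0), but only half of the expert tokens are covered (recall 0.5).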
Main findings
● Relative ordering: MTurk masters > Prolific > MTurk regular
● Paragraph-level > sentence-level (w.r.t. similarity with expert annotations)
⇒ use MTurk and paragraph-based design for the large-scale data collection
Data collection
Setup
Employ a small group of trained crowd workers, selected through a qualification task, and create an extended set of guidelines with the help of the annotators
Data collection
● Performed in daily batches (1 topic per batch ≈ 46 HITs)
● Individual feedback after each submitted batch
● General comments and suggestions on a common Slack channel
● $0.30 per HIT + $2 bonus for completing within 24h
Qualification task
● Consisted of a detailed description of the problem, examples of correct annotations, a quiz, and 10 query-passage pairs to be annotated
● 20 workers completed the task; 15 passed
Guideline development
● Initial guidelines
● Discussion
● Feedback on qualification task
● Extended guidelines
Resulting dataset: CAsT-snippets
371 queries, top 5 passages per query ⇒ 1855 query-passage pairs
(each annotated by 3 crowd workers)
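The dataset size follows directly from the figures above. The short sketch below reproduces the arithmetic; the total number of individual annotations is a derived figure, not one stated on the slide, and assumes that every pair received exactly three annotations.

```python
# Figures taken from the slide.
NUM_QUERIES = 371
PASSAGES_PER_QUERY = 5
WORKERS_PER_PAIR = 3

# Each query is paired with its top 5 passages.
query_passage_pairs = NUM_QUERIES * PASSAGES_PER_QUERY

# Derived (assumption: exactly 3 annotations for every pair).
total_annotations = query_passage_pairs * WORKERS_PER_PAIR

print(query_passage_pairs, total_annotations)
```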
● Data quality
○ Inter-annotator agreement exceeds even that of expert annotators
○ Similarity with expert annotations is on par with MTurk master workers
● Comparison against other datasets
○ More snippets annotated per input text; also, snippets are longer
Dataset         Input text          Avg. snippet length (tokens)   # snippets per annotation
CAsT-snippets   Paragraph           39.6                           2.3
SaaC            Top 10 passages     23.8                           1.5
QuAC            Wikipedia article   14.6                           1
Challenges identified
Challenges pointed out by the crowd workers that need to be addressed in
conversational response generation:
● Only a partial answer is present
● Temporal considerations
○ Spans may need to be excluded given the time constraints in the query
○ Assessing temporal validity can be challenging based on the paragraph alone
(without larger context)
● Subjectivity of the passages originating from blogs or comments
● Indirect answers that require reasoning and background knowledge
● Determining the appropriate amount of context to include in each span
○ Balancing between being concise and being self-contained
● Determining whether evidence or additional information is needed, or whether an
entity alone is sufficient as an answer
Summary
● Snippet-level annotations for conversational response generation
(information-seeking queries)
● Several measures to ensure high data quality
○ Preliminary study to compare task variants and crowdsourcing platforms
○ Providing feedback and training to annotators throughout the data
collection process
○ Incentive structure to engage crowd workers over a period of time and avoid
worker fatigue
● Communication with workers also led to various insights regarding
challenges in conversational response generation
Questions?
Extended version on arXiv: https://arxiv.org/abs/2308.08911
Dataset: https://github.com/iai-group/CAsT-snippets
Preliminary study
Dataset: TREC CAsT'20 and '22 (top 5 passages according to relevance score for each query)
Input: query + passage/sentence
Output: snippet-level annotations in passage

Task variant   Annotator       Time   # workers   Acceptance rate   Cost
Paragraph      MTurk regular   182s   5           50%               $0.36
               MTurk master    63s    5           90%               $0.38
               Prolific        154s   5           79%               $0.51
               Expert          96s    3           -                 -
Sentence       MTurk regular   977s   3           72%               $0.43
               MTurk master    305s   3           87%               $0.56
Results (large-scale data collection)
Inter-annotator agreement

Task variant     Annotator                        Jaccard   Jaccard_2
Paragraph-based  MTurk regular (n=5)              0.02      0.48
                 MTurk master (n=5)               0.18      0.73
                 Prolific (n=5)                   0.14      0.65
                 Expert (m=3)                     0.25      0.54
                 Large-scale (topics 1,2) (m=3)   0.38      0.62
                 Large-scale (all data) (m=3)     0.33      0.61
Sentence-based   MTurk regular (n=3)              0.35      0.71
                 MTurk master (n=3)               0.47      0.76

Comparison to expert annotations

Task variant     Annotator                        F1
Paragraph-based  MTurk regular                    0.36
                 MTurk master                     0.54
                 Prolific                         0.50
                 Large-scale (topics 1,2) (m=3)   0.54
Sentence-based   MTurk regular                    0.31
                 MTurk master                     0.41
Amazon MTurk - paragraph-based design
Amazon MTurk - sentence-based design
Prolific - paragraph-based design

Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation

  • 1. Weronika Łajewska and Krisztian Balog University of Stavanger, Norway Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation CIKM’23, Birmingham
  • 2. This study ● Problem setting: Conversational response generation ○ It extends beyond passage retrieval + summarization ● Goal: snippet-level annotations of relevant passages, to enable 1. the training of response generation models that are able to ground answers in actual statements 2. the automatic evaluation of the generated responses in terms of completeness ● Main contributions: 1. Crowdsourcing task design and protocol to collect high-quality annotations 2. A dataset of 1.8k query-passage pairs annotated from the TREC 2020 and 2022 Conversational Assistance track
  • 4. CAsT-snippets sample
      The seemingly straightforward task of highlighting relevant snippets turns out not to be that simple.
  • 5. Preliminary study
      A comparison of different task designs, platforms, and worker pools
      ● Task designs: paragraph-based vs. sentence-based annotation
      ● Platforms and workers:
        ○ Amazon MTurk (regular vs. master workers)
        ○ Prolific
        ○ Expert annotators (PhD students)
  • 6. Evaluation measures
      Traditional measures of inter-annotator agreement are insufficient
      ● Fleiss’ Kappa and Krippendorff’s Alpha are measures for categorical annotations that rely on a binary notion of agreement
      ● Here: we need to measure the degree to which snippets selected by different workers overlap
        ○ Inter-annotator agreement: Jaccard similarity (also a less strict variant, k-Jaccard)
        ○ Similarity against expert annotators: “ROUGE-like” variant of precision and recall
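The overlap-based agreement measure named on this slide can be illustrated with a minimal Python sketch. The slides do not spell out the exact definitions, so everything here is an assumption: each annotator's selection is represented as a set of token positions in the passage, strict Jaccard is taken as the tokens selected by all annotators divided by the tokens selected by any annotator, and the less strict k-Jaccard counts tokens selected by at least k annotators. The function name `jaccard_k` is hypothetical and not taken from the authors' code.

```python
from collections import Counter
from itertools import chain


def jaccard_k(annotations, k=None):
    """Overlap among annotators' selections (assumed definition).

    annotations: list of sets, one per annotator, each holding the
                 positions of the tokens that annotator highlighted.
    k:           minimum number of annotators that must agree on a token;
                 None means all annotators (strict Jaccard).
    """
    n = len(annotations)
    if k is None:
        k = n
    # How many annotators selected each token (sets guarantee one vote each).
    counts = Counter(chain.from_iterable(annotations))
    if not counts:
        # Assumption: if nobody highlighted anything, treat it as full agreement.
        return 1.0
    agreed = [token for token, votes in counts.items() if votes >= k]
    return len(agreed) / len(counts)


# Three annotators highlighting overlapping token ranges of one passage.
workers = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]
strict = jaccard_k(workers)        # tokens 3 and 4 out of 6 selected tokens
relaxed = jaccard_k(workers, k=2)  # tokens 2 to 5 out of 6 selected tokens
```

Under this reading, lowering k can only increase the score, which matches the direction of the Jaccard_k values reported in the results.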
  • 7. Results
      Inter-annotator agreement
        Task variant     Annotators           Jaccard  Jaccard_k (k=4)  Jaccard_k (k=3)  Jaccard_k (k=2)
        Paragraph-based  MTurk regular (n=5)  0.02     0.08             0.21             0.48
        Paragraph-based  MTurk master (n=5)   0.18     0.35             0.53             0.73
        Paragraph-based  Prolific (n=5)       0.14     0.27             0.44             0.65
        Paragraph-based  Expert (m=3)         0.25     -                -                0.54
        Sentence-based   MTurk regular (n=3)  0.35     -                -                0.71
        Sentence-based   MTurk master (n=3)   0.47     -                -                0.76
      Comparison to expert annotations
        Task variant     Annotators     F1
        Paragraph-based  MTurk regular  0.36
        Paragraph-based  MTurk master   0.54
        Paragraph-based  Prolific       0.50
        Sentence-based   MTurk regular  0.31
        Sentence-based   MTurk master   0.41
      Main findings
      ● Relative ordering: MTurk masters > Prolific > MTurk regular
      ● Paragraph-level > sentence-level (w.r.t. similarity with expert annotations)
      ⇒ use MTurk and paragraph-based design for the large-scale data collection
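The F1 scores against expert annotations rest on a "ROUGE-like" precision and recall. A minimal sketch of one plausible reading follows, assuming unigram (ROUGE-1 style) overlap between the tokens a crowd worker highlighted and the tokens an expert highlighted. The slides do not give the exact formulation, so the whitespace tokenization, the unigram choice, and the function name `overlap_prf` are all assumptions.

```python
from collections import Counter


def overlap_prf(worker_tokens, expert_tokens):
    """Unigram-overlap precision, recall, and F1 (assumed ROUGE-1 style).

    worker_tokens: tokens highlighted by the crowd worker.
    expert_tokens: tokens highlighted by the expert (treated as reference).
    """
    worker = Counter(worker_tokens)
    expert = Counter(expert_tokens)
    # Multiset intersection: each token counts up to its smaller frequency.
    overlap = sum((worker & expert).values())
    precision = overlap / sum(worker.values()) if worker else 0.0
    recall = overlap / sum(expert.values()) if expert else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The worker selected a shorter span than the expert did.
worker_span = "snippets are longer".split()
expert_span = "snippets are longer and more numerous".split()
precision, recall, f1 = overlap_prf(worker_span, expert_span)
```

In this example every worker token is also in the expert span (precision 1.0), but only half of the expert span is covered (recall 0.5), so a worker who highlights too little is penalized through recall rather than precision.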
  • 9. Setup
      Employ a small group of trained crowd workers, selected through a qualification task, and create an extended set of guidelines with the help of the annotators
      ● Qualification task
        ○ Task consisted of: a detailed description of the problem, examples of correct annotations, a quiz, and 10 query-passage pairs to be annotated
        ○ 20 workers completed the task; 15 passed
      ● Guidelines: initial guidelines, then discussion and feedback on the qualification task, then extended guidelines
      ● Data collection
        ○ Performed in daily batches (1 topic per batch ≈ 46 HITs)
        ○ Individual feedback after each submitted batch
        ○ General comments/suggestions on a common Slack channel
        ○ $0.3 per HIT, plus a $2 bonus for completing within 24h
  • 10. Resulting dataset: CAsT-snippets
      371 queries, top 5 passages per query ⇒ 1855 query-passage pairs (each annotated by 3 crowd workers)
      ● Data quality
        ○ Inter-annotator agreement exceeds even that of expert annotators
        ○ Similarity with expert annotations is on par with MTurk master workers
      ● Comparison against other datasets
        ○ More snippets annotated per input text; also, snippets are longer
        Dataset        Input text         Avg. snippet length (tokens)  # snippets per annotation
        CAsT-snippets  Paragraph          39.6                          2.3
        SaaC           Top 10 passages    23.8                          1.5
        QuAC           Wikipedia article  14.6                          1
  • 11. Challenges identified
      Challenges pointed out by the crowd workers that need to be addressed in conversational response generation:
      ● Only a partial answer is present
      ● Temporal considerations
        ○ Spans may need to be excluded given the time constraints in the query
        ○ Assessing temporal validity can be challenging based on the paragraph alone (without larger context)
      ● Subjectivity of the passages originating from blogs or comments
      ● Indirect answers that require reasoning and background knowledge
      ● Determining the appropriate amount of context to include in each span
        ○ Balancing between being concise and being self-contained
      ● Determining whether the evidence or additional information is needed or an entity alone is sufficient as an answer
  • 12. Summary
      ● Snippet-level annotations for conversational response generation (information-seeking queries)
      ● Several measures to ensure high data quality
        ○ Preliminary study to compare task variants and crowdsourcing platforms
        ○ Providing feedback and training to annotators throughout the data collection process
        ○ Incentive structure to engage crowd workers over a period of time and avoid worker fatigue
      ● Communication with workers also led to various insights regarding challenges in conversational response generation
  • 13. Questions?
      ● Extended version on arXiv: https://arxiv.org/abs/2308.08911
      ● Dataset: https://github.com/iai-group/CAsT-snippets
  • 15. Preliminary study
      ● Dataset: TREC CAsT’20 and ’22 (top 5 passages according to relevance score for each query)
      ● Input: query + passage/sentence
      ● Output: snippet-level annotations in passage
        Task variant  Annotator      Time  # workers  Acceptance rate  Cost
        Paragraph     MTurk regular  182s  5          50%              $0.36
        Paragraph     MTurk master   63s   5          90%              $0.38
        Paragraph     Prolific       154s  5          79%              $0.51
        Paragraph     Expert         96s   3          -                -
        Sentence      MTurk regular  977s  3          72%              $0.43
        Sentence      MTurk master   305s  3          87%              $0.56
  • 16. Results (large-scale data collection)
      Inter-annotator agreement
        Task variant     Annotator                       Jaccard  Jaccard_2
        Paragraph-based  MTurk regular (n=5)             0.02     0.48
        Paragraph-based  MTurk master (n=5)              0.18     0.73
        Paragraph-based  Prolific (n=5)                  0.14     0.65
        Paragraph-based  Expert (m=3)                    0.25     0.54
        Paragraph-based  Large-scale (topics 1,2) (m=3)  0.38     0.62
        Paragraph-based  Large-scale (all data) (m=3)    0.33     0.61
        Sentence-based   MTurk regular (n=3)             0.35     0.71
        Sentence-based   MTurk master (n=3)              0.47     0.76
      Comparison to expert annotations
        Task variant     Annotator                       F1
        Paragraph-based  MTurk regular                   0.36
        Paragraph-based  MTurk master                    0.54
        Paragraph-based  Prolific                        0.50
        Paragraph-based  Large-scale (topics 1,2) (m=3)  0.54
        Sentence-based   MTurk regular                   0.31
        Sentence-based   MTurk master                    0.41
  • 17. Amazon MTurk - paragraph-based design
  • 18. Amazon MTurk - sentence-based design