Simple and Effective Multi-Paragraph
Reading Comprehension
Christopher Clark, Matt Gardner
(ACL’18)
Paper Reading Fest - 2018/8
Nguyen Phuoc Tat Dat
Question answering task
• Given a text, generate answers to questions about it
• Return a span of the text as the answer
• Challenges:
• Understand the natural language
• Knowledge of the domain
Rajpurkar et al., 2016
Paper overview
● Title: Simple and Effective Multi-Paragraph Reading Comprehension (ACL’18)
→ http://aclweb.org/anthology/P18-1078
● Authors: Christopher Clark, Matt Gardner
● Abstract:
○ Current question answering (QA) models cannot scale to document or multi-document input.
○ Propose a method for applying neural paragraph-level QA models to the document-level problem.
● Reasons I chose this paper:
○ My personal research interest in Natural Language Understanding, especially QA systems
○ The proposed method can work well with a document retrieval system (search engine), which bridges the gap between current QA research on single paragraphs and practical QA systems.
Agenda
● Introduction
● Pipeline method
● Confidence method
● Experiments
● Results
● Conclusion
Introduction
Paragraph-level
https://rajpurkar.github.io/SQuAD-explorer/
Document-level
...
Two approaches for document-level QA
● Pipeline approaches:
○ Select one paragraph
○ Extract an answer from the paragraph
● Confidence-based methods:
○ Search for answers in multiple paragraphs and produce a confidence score for each candidate
○ Return the answer with the highest confidence score
Pipeline method
Paragraph selection
● Single document:
○ TF-IDF cosine distance between each paragraph & the query
○ IDF is computed over the individual paragraphs of the document
● Multiple documents:
○ Linear classifier with the following features for each paragraph:
■ TF-IDF score as above
■ Whether the paragraph is the first in its document
■ How many tokens precede it
■ Number of question words included in the paragraph
● Ground truth: paragraphs containing at least one answer span are treated as positives
● The classifier is trained with a distantly supervised objective: select the positive paragraphs
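The single-document selection step can be sketched as follows. This is a minimal illustration, not the authors' code: whitespace tokenization, the smoothed IDF formula, and the names `tfidf_vectors`, `cosine_distance` and `rank_paragraphs` are all assumptions made for the example. The one detail taken from the slide is that IDF is fit on the paragraphs of the document itself.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (dict) per text; IDF is fit on `texts` only."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (an assumption; the slides do not give the exact formula).
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}
    vectors = [{w: tf * idf[w] for w, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_paragraphs(question, paragraphs):
    """Return paragraph indices ordered from closest to farthest from the question."""
    vectors, idf = tfidf_vectors(paragraphs)
    counts = Counter(question.lower().split())
    # Question words that never occur in the document carry no weight.
    q_vec = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    distances = [cosine_distance(q_vec, v) for v in vectors]
    return sorted(range(len(paragraphs)), key=lambda i: distances[i])


paragraphs = [
    "the cat sat on the mat",
    "paris is the capital of france",
    "dogs bark loudly",
]
ranking = rank_paragraphs("what is the capital of france", paragraphs)
```

In a pipeline setting only `ranking[0]` would be passed to the reader model; the confidence-based methods later in the deck instead read the top few paragraphs.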
Noisy labels
● Some answer spans may not relate to the question -> noisy
● Use a summed objective function that optimizes the negative log-likelihood of selecting any correct answer span.
● Apply it to the start and end tokens of the answer span independently.
● The objective for predicting the answer start token:
$-\log\left(\sum_{i \in A} p_i\right)$
where:
A is the set of tokens that start an answer
p_i: answer start probability predicted by the model for token i
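A small sketch of the summed objective for the start token, assuming the probabilities p_i come from a softmax over per-token start scores within one paragraph. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def summed_start_loss(start_scores, answer_starts):
    """-log of the total probability mass on any token that starts an answer.

    start_scores: one raw score per context token.
    answer_starts: the set A of token indices that start some answer span.
    """
    probs = softmax(start_scores)
    return -math.log(sum(probs[i] for i in answer_starts))


# Four tokens with equal scores, two of which start a (possibly noisy) answer.
loss = summed_start_loss([0.0, 0.0, 0.0, 0.0], {1, 3})
```

Because the probabilities are summed inside the logarithm, the model is free to put its mass on whichever labeled occurrence it finds most plausible instead of being forced to fit all of them.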
The model
Inputs: input text and query text
Embedding layer:
● Word embedding: pre-trained
● Character-derived word embedding: learned
Pre-process layer:
● Bi-directional GRU
Attention layer
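1. Attention between context word i and question word j:
$a_{ij} = w_1 \cdot h_i + w_2 \cdot q_j + w_3 \cdot (h_i \odot q_j)$
● n_q, n_c: the lengths of the question and context respectively
● h_i, q_j: vectors for context word i and question word j respectively
● w_1, w_2, and w_3 are learned vectors
2. Compute the attended vector c_i for each context token:
$p_{ij} = \frac{e^{a_{ij}}}{\sum_{j'=1}^{n_q} e^{a_{ij'}}}, \quad c_i = \sum_{j=1}^{n_q} q_j \, p_{ij}$
3. Compute the query-to-context vector q_c:
$m_i = \max_{1 \le j \le n_q} a_{ij}, \quad p_i = \frac{e^{m_i}}{\sum_{i'=1}^{n_c} e^{m_{i'}}}, \quad q_c = \sum_{i=1}^{n_c} h_i \, p_i$
4. Concatenate h_i, c_i, h_i ⊙ c_i, and q_c ⊙ c_i
5. Linear layer with ReLU activations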
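Steps 1 to 4 can be sketched in plain Python on small lists. The formulas in the comments are the BiDAF-style attention as I understand it from the paper (a trilinear score, a softmax over question words, a max-then-softmax over context words). The function `bi_attention` and its helpers are hypothetical names, and the final linear layer with ReLU (step 5) is left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bi_attention(H, Q, w1, w2, w3):
    """H: n_c context vectors, Q: n_q question vectors, all of dimension d."""
    d = len(H[0])
    # Step 1: a[i][j] = w1.h_i + w2.q_j + w3.(h_i * q_j)
    a = [[dot(w1, h) + dot(w2, q) + dot(w3, [x * y for x, y in zip(h, q)])
          for q in Q] for h in H]
    # Step 2: c_i = sum_j softmax_j(a[i])[j] * q_j
    C = []
    for row in a:
        p = softmax(row)
        C.append([sum(p[j] * Q[j][k] for j in range(len(Q))) for k in range(d)])
    # Step 3: q_c = sum_i softmax_i(max_j a[i][j])[i] * h_i
    p_ctx = softmax([max(row) for row in a])
    q_c = [sum(p_ctx[i] * H[i][k] for i in range(len(H))) for k in range(d)]
    # Step 4: concatenate h_i, c_i, h_i * c_i, q_c * c_i for every context token
    return [h + c + [x * y for x, y in zip(h, c)] + [x * y for x, y in zip(q_c, c)]
            for h, c in zip(H, C)]


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three context tokens, d = 2
Q = [[2.0, 0.0], [0.0, 2.0]]               # two question tokens
zeros = [0.0, 0.0]
out = bi_attention(H, Q, zeros, zeros, zeros)
```

With all-zero weights every attention distribution is uniform, so each c_i is the mean of the question vectors and q_c is the mean of the context vectors, which makes the output easy to check by hand.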
Self-Attention layer
● Residual-style self-attention
● Bi-directional GRU
● The same attention mechanism is applied between the context and itself, without query-to-context attention
Prediction layer
● Start score: Bi-directional GRU, then a linear layer
● End score: the hidden states of that Bi-directional GRU are concatenated with the input, then passed to another Bi-directional GRU and finally a linear layer
Variational dropout is applied before all GRUs and attention mechanisms at a rate of 0.2
Confidence method
● Span confidence score: sum of the start and end scores of the span
● At test time:
○ Run the model on each paragraph
○ Select the span with the highest confidence score
● Experiment with four approaches to training the model so that its confidence scores are usable across paragraphs:
○ Shared-Normalization
○ Merge
○ No-Answer Option
○ Sigmoid
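The test-time procedure can be sketched as below, assuming the reader model has already produced per-token start and end scores for each paragraph. The names `best_span` and `answer_over_paragraphs` are hypothetical, and the brute-force search is chosen for clarity, not speed.

```python
def best_span(start_scores, end_scores, max_len):
    """Highest-scoring (score, start, end) with at most `max_len` tokens in the span."""
    best = (float("-inf"), 0, 0)
    for i, s in enumerate(start_scores):
        # A span of length <= max_len ends at index i + max_len - 1 at the latest.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]   # span confidence = start score + end score
            if score > best[0]:
                best = (score, i, j)
    return best


def answer_over_paragraphs(paragraph_scores, max_len=8):
    """paragraph_scores: list of (start_scores, end_scores), one pair per paragraph.

    Returns (paragraph index, start token, end token, confidence).
    """
    candidates = [(best_span(s, e, max_len), p)
                  for p, (s, e) in enumerate(paragraph_scores)]
    (score, i, j), p = max(candidates, key=lambda c: c[0][0])
    return p, i, j, score


scores = [
    ([1.0, 0.0], [0.0, 2.0]),            # paragraph 0
    ([0.5, 4.0, 0.0], [0.0, 1.0, 3.0]),  # paragraph 1
]
choice = answer_over_paragraphs(scores, max_len=8)
```

Taking a plain maximum over paragraphs is only meaningful if scores from different paragraphs are comparable, which is what the four training approaches listed above try to achieve.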
Shared-Normalization
● Normalize the start and end scores of all paragraphs sampled from the same context with one shared normalization factor (a single softmax over all of them)
● Produces scores that are comparable across paragraphs
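A minimal sketch of the shared-normalization idea for the start token, combined with the summed objective from the noisy-labels slide. The only change from the per-paragraph loss is that the softmax denominator runs over the tokens of every paragraph sampled for the question. The function name and input layout are assumptions for illustration; the end token would be handled the same way.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def shared_norm_start_loss(paragraph_start_scores, answer_starts):
    """paragraph_start_scores: one list of start scores per paragraph.
    answer_starts: one set of answer-start indices per paragraph (may be empty).
    At least one paragraph must contain an answer start.
    """
    # One normalization factor shared by all paragraphs of the same context.
    all_scores = [s for scores in paragraph_start_scores for s in scores]
    correct = [s
               for scores, starts in zip(paragraph_start_scores, answer_starts)
               for i, s in enumerate(scores) if i in starts]
    return logsumexp(all_scores) - logsumexp(correct)


# Two paragraphs of two tokens each; only paragraph 0 contains an answer start.
loss = shared_norm_start_loss([[0.0, 0.0], [0.0, 0.0]], [{0}, set()])
```

Because tokens in answer-free paragraphs sit in the denominator, the model is pushed to give them low scores, which is what lets scores be compared across paragraphs at test time.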
Merge
● Concatenate all paragraphs sampled from the same context
● Add a paragraph separator token with a learned embedding before each paragraph
No-Answer Option
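● For each paragraph, allow the model to return “no-answer”
● Objective function:
$-\log\frac{e^{s_a}}{\sum_{i=1}^{n} e^{s_i}} - \log\frac{e^{g_b}}{\sum_{j=1}^{n} e^{g_j}} = -\log\frac{e^{s_a + g_b}}{\sum_{i=1}^{n}\sum_{j=1}^{n} e^{s_i + g_j}}$
where:
● s_j and g_j are the start & end scores produced by the model for token j
● a and b are the correct start and end tokens.
● Compute z as the “no-answer” score. Then the objective becomes:
$-\log\frac{(1-\delta)\, e^{z} + \delta\, e^{s_a + g_b}}{e^{z} + \sum_{i=1}^{n}\sum_{j=1}^{n} e^{s_i + g_j}}$
where δ is 1 if an answer exists and 0 otherwise.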
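The no-answer objective can be sketched as follows, assuming a single correct (start, end) pair per paragraph and a scalar no-answer score z that the model produces separately. The function name is hypothetical, and the denominator enumerates all start/end pairs explicitly, which is fine for a toy example but quadratic in paragraph length.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def no_answer_loss(start_scores, end_scores, z, answer=None):
    """Negative log-likelihood with an extra "no-answer" outcome.

    answer: (a, b) indices of the correct start and end tokens (delta = 1),
            or None if the paragraph contains no answer (delta = 0).
    """
    # Denominator: e^z plus e^(s_i + g_j) for every start/end pair.
    pair_scores = [s + g for s in start_scores for g in end_scores]
    log_denominator = logsumexp(pair_scores + [z])
    if answer is None:
        log_numerator = z
    else:
        a, b = answer
        log_numerator = start_scores[a] + end_scores[b]
    return log_denominator - log_numerator


loss_without_answer = no_answer_loss([0.0, 0.0], [0.0, 0.0], z=0.0)
loss_with_answer = no_answer_loss([0.0, 0.0], [0.0, 0.0], z=0.0, answer=(0, 1))
```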
Sigmoid
● Sigmoid loss objective function
● Start/end probability for each token: apply the sigmoid function to the start/end scores
● Cross-entropy loss is applied to each individual probability
● Scores are computed independently for each token → comparable between different paragraphs
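A sketch of the sigmoid objective for start tokens, assuming binary cross-entropy is summed over tokens (the slide does not say whether the loss is summed or averaged). Function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_start_loss(start_scores, answer_starts):
    """Binary cross-entropy on every token, each scored independently."""
    loss = 0.0
    for i, score in enumerate(start_scores):
        p = sigmoid(score)
        # Tokens in answer_starts are positives; every other token is a negative.
        loss -= math.log(p) if i in answer_starts else math.log(1.0 - p)
    return loss


loss = sigmoid_start_loss([0.0, 0.0], {0})
```

There is no normalization across tokens, so a score means the same thing in every paragraph; the price is that every unlabeled occurrence of the answer is treated as a hard negative.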
Experiments
Datasets
● TriviaQA (Joshi et al., 2017)
○ TriviaQA unfiltered: questions paired with documents found by running a web search on the questions
○ TriviaQA wiki: the same dataset but only including Wikipedia articles
○ TriviaQA web: a dataset derived from TriviaQA unfiltered by treating each question-document pair where the document contains the question's answer as an individual training point
● SQuAD (Rajpurkar et al., 2016)
○ A collection of Wikipedia articles and crowdsourced questions
Implementation details
● GloVe word vectors: 300-dimensional
● On SQuAD: dimensionality 100 for the GRUs and 200 for the linear layers, batch size 45
● On TriviaQA: dimensionality 140 for the GRUs and 280 for the linear layers, batch size 60
● Optimizer: Adadelta
● During training: maintain an exponential moving average of the weights with a decay rate of 0.999.
● At test time: select the most probable answer span of length less than or equal to 8 for TriviaQA and 17 for SQuAD.
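The exponential moving average of the weights can be sketched as below, with weights represented as a flat list of floats. The function name `ema_update` and the toy training loop are assumptions for illustration; only the decay rate 0.999 comes from the slide.

```python
def ema_update(shadow, weights, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Toy loop: the "trained" weights jump around, the shadow copy moves slowly.
weights = [0.0, 0.0]
shadow = list(weights)
for step in range(3):
    weights = [w + 1.0 for w in weights]   # stand-in for an optimizer step
    shadow = ema_update(shadow, weights)
```

The averaged copy is typically the one used for evaluation, since it smooths out the noise of the last few optimizer steps.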
Results
Evaluation scores
● Exact match
○ Fraction of predictions that exactly match any one of the ground-truth answers of the question.
● Macro-averaged F1 score
○ Treat the prediction and a ground-truth answer as bags of tokens, then compute their F1.
○ Take the maximum F1 over all of the ground-truth answers for a given question, and then average over all of the questions
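Both metrics can be sketched in a few lines. This is a simplified version: it only lowercases and splits on whitespace, whereas the official evaluation scripts for these datasets apply further answer normalization (for example around punctuation and articles). The function names are hypothetical.

```python
from collections import Counter


def exact_match(prediction, truths):
    """1.0 if the prediction matches any ground-truth answer token-for-token."""
    pred = prediction.lower().split()
    return float(any(pred == t.lower().split() for t in truths))


def token_f1(prediction, truth):
    """F1 between prediction and one ground-truth answer, both as bags of tokens."""
    pred, gold = prediction.lower().split(), truth.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions, truth_lists):
    """Max F1 over the ground truths of each question, averaged over questions."""
    per_question = [max(token_f1(p, t) for t in truths)
                    for p, truths in zip(predictions, truth_lists)]
    return sum(per_question) / len(per_question)


score = macro_f1(["barack obama", "paris"],
                 [["obama", "barack obama"], ["london"]])
```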
TriviaQA web
TriviaQA web & wiki
TriviaQA unfiltered
SQuAD
Curated TREC
Discussion
● Drawback of SQuAD for document-level QA
○ Models trained on SQuAD data perform very poorly in the multi-paragraph setting
○ Reasons:
■ Only paragraph-specific questions are provided
■ All questions are answerable
■ Paragraphs are short
● The shared-norm model performs well even as more paragraphs are added
● The no-answer and merge approaches are effective, but do not make confidence scores comparable across paragraphs
● The sigmoid objective function reduces paragraph-level performance (Fig. 4) → vulnerable to label noise
Error analysis
● Labeled 200 random TriviaQA web errors of the shared-norm model
■ Sources of errors on multi-sentence reading:
1. Connecting multiple statements in the same paragraph
2. Long-range coreference
3. Background knowledge (few)
■ Directions for improvement:
1. Continue advancing sentence- and paragraph-level reading comprehension
2. Add a mechanism to handle document-level coreference.
Conclusion
● Proposed techniques:
○ Sampling non-answer-containing paragraphs
○ Shared-norm objective function
○ Paragraph selection
● This work can be applied to build open-domain question answering systems.
Thank you!
Happy discussion!
