1. Simple and Effective Multi-Paragraph Reading Comprehension
Christopher Clark, Matt Gardner
(ACL’18)
Paper Reading Fest - 2018/8
Nguyen Phuoc Tat Dat
2. Question answering task
• Return a span of the given text as the answer
• Challenges:
• Understanding natural language
• Domain knowledge
[Figure: given a text, generate answers to questions (Rajpurkar et al., 2016)]
3. Paper overview
● Title: Simple and Effective Multi-Paragraph Reading Comprehension (ACL’18)
→ http://aclweb.org/anthology/P18-1078
● Authors: Christopher Clark, Matt Gardner
● Abstract:
○ Current question answering (QA) models cannot scale to document or multi-document input.
○ Propose a method to apply neural paragraph-level QA models to the document-level problem.
● Reasons I chose this paper:
○ My personal research interest in Natural Language Understanding, especially QA systems
○ The proposed method can work well with a document retrieval system (search engine), which
bridges the gap between current QA research on single paragraphs and practical QA
systems.
8. Two approaches for document-level QA
● Pipeline approaches:
○ Select one paragraph
○ Extract an answer from the paragraph
● Confidence-based methods:
○ Seek answers and produce confidence scores from multiple paragraphs
○ Return the answer with the highest confidence score
10. Paragraph selection
● Single document:
○ TF-IDF cosine distance between paragraph and query (a code sketch follows this slide)
○ IDF is computed using single paragraphs in the document
● Multiple documents:
○ Linear classification with the following features for each paragraph:
■ TF-IDF score as above
■ Whether the paragraph is the first in its document or not
■ How many tokens precede it
■ Number of question words included in the paragraph
● Ground truth: paragraphs that contain at least one answer span
● Train the classifier with a distantly supervised objective on the positive paragraphs
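A minimal sketch of the single-document selection step described above, assuming simple whitespace tokenization (the tokenizer and exact weighting are illustrative, not taken from the paper); as on the slide, IDF is computed over the paragraphs of the document itself:

```python
import math
from collections import Counter

def rank_paragraphs(paragraphs, question):
    """Rank a document's paragraphs by TF-IDF cosine similarity to the question."""
    docs = [p.lower().split() for p in paragraphs]
    n = len(docs)
    # IDF computed over the paragraphs of this single document
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}

    def tfidf(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine_sim(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = tfidf(question.lower().split())
    # smallest cosine distance = largest similarity, so sort descending by similarity
    return sorted(range(n), key=lambda i: -cosine_sim(tfidf(docs[i]), q))
```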
11. Noisy labels
● Some answer spans may not relate to the question → noisy labels
● Use a summed objective function that optimizes the negative log-likelihood of selecting any correct answer span
● Applied to the start and end tokens of the answer span independently
● The objective for predicting the answer start token (a code sketch follows this slide):

$-\log\Big(\sum_{a \in A} p_a\Big)$

where A is the set of tokens that start an answer span and $p_i$ is the answer-start probability predicted by the model for token i
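A NumPy sketch of this summed objective for the start token; array names are illustrative, and the same computation applies to end tokens:

```python
import numpy as np

def summed_start_loss(start_logits, start_mask):
    """-log(sum_{a in A} p_a): likelihood that ANY correct answer start is chosen.

    start_logits: (n_tokens,) raw model scores for each context token
    start_mask:   (n_tokens,) True for tokens in A (tokens that start an answer)
    """
    p = np.exp(start_logits - start_logits.max())  # numerically stable softmax -> p_i
    p /= p.sum()
    return -np.log(p[start_mask].sum())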
12. The model
[Diagram: model architecture over input text and query text]
Embedding layer:
● Word embedding: pre-trained
● Character-derived word embedding: learned
Preprocess layer:
● Bi-directional GRU
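A PyTorch sketch of these two layers, assuming a character CNN pools the learned character embeddings into a word-level vector (the exact character encoder and all dimensions here are assumptions, not the paper's specification):

```python
import torch
import torch.nn as nn

class EmbedAndEncode(nn.Module):
    """Pre-trained word embeddings + learned char-derived embeddings, then a BiGRU."""

    def __init__(self, word_vectors, n_chars, char_dim=20, char_filters=100, hidden=100):
        super().__init__()
        # fixed pre-trained word embeddings (e.g. GloVe)
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        # learned character embeddings, pooled by a CNN into one vector per word
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=5, padding=2)
        self.gru = nn.GRU(word_vectors.size(1) + char_filters, hidden,
                          bidirectional=True, batch_first=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, word_len)
        b, s, w = char_ids.shape
        c = self.char_emb(char_ids.reshape(b * s, w)).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.reshape(b, s, -1)
        x = torch.cat([self.word_emb(word_ids), c], dim=-1)
        out, _ = self.gru(x)          # (batch, seq, 2 * hidden)
        return out
```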
13. The model: Attention layer
[Diagram: model architecture over input text and query text]
1. Attention between context word i and question word j:

$a_{ij} = w_1 \cdot h_i + w_2 \cdot q_j + w_3 \cdot (h_i \odot q_j)$

where:
● $n_q$, $n_c$: the lengths of the question and context respectively
● $h_i$, $q_j$: the vectors for context word i and question word j respectively
● $w_1$, $w_2$, $w_3$: learned vectors

2. Compute the attended vector $c_i$ for each context token: $c_i = \sum_j p_{ij}\, q_j$, where $p_{ij} = e^{a_{ij}} / \sum_{j'} e^{a_{ij'}}$
3. Compute the query-to-context vector $q_c$ by attending over context words with $m_i = \max_j a_{ij}$: $q_c = \sum_i \mathrm{softmax}(m)_i\, h_i$
4. Concatenate $[h_i;\ c_i;\ h_i \odot c_i;\ q_c \odot c_i]$
5. Pass through a linear layer with ReLU activations (a code sketch follows this slide)
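A NumPy sketch of steps 1 through 4 above; h, q, and the w vectors match the symbols on the slide, and the softmax placement follows the BiDAF-style bi-attention the paper builds on:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bi_attention(h, q, w1, w2, w3):
    """h: (nc, d) context vectors; q: (nq, d) question vectors; w1, w2, w3: (d,) learned."""
    # a_ij = w1.h_i + w2.q_j + w3.(h_i * q_j)
    a = (h @ w1)[:, None] + (q @ w2)[None, :] + (h * w3) @ q.T   # (nc, nq)
    # context-to-query: attended vector c_i = sum_j p_ij q_j
    c = softmax(a, axis=1) @ q                                    # (nc, d)
    # query-to-context: attend over context words with m_i = max_j a_ij
    qc = softmax(a.max(axis=1), axis=0) @ h                       # (d,)
    # concatenate [h_i; c_i; h_i * c_i; qc * c_i] for each context token
    qc_tiled = np.broadcast_to(qc, h.shape)
    return np.concatenate([h, c, h * c, qc_tiled * c], axis=1)    # (nc, 4d)
```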
14. The model: Self-Attention and Prediction layers
[Diagram: model architecture over input text and query text]
Variational dropout is applied before all GRUs and attention mechanisms at a rate of 0.2.
Self-Attention layer
● Residual-style self-attention
● Bi-directional GRU
● The same attention mechanism as above, now applied between the context and itself; no query-to-context attention is used
Prediction layer
● Start score: bi-directional GRU, then a linear layer
● End score: the output of the start branch's bi-directional GRU is added residually to the input, then passed to another bi-directional GRU and finally a linear layer
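A PyTorch sketch of the prediction layer as described, assuming the start GRU's output width matches its input so the residual addition is well-formed (layer widths are illustrative assumptions):

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Start/end span scoring; the end branch reuses the start GRU output residually."""

    def __init__(self, dim):
        super().__init__()
        # bidirectional GRUs whose output width equals the input width,
        # so the residual addition below is well-formed
        self.start_gru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.start_lin = nn.Linear(dim, 1)
        self.end_gru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.end_lin = nn.Linear(dim, 1)

    def forward(self, x):
        s, _ = self.start_gru(x)                       # (batch, n, dim)
        start_scores = self.start_lin(s).squeeze(-1)   # (batch, n)
        e, _ = self.end_gru(x + s)                     # residual branch added to input
        end_scores = self.end_lin(e).squeeze(-1)
        return start_scores, end_scores
```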
16. Confidence method
● Span confidence score: the sum of the span's start and end scores
● At test time:
○ Run the model on each paragraph
○ Select the span with the highest confidence score (a sketch follows this slide)
● Experiment with four approaches to training the confidence model:
○ Shared-Normalization
○ Merge
○ No-Answer Option
○ Sigmoid
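A sketch of the test-time procedure under the stated rule (span score = start score + end score), with the bounded span length given in the implementation details later in the deck; function names are illustrative:

```python
import numpy as np

def best_span(start_scores, end_scores, max_len=17):
    """Highest-scoring span (start <= end, bounded length) within one paragraph."""
    best = (-np.inf, 0, 0)
    for i in range(len(start_scores)):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start_scores[i] + end_scores[j]
            if score > best[0]:
                best = (score, i, j)
    return best

def answer_document(per_paragraph_scores):
    """Run over every paragraph and return the span with the highest confidence."""
    # per_paragraph_scores: list of (start_scores, end_scores), one pair per paragraph
    candidates = [(best_span(s, e), k) for k, (s, e) in enumerate(per_paragraph_scores)]
    (score, i, j), k = max(candidates)   # tuples compare by score first
    return k, (i, j), score
```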
17. Shared-Normalization
● Normalize the start and end scores of all paragraphs from the same context with the same normalization factor
● Produce comparable scores across paragraphs
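A NumPy sketch of shared normalization for the start scores: all paragraphs sampled for the same question share one softmax normalizer, combined here with the summed objective from the noisy-labels slide. The same computation applies to end scores.

```python
import numpy as np

def shared_norm_start_loss(scores_per_paragraph, masks_per_paragraph):
    """Summed start objective with ONE softmax over all paragraphs of a question.

    scores_per_paragraph: list of (n_k,) arrays of start scores, one per paragraph
    masks_per_paragraph:  matching boolean arrays marking answer-start tokens
    """
    scores = np.concatenate(scores_per_paragraph)   # shared normalization factor
    mask = np.concatenate(masks_per_paragraph)
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return -np.log(p[mask].sum())
```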
18. Merge
● Concatenate all paragraphs from the same context
● Add a paragraph-separator token with a learned embedding before each paragraph
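A minimal sketch of the merge preprocessing; the separator string is a hypothetical placeholder for the learned paragraph-separator embedding:

```python
def merge_paragraphs(paragraphs, sep="[PAR]"):
    """Concatenate tokenized paragraphs of one context, prefixing each with a separator."""
    merged = []
    for p in paragraphs:
        merged.append(sep)   # this token gets its own learned embedding in the model
        merged.extend(p)
    return merged
```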
19. No-Answer Option
● For each paragraph, allow the model to return "no-answer"
● The original objective function:

$-\log\Big(\frac{e^{s_a + g_b}}{\sum_{i,j} e^{s_i + g_j}}\Big)$

where $s_j$ and $g_j$ are the start and end scores produced by the model for token j, and a and b are the correct start and end tokens.
● Compute a score z for the "no-answer" possibility; the objective then becomes (code sketch after this slide):

$-\log\Big(\frac{(1-\delta)\,e^{z} + \delta\,e^{s_a + g_b}}{e^{z} + \sum_{i,j} e^{s_i + g_j}}\Big)$

where δ is 1 if an answer exists and 0 otherwise.
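A NumPy sketch of the modified objective; s, g, z, a, b, and δ follow the definitions above, and the implementation is an assumption consistent with those formulas:

```python
import numpy as np

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def no_answer_loss(s, g, z, a, b, has_answer):
    """s, g: (n,) start/end scores; z: scalar 'no-answer' score;
    a, b: correct start/end indices; has_answer: the delta in the formula."""
    pair = s[:, None] + g[None, :]                 # s_i + g_j for every (i, j)
    log_den = np.logaddexp(z, logsumexp(pair))     # log(e^z + sum_ij e^{s_i + g_j})
    log_num = s[a] + g[b] if has_answer else z     # delta selects the numerator term
    return log_den - log_num                       # -log(numerator / denominator)
```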
20. Sigmoid
● Sigmoid loss objective function
● Start/end probability for each token: apply a sigmoid function to the start/end score
● Cross-entropy loss on each individual probability
● Scores are calculated independently → comparable across different paragraphs
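A NumPy sketch of the sigmoid objective for start scores: each token's probability is an independent sigmoid trained with per-token cross-entropy, so no per-paragraph normalizer is involved.

```python
import numpy as np

def sigmoid_start_loss(start_scores, start_mask):
    """Independent per-token binary cross-entropy on sigmoid start probabilities."""
    p = 1.0 / (1.0 + np.exp(-start_scores))          # per-token sigmoid, no softmax
    y = start_mask.astype(float)
    eps = 1e-12                                      # numerical safety
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```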
22. Datasets
● TriviaQA (Joshi et al., 2017)
○ TriviaQA unfiltered: questions paired with documents found by performing a web search for the question
○ TriviaQA wiki: the same dataset, but including only Wikipedia articles
○ TriviaQA web: derived from TriviaQA unfiltered by treating each question-document pair where the document contains the question's answer as an individual training point
● SQuAD (Rajpurkar et al., 2016)
○ A collection of Wikipedia articles and crowdsourced questions
23. Implementation details
● GloVe word vectors: 300-dimensional
● On SQuAD: 100-dim GRUs, 200-dim linear layers, batch size 45
● On TriviaQA: 140-dim GRUs, 280-dim linear layers, batch size 60
● Optimizer: Adadelta
● During training: maintain an exponential moving average of the weights with a decay rate of 0.999 (sketch below)
● At test time: select the most probable answer span of length at most 8 for TriviaQA and 17 for SQuAD
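A minimal sketch of the weight-averaging step mentioned above, assuming a plain dict of parameter arrays (frameworks usually provide this, so the helper is illustrative):

```python
def ema_update(avg, current, decay=0.999):
    """One step of the exponential moving average kept over model weights."""
    for name in avg:
        avg[name] = decay * avg[name] + (1 - decay) * current[name]
    return avg
```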
25. Evaluation scores
● Exact match
○ Fraction of predictions that exactly match any one of the ground-truth answers for the question
● Macro-averaged F1-score
○ Treat the prediction and a ground-truth answer as bags of tokens, then compute their F1
○ Take the maximum F1 over all ground-truth answers for a given question, then average over all questions (a sketch of both metrics follows this slide)
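A sketch of both metrics in the standard SQuAD style; the official evaluation script also normalizes articles and punctuation, which is omitted here:

```python
from collections import Counter

def f1(prediction, truth):
    """Bag-of-tokens F1 between a prediction and one ground-truth answer."""
    p, t = prediction.lower().split(), truth.lower().split()
    common = sum((Counter(p) & Counter(t)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(t)
    return 2 * precision * recall / (precision + recall)

def evaluate(predictions, ground_truths):
    """predictions: list of strings; ground_truths: list of lists of strings."""
    n = len(predictions)
    em = sum(any(p == g for g in gts) for p, gts in zip(predictions, ground_truths))
    macro_f1 = sum(max(f1(p, g) for g in gts)
                   for p, gts in zip(predictions, ground_truths))
    return em / n, macro_f1 / n
```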
31. Discussion
● Drawback of SQuAD for document-level QA
○ Models trained on SQuAD data perform very poorly in the multi-paragraph setting
○ Reasons:
■ Only paragraph-specific questions are provided
■ All questions are answerable
■ Paragraphs are short
● The shared-norm model performs well even as more paragraphs are added
● The no-answer and merge approaches are effective, but do not provide a confidence score
● The sigmoid objective function reduces paragraph-level performance (Fig. 4) → vulnerable to label noise
32. Error analysis
● Labeled 200 random errors of the shared-norm model on TriviaQA web
● Sources of errors on multi-sentence reading:
1. Connecting multiple statements in the same paragraph
2. Long-range coreference
3. Background knowledge (few cases)
● Suggested future directions:
1. Continue advancing sentence- and paragraph-level reading comprehension
2. Add a mechanism to handle document-level coreference
33. Conclusion
● Proposed techniques:
○ Sampling non-answer-containing paragraphs
○ Shared-norm objective function
○ Paragraph selection
● This work can be applied to building open-domain Question Answering systems.