Reasoning about Quantities in Natural Language.
Subhro Roy, Tim Vieira, Dan Roth.
TACL (Transactions of the Association for Computational Linguistics) – Volume 3, Issue 1, pages 1–13.
Presented as a paper introduction at the Inui-Okazaki Laboratory discourse study group.
3. About six and a half hours later,
Mr. Armstrong opened the landing craft’s hatch.
[About six and a half hours later],
Mr. Armstrong opened the landing craft’s hatch.
7. The number of member nations was 80 in 2000,
and then it increased to 95.
The number of adults and children with
HIV/AIDS reached 39.4 million in 2004.
8. CERN has now grown to include 20 member
states and enjoys the active participation of many
other countries world-wide.
CERN has 20 member states.
14.
[…] down”, we would like to segment together “nearly
two years after”. We consider a quantity to be
correctly detected only when we have the exact
phrase that we want; otherwise we consider the
segment to be undetected.

Model           P%    R%    F%    Train Time  Test Time
Semi-CRF (SC)   75.6  77.7  76.6  15.8        1.5
C+I (PR)        80.3  79.3  79.8   1.0        1.0

Table 2: 10-fold cross-validation results of segmentation
accuracy and time required for segmentation. The columns for
runtime have been normalized and expressed as ratios.

Table 2 describes the segmentation accuracy, as
well as the ratio between the time taken by both
approaches. The bank of classifiers approach gives
slightly better accuracy than the semi-CRF model,
and is also significantly faster.
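
The exact-match criterion described above can be sketched in a few lines of Python. This is only an illustration and not the authors' evaluation script: the function name `exact_match_prf` and the representation of a segment as a `(start_token, end_token)` pair are assumptions made for the example.

```python
def exact_match_prf(gold, predicted):
    """Score predicted quantity segments against gold segments.

    A predicted segment counts as correct only if its span is identical
    to a gold span; a partial overlap is treated as undetected.
    Spans are hypothetical (start_token, end_token) pairs.
    """
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold span covers "nearly two years after"; the prediction covers only
# "two years", so it earns no credit under exact matching.
print(exact_match_prf(gold=[(7, 11)], predicted=[(8, 10)]))
```

Under this criterion a system that finds the right number but misses a modifier such as “nearly” or “after” is scored the same as one that finds nothing.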
15.
[…]e increased 10%”, we would like to segment together
“increased 10%”, since this […] quantity denotes a rise in value.
[…]nce “Apple restores push email in […]
two years after Motorola shut it […]
[…] exact match only supports 43.3% of the entailment
decisions. It is also evident that the deeper semantic
analysis using SRL and Coreference improves the
quantitative inference.

Task            System     P%     R%    F%
Entailment      Baseline   100.0  43.3  60.5
                GOLDSEG     98.5  88.0  92.9
                +SEM        97.8  88.6  93.0
                PREDSEG     94.9  76.2  84.5
                +SEM        95.4  78.3  86.0
Contradiction   Baseline    16.6  48.5  24.8
                GOLDSEG     61.6  92.9  74.2
                +SEM        64.3  91.5  75.5
                PREDSEG     51.9  79.7  62.8
                +SEM        52.8  81.1  64.0
No Relation     Baseline    41.8  71.9  52.9
                GOLDSEG     81.1  76.7  78.8
                +SEM        80.0  78.5  79.3
                PREDSEG     54.0  75.4  62.9
                +SEM        56.3  72.7  63.5

Table 3: Results of QE. Adding semantics (+SEM)
consistently improves performance. Only 43.3% of entailing
quantities can be recovered by simple string matching.
16.
5.2 Scope of QE Inference
Our current QE procedure is limited in
several ways. In all cases, we attribute these
limitations to subtle and deeper language
understanding, which we delegate to the application
module that will use our QE procedure as a
subroutine. Consider the following examples:
T : Adam has exactly 100 dollars in the bank.
H1 : Adam has 50 dollars in the bank.
H2 : Adam’s bank balance is 50 dollars.
Here, T implies H1 but not H2. However, for both
H1 and H2, QE will infer that “50 dollars” is a
contradiction to sentence T, since it cannot make
the subtle distinction required here.
T : Ten students passed the exam, but six students
failed it.
H : At least eight students failed the exam.
17.
Here again, QE will only output that T implies
“At least eight students”, despite the second part of
T. QE reasons about the quantities, and there needs
to be an application specific module that understands
which quantity is related to the predicate “failed”.
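
The behaviour in the two examples can be reproduced with a small Python sketch. It is a simplification written for illustration and not the paper's algorithm: quantities are modelled as closed intervals with a unit, and the names `Quantity`, `compare` and `quantity_entailment` are hypothetical. The rule that an entailing text quantity takes precedence over a contradicting one is an assumption chosen to match the behaviour described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical simplified quantity: a closed interval plus a unit."""
    low: float
    high: float
    unit: str


def compare(text_q, hyp_q):
    """Relate one text quantity to the hypothesis quantity."""
    if text_q.unit != hyp_q.unit:
        return "no relation"
    # The text interval lies inside the hypothesis interval.
    if hyp_q.low <= text_q.low and text_q.high <= hyp_q.high:
        return "entails"
    # The two intervals do not overlap at all.
    if text_q.high < hyp_q.low or hyp_q.high < text_q.low:
        return "contradicts"
    return "no relation"


def quantity_entailment(text_qs, hyp_q):
    """Assumed precedence: any entailment wins, then any contradiction."""
    labels = [compare(q, hyp_q) for q in text_qs]
    if "entails" in labels:
        return "entails"
    if "contradicts" in labels:
        return "contradicts"
    return "no relation"


# "exactly 100 dollars" against "50 dollars": the same label for H1 and H2.
print(quantity_entailment([Quantity(100, 100, "dollar")],
                          Quantity(50, 50, "dollar")))

# "Ten students ... six students" against "At least eight students".
print(quantity_entailment([Quantity(10, 10, "student"),
                           Quantity(6, 6, "student")],
                          Quantity(8, math.inf, "student")))
```

The sketch never looks at which predicate a quantity attaches to, which is why “ten students passed” is enough for it to accept “at least eight students failed”.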
There also exist limitations regarding inferences
with respect to events that could occur over a period
of time. In “It was raining from 5 pm to 7 pm” one