A paper review. This presentation introduces "Abductive Commonsense Reasoning", a paper published at ICLR 2020. In this paper, the authors use commonsense reasoning to generate plausible hypotheses. They build a new dataset, ART, and propose new models for the αNLI and αNLG tasks using BERT and GPT.
1. Abductive Commonsense
Reasoning
San Kim
2020.07.03
Chandra Bhagavatula(1), Ronan Le Bras(1), Chaitanya Malaviya(1), Keisuke Sakaguchi(1),
Ari Holtzman(1), Hannah Rashkin(1), Doug Downey(1), Scott Wen-tau Yih(2), Yejin Choi(1,3)
1. Allen Institute for AI
2. Facebook AI
3. Paul G. Allen School of Computer Science & Engineering
3. Abductive Commonsense Reasoning
Abductive Natural Language Inference
αNLI is formulated as a multiple-choice problem consisting of a pair of observations as context and a pair of hypothesis choices.
• O1: the observation at time t1.
• O2: the observation at time t2 > t1.
• h+: a plausible hypothesis that explains the two observations O1 and O2.
• h−: an implausible (or less plausible) hypothesis for observations O1 and O2.
Abductive Natural Language Generation
αNLG is the task of generating a valid hypothesis h+ given the two observations O1 and O2. Formally, the task requires maximizing P(h+ | O1, O2).
5. A Probabilistic Framework for 𝛼𝑁𝐿𝐼
The αNLI task is to select the hypothesis h* that is most probable given the observations:
h* = arg max_{h_i} P(H = h_i | O1, O2)
Rewriting the objective using Bayes' Rule conditioned on O1, we have:
P(h_i | O1, O2) ∝ P(O2 | h_i, O1) P(h_i | O1)
Illustration of the graphical models described in the probabilistic framework.
Adopted from [1]
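The decision rule above can be sketched in a few lines of Python. All probabilities below are made-up toy numbers, not model outputs; `best_hypothesis` is an illustrative name, not a function from the paper's code.

```python
# Sketch of the aNLI decision rule h* = argmax_i P(O2 | h_i, O1) * P(h_i | O1),
# following the Bayes factorization conditioned on O1.
def best_hypothesis(candidates):
    """candidates maps each hypothesis to a pair (P(h | O1), P(O2 | h, O1))."""
    return max(candidates, key=lambda h: candidates[h][0] * candidates[h][1])

# Hypothetical scores for two competing hypotheses:
scores = {
    "h1": (0.40, 0.10),  # likely given O1, but does not explain O2
    "h2": (0.35, 0.60),  # slightly less likely a priori, but explains O2
}
print(best_hypothesis(scores))  # -> h2
```

Even though h1 is more probable given O1 alone, h2 wins because it also explains the second observation.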
6. A Probabilistic Framework for 𝛼𝑁𝐿𝐼
(a) Hypothesis Only
• The strongest assumption: the hypothesis is entirely independent of both observations, i.e. (H ⊥ O1, O2).
• Maximize the marginal P(H).
(b, c) First (or Second) Observation Only
• A weaker assumption: the hypothesis depends on only one observation, the first O1 or the second O2.
• Maximize the conditional probability P(H | O1) or P(O2 | H).
Adopted from [1]
7. A Probabilistic Framework for 𝛼𝑁𝐿𝐼
(d) Linear Chain
• Uses both observations, but considers each observation's influence on the hypothesis independently.
• The model assumes that the three variables ⟨O1, H, O2⟩ form a linear Markov chain: the second observation is conditionally independent of the first given the hypothesis (i.e. O1 ⊥ O2 | H).
• h* = arg max_{h_i} P(O2 | h_i) P(h_i | O1), where O1 ⊥ O2 | H
(e) Fully Connected
• Jointly models all three random variables.
• Combines information across both observations to choose the correct hypothesis.
• h* = arg max_{h_i} P(O2 | h_i, O1) P(h_i | O1)
Adopted from [1]
8. Difference between the Linear Chain and Fully Connected models
O1: Carl went to the store desperately searching for flour tortillas for a recipe.
O2: Carl left the store very frustrated.
h1: The cashier was rude. h2: The store had corn tortillas, but not flour ones.
[Figure: timeline t0 < t1 < t2. Under the Linear Chain model, P(h | O1) and P(O2 | h) each look plausible for h1, while P(O2 | h2) looks only weakly plausible without O1, so the chain wrongly ranks h1 > h2. Under the Fully Connected model, P(O2 | h1, O1) is implausible while both factors are plausible for h2, so it correctly ranks h1 < h2.]
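The contrast in the tortilla example can be mimicked with made-up toy probabilities: scoring P(O2 | h) without seeing O1, the rude-cashier hypothesis h1 looks at least as good as h2, while conditioning on both O1 and h reveals that only h2 explains Carl's frustration. All numbers below are invented for illustration.

```python
# h1 = "the cashier was rude", h2 = "the store had corn tortillas, not flour ones".
p_h_given_o1 = {"h1": 0.5, "h2": 0.5}     # P(h | O1): both plausible after O1
p_o2_given_h = {"h1": 0.6, "h2": 0.4}     # Linear Chain: P(O2 | h), O1 unseen
p_o2_given_h_o1 = {"h1": 0.1, "h2": 0.7}  # Fully Connected: P(O2 | h, O1)

# Linear Chain score P(O2 | h) * P(h | O1) -- wrongly favors h1.
chain = {h: p_o2_given_h[h] * p_h_given_o1[h] for h in ("h1", "h2")}
# Fully Connected score P(O2 | h, O1) * P(h | O1) -- correctly favors h2.
full = {h: p_o2_given_h_o1[h] * p_h_given_o1[h] for h in ("h1", "h2")}
```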
9. αNLG model
Given h+ = w_1^h ⋯ w_l^h, O1 = w_1^o1 ⋯ w_m^o1, and O2 = w_1^o2 ⋯ w_n^o2 as sequences of tokens, the αNLG task can be modeled as
P(h+ | O1, O2) = ∏_i P(w_i^h | w_<i^h, w_1^o1 ⋯ w_m^o1, w_1^o2 ⋯ w_n^o2)
Optionally, the model can also be conditioned on background knowledge 𝒦, giving the training loss
ℒ = − ∑_{i=1}^{N} log P(w_i^h | w_<i^h, w_1^o1 ⋯ w_m^o1, w_1^o2 ⋯ w_n^o2, 𝒦)
Adopted from [1]
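The training objective ℒ is a standard token-level negative log-likelihood. A minimal sketch in plain Python, where the per-token probabilities P(w_i^h | …) are supplied as placeholder numbers rather than produced by a real conditional language model such as GPT:

```python
import math

def nll_loss(token_probs):
    """Sum of -log P(w_i^h | w_<i^h, O1, O2, K) over the gold hypothesis tokens.

    token_probs: the probability the model assigns to each gold token of h+;
    toy placeholders here for the outputs of a conditional LM.
    """
    return -sum(math.log(p) for p in token_probs)

# The loss shrinks as the model assigns higher probability to the gold tokens:
confident = nll_loss([0.9, 0.8, 0.95])
uncertain = nll_loss([0.5, 0.4, 0.55])
```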
10. ConceptNet [5]
• ConceptNet is a multilingual knowledge base
• It is constructed of:
  • Nodes (a word or a short phrase)
  • Edges (a relation and a weight)
Birds are not capable of … {bark, chew their food, breathe water}
• Related work
  • WordNet
  • Microsoft Concept Graph
  • Google Knowledge Graph
Adopted from [6]
11. ATOMIC [2]
1. xIntent: Why does X cause the event?
2. xNeed: What does X need to do before the event?
3. xAttr: How would X be described?
4. xEffect: What effects does the event have on X?
5. xWant: What would X likely want to do after the event?
6. xReaction: How does X feel after the event?
7. oReact: How do others feel after the event?
8. oWant: What would others likely want to do after the event?
9. oEffect: What effects does the event have on others?
If-Then relation types (inference dimensions):
• If-Event-Then-Mental-State
• If-Event-Then-Event
• If-Event-Then-Persona
Adopted from [2]
12. ATOMIC
• If-Event-Then-Mental-State: three relations relating to the mental pre- and post-conditions of an event.
Example: "X compliments Y"
• xIntent (likely intent of the event): X wants to be nice
• xReaction (likely (emotional) reaction of the event's subject): X feels good
• oReaction (likely (emotional) reactions of others): Y feels flattered
Adopted from [2]
13. ATOMIC
• If-Event-Then-Event: five relations relating to events that constitute probable pre- and post-conditions of
a given event.
Example: "X calls the police"
• xNeed: X needs to dial 911
• xEffect: X starts to panic
• xWant: X wants to explain everything to the police
• oWant: Others want to dispatch some officers
Example: "X pays Y a compliment"
• oEffect: Y will smile
• If-Event-Then-Persona: a stative relation that describes how the subject of an event is described or perceived.
Example: "X pays Y a compliment"
• xAttr: X is flattering
• xAttr: X is caring
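The if-then examples above can be stored as (event, relation, inference) triples. A minimal sketch, with the triples taken from the slides and the list/query scheme purely illustrative:

```python
# ATOMIC-style knowledge as (event, relation, inference) triples.
TRIPLES = [
    ("X compliments Y", "xIntent", "X wants to be nice"),
    ("X compliments Y", "xReaction", "X feels good"),
    ("X compliments Y", "oReaction", "Y feels flattered"),
    ("X calls the police", "xNeed", "X needs to dial 911"),
    ("X calls the police", "xEffect", "X starts to panic"),
    ("X calls the police", "xWant", "X wants to explain everything to the police"),
]

def query(event, relation):
    """Return all inferences stored for a given event and relation type."""
    return [inf for ev, rel, inf in TRIPLES if ev == event and rel == relation]

print(query("X compliments Y", "oReaction"))  # -> ['Y feels flattered']
```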
19. Adversarial Filtering Algorithm
• A significant challenge in creating datasets is avoiding annotation artifacts.
• Annotation artifacts: unintentional patterns in the data that leak information about the
target label.
In each iteration i:
• M_i: the adversarial model
• 𝒯_i: a random training subset
• 𝒱_i: the validation set
• h_k^+, h_k^−: the plausible and implausible hypotheses for an instance k
• δ = Δ_{M_i}(h_k^+, h_k^−): the difference in the model's evaluation of h_k^+ and h_k^−
With probability t_i, update each instance k that M_i gets correct with a pair (h^+, h^−) ∈ ℋ_k^+ × ℋ_k^− of hypotheses that reduces the value of δ, where ℋ_k^+ (resp. ℋ_k^−) is the pool of plausible (resp. implausible) hypotheses for instance k.
Adopted from [1]
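The filtering loop can be sketched as follows. Everything here is illustrative: the paper trains a BERT-based adversary on 𝒯_i and evaluates on 𝒱_i, whereas this toy uses a deliberately length-biased adversary to show how AF swaps in hypothesis pairs that shrink the adversary's margin and thereby removes an annotation artifact.

```python
import random

def adversarial_filtering(instances, pools, fit_adversary, n_iters=3, t=1.0, seed=0):
    """Toy sketch of Adversarial Filtering (names are illustrative).

    instances: list of dicts {'h_pos': ..., 'h_neg': ...}
    pools: per-instance (plausible_pool, implausible_pool) of candidates
    fit_adversary: trains on a random subset, returns delta(h+, h-) = margin
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        train = rng.sample(instances, max(1, len(instances) // 2))
        delta = fit_adversary(train)
        for k, inst in enumerate(instances):
            d = delta(inst["h_pos"], inst["h_neg"])
            # With probability t, replace instances the adversary gets right
            # by the candidate pair that minimizes its margin delta.
            if d > 0 and rng.random() < t:
                pos_pool, neg_pool = pools[k]
                best = min(((p, n) for p in pos_pool for n in neg_pool),
                           key=lambda pair: delta(*pair))
                if delta(*best) < d:
                    inst["h_pos"], inst["h_neg"] = best
    return instances

# A biased adversary exploiting a length artifact: it scores the longer
# hypothesis as the plausible one (the training subset is ignored in this toy).
def length_adversary(_train):
    return lambda h_pos, h_neg: len(h_pos) - len(h_neg)
```

Running the loop on an instance whose negative hypothesis is suspiciously short replaces it with a pool candidate the length heuristic can no longer separate.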
23. BERT-Score
Evaluating Text Generation with BERT: an automatic similarity metric for generated text, analogous to the Inception Score, Fréchet Inception Distance, and Kernel Inception Distance used in image generation.
Adopted from [4]
26. Abductive Commonsense Reasoning
1. O1 ↛ h−: h− is unlikely to follow the first observation O1.
2. h− ↛ O2: h− is plausible after O1 but unlikely to precede the second observation O2.
3. Plausible: ⟨O1, h−, O2⟩ is a coherent narrative and forms a plausible alternative, but it is less plausible than ⟨O1, h+, O2⟩.
Adopted from [1]
[Figure: BERT performance]