Abductive Commonsense
Reasoning
San Kim
2020.07.03
Chandra Bhagavatula(1), Ronan Le Bras(1), Chaitanya Malaviya(1), Keisuke Sakaguchi(1),
Ari Holtzman(1), Hannah Rashkin(1), Doug Downey(1), Scott Wen-tau Yih(2), Yejin Choi(1,3)
1. Allen Institute for AI
2. Facebook AI
3. Paul G. Allen School of Computer Science & Engineering
Contributions
1. Abductive Commonsense Reasoning Challenges
1. Abductive NLI (𝛼𝑁𝐿𝐼)
2. Abductive NLG (𝛼𝑁𝐿𝐺)
2. ART: Dataset for Abductive Reasoning in Narrative Text
1. 20K Narrative Contexts
2. 200K Hypotheses
3. Experiments & Key Insights
Abductive Commonsense Reasoning
Abductive Natural Language Inference
𝛼NLI is formulated as a multiple-choice problem consisting of a pair of observations as context and a pair of hypothesis choices.
O1: the observation at time t1.
O2: the observation at time t2 > t1.
h+: a plausible hypothesis that explains the two observations O1 and O2.
h−: an implausible (or less plausible) hypothesis for observations O1 and O2.
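To make the task format concrete, here is a minimal sketch (illustrative, not the authors' released code) of how one ART/𝛼NLI instance can be represented; the class, field, and method names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ANLIInstance:
    """One alphaNLI instance: two observations and two candidate hypotheses."""
    o1: str       # observation at time t1
    o2: str       # observation at time t2 > t1
    h_plus: str   # plausible hypothesis
    h_minus: str  # implausible (or less plausible) hypothesis

    def candidate_narratives(self):
        # Each candidate is the narrative (O1, h, O2); a model scores both
        # and picks the hypothesis whose narrative is more probable.
        return [(self.o1, self.h_plus, self.o2),
                (self.o1, self.h_minus, self.o2)]

example = ANLIInstance(
    o1="Carl went to the store desperately searching for flour tortillas for a recipe.",
    o2="Carl left the store very frustrated.",
    h_plus="The store had corn tortillas, but not flour ones.",
    h_minus="The cashier was rude.",
)
print(example.candidate_narratives())
```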
Abductive Natural Language Generation
𝛼NLG is the task of generating a valid hypothesis h+ given the two observations O1 and O2. Formally, the task is to maximize P(h+ | O1, O2).
Abductive Reasoning
Adopted from [1]
A Probabilistic Framework for 𝛼𝑁𝐿𝐼
The 𝛼NLI task is to select the hypothesis h* that is most probable given the observations:
h* = arg max_{h_i} P(H = h_i | O1, O2)
Rewriting the objective using Bayes' rule conditioned on O1, we have:
P(h_i | O1, O2) ∝ P(O2 | h_i, O1) · P(h_i | O1)
Illustration of the graphical models described in the probabilistic framework.
Adopted from [1]
A Probabilistic Framework for 𝛼𝑁𝐿𝐼
(a) Hypothesis Only
• Strong assumption: the hypothesis is entirely independent of both observations, i.e. (H ⊥ O1, O2).
• Maximize the marginal P(H).
(b, c) First (or Second) Observation Only
• Weaker assumption: the hypothesis depends on only one of the observations, either the first O1 or the second O2.
• Maximize the conditional probability P(H | O1) or P(O2 | H).
Adopted from [1]
A Probabilistic Framework for 𝛼𝑁𝐿𝐼
(d) Linear Chain
• Uses both observations, but considers each observation's influence on the hypothesis independently.
• The model assumes that the three variables ⟨O1, H, O2⟩ form a linear Markov chain.
• The second observation is conditionally independent of the first, given the hypothesis (i.e. O1 ⊥ O2 | H).
• h* = arg max_{h_i} P(O2 | h_i) · P(h_i | O1), where O1 ⊥ O2 | H
(e) Fully Connected
• Jointly models all three random variables.
• Combines information across both observations to choose the correct hypothesis.
• h* = arg max_{h_i} P(O2 | h_i, O1) · P(h_i | O1)
Adopted from [1]
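A small sketch of how the Linear Chain and Fully Connected factorizations differ when scoring candidate hypotheses. The `lm.conditional_logprob` interface is an assumed stand-in for any conditional log-probability estimate, not an API from the paper.

```python
def log_p(text: str, given: str, lm) -> float:
    """Hypothetical helper: log P(text | given) under some language model `lm`.
    The `lm.conditional_logprob` call is an assumed interface, not a real API."""
    return lm.conditional_logprob(text, given)

def linear_chain_score(o1, o2, h, lm):
    # (d) Linear Chain: log P(O2 | h) + log P(h | O1); O2 never sees O1 directly.
    return log_p(o2, h, lm) + log_p(h, o1, lm)

def fully_connected_score(o1, o2, h, lm):
    # (e) Fully Connected: log P(O2 | h, O1) + log P(h | O1); O2 is scored jointly
    # with both the hypothesis and the first observation.
    return log_p(o2, o1 + " " + h, lm) + log_p(h, o1, lm)

def choose_hypothesis(o1, o2, hypotheses, lm, fully_connected=True):
    scorer = fully_connected_score if fully_connected else linear_chain_score
    return max(hypotheses, key=lambda h: scorer(o1, o2, h, lm))
```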
Difference between the Linear Chain and Fully Connected models
O1: Carl went to the store desperately searching for flour tortillas for a recipe.
O2: Carl left the store very frustrated.
h1: The cashier was rude. h2: The store had corn tortillas, but not flour ones.
Linear Chain: each factor is scored without the other observation. For h1, both P(h1 | O1) and P(O2 | h1) look plausible; for h2, P(O2 | h2) looks only weakly plausible because, without O1, missing flour tortillas is not an obvious reason for frustration. The model therefore (wrongly) ranks h1 > h2.
Fully Connected: O2 is scored jointly with the hypothesis and O1. P(O2 | h1, O1) is only weakly plausible, since a rude cashier does not explain the failed search for flour tortillas, while P(O2 | h2, O1) is highly plausible. The model correctly ranks h1 < h2.
𝛼NLG model
Given h+ = (w_1^h, ⋯, w_l^h), O1 = (w_1^{o1}, ⋯, w_m^{o1}), and O2 = (w_1^{o2}, ⋯, w_n^{o2}) as sequences of tokens,
the 𝛼NLG task can be modeled as P(h+ | O1, O2) = ∏_i P(w_i^h | w_{<i}^h, w_1^{o1} ⋯ w_m^{o1}, w_1^{o2} ⋯ w_n^{o2}).
Optionally, the model can also be conditioned on background knowledge 𝒦.
ℒ = − Σ_{i=1}^{N} log P(w_i^h | w_{<i}^h, w_1^{o1} ⋯ w_m^{o1}, w_1^{o2} ⋯ w_n^{o2}, 𝒦)
Adopted from [1]
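A minimal sketch of this objective, assuming a Hugging Face GPT-2 as the conditional generator (the paper also fine-tunes GPT-2, but this is not the authors' training code). Masking the context tokens with -100 restricts the cross-entropy loss to the hypothesis tokens, mirroring the loss above.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

o1 = "Carl went to the store desperately searching for flour tortillas for a recipe."
o2 = "Carl left the store very frustrated."
h_plus = "The store had corn tortillas, but not flour ones."

context_ids = tokenizer.encode(o1 + " " + o2 + " ")
target_ids = tokenizer.encode(h_plus)
input_ids = torch.tensor([context_ids + target_ids])

# Positions labeled -100 are ignored by the loss, so the cross-entropy is
# computed only on h+ tokens: L = -sum_i log P(w_i^h | w_<i^h, O1, O2).
labels = torch.tensor([[-100] * len(context_ids) + target_ids])

loss = model(input_ids=input_ids, labels=labels).loss  # recent transformers API
loss.backward()  # gradient for one training step; an optimizer step would follow
```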
ConceptNet [5]
• ConceptNet is a multilingual knowledge base
• It is constructed of:
• Nodes (a word or a short phrase)
• Edges (relation and weight)
Example: birds are not capable of … {bark, chew their food, breathe water}
• Related work
• WordNet
• Microsoft Concept Net
• Google knowledge base
Adopted from [6]
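A toy in-memory representation of the node/edge structure described above; the concrete relation names and weights are illustrative, not actual ConceptNet entries.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    start: str      # node: a word or short phrase
    relation: str   # edge relation label
    end: str        # node: a word or short phrase
    weight: float   # edge weight

edges = [
    Edge("bird", "NotCapableOf", "bark", 1.0),
    Edge("bird", "NotCapableOf", "breathe water", 1.0),
]

# Simple lookup: everything a concept is said to be not capable of.
not_capable = [e.end for e in edges if e.start == "bird" and e.relation == "NotCapableOf"]
print(not_capable)
```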
ATOMIC [2]
1. xIntent: Why does X cause an event?
2. xNeed: What does X need to do before the
event?
3. xAttr: How would X be described?
4. xEffect: What effects does the event have on X?
5. xWant: What would X likely want to do after
the event?
6. xReaction: How does X feel after the event?
7. oReact: How do others feel after the event?
8. oWant: What would others likely want to do
after the event?
9. oEffect: What effects does the event have on
others?
If-Then Relation Types
• If-Event-Then-Mental-State
• If-Event-Then-Event
• If-Event-Then-Persona
Inference dimension
Adopted from [2]
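The nine inference dimensions above, grouped by the three if-then relation types, expressed as a small lookup table; the grouping follows [2], and only the Python representation is illustrative.

```python
# ATOMIC's nine inference dimensions grouped by if-then relation type.
ATOMIC_DIMENSIONS = {
    "If-Event-Then-Mental-State": {
        "xIntent": "Why does X cause an event?",
        "xReaction": "How does X feel after the event?",
        "oReact": "How do others feel after the event?",
    },
    "If-Event-Then-Event": {
        "xNeed": "What does X need to do before the event?",
        "xWant": "What would X likely want to do after the event?",
        "oWant": "What would others likely want to do after the event?",
        "xEffect": "What effects does the event have on X?",
        "oEffect": "What effects does the event have on others?",
    },
    "If-Event-Then-Persona": {
        "xAttr": "How would X be described?",
    },
}
```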
ATOMIC
• If-Event-Then-Mental-State: three relations relating to the mental pre- and post-conditions of an event.
X compliments Y X wants to be nice
X feels good
Y feels flattered
xIntent: likely intents of the event
xReaction: likely (emotional) reactions of the event’s subject
oReaction: likely (emotional) reactions of others
Adopted from [2]
ATOMIC
• If-Event-Then-Event: five relations relating to events that constitute probable pre- and post-conditions of a given event.
Example: "X calls the police"
xNeed: X needs to dial 911
xEffect: X starts to panic
xWant: X wants to explain everything to the police
oWant: Others want to dispatch some officers
Example: "X pays Y a compliment"
oEffect: Y will smile
• If-Event-Then-Persona: a stative relation that describes how the subject of an event is described or perceived.
Example: "X pays Y a compliment"
xAttr: X is flattering
xAttr: X is caring
ATOMIC
Adopted from [2]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph
Construction [3]
Adopted from [3]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph
Construction
Adopted from [3]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph
Construction
Loss function
ℒ = − Σ_{t=|s|+|r|}^{|s|+|r|+|o|} log P(x_t | x_{<t})
Input Template
Notation
• Natural language tuples in the form {s: subject, r: relation, o: object}
• Subject tokens: X^s = (x_0^s, ⋯, x_{|s|}^s)
• Relation tokens: X^r = (x_0^r, ⋯, x_{|r|}^r)
• Object tokens: X^o = (x_0^o, ⋯, x_{|o|}^o)
Adopted from [3]
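A sketch of the objective described above: the input is the concatenation of subject, relation, and object tokens, and the loss is taken only over the object positions. The `model` call returning next-token logits is an assumed interface, not the released COMET code.

```python
import torch
import torch.nn.functional as F

def comet_loss(model, s_ids: torch.Tensor, r_ids: torch.Tensor, o_ids: torch.Tensor):
    """Negative log-likelihood of the object tokens given [subject; relation].

    `model(input_ids)` is assumed to return next-token logits of shape
    (batch, seq_len, vocab); any autoregressive LM wrapper would do.
    """
    input_ids = torch.cat([s_ids, r_ids, o_ids]).unsqueeze(0)  # (1, |s|+|r|+|o|)
    logits = model(input_ids)                                  # (1, T, vocab)
    # Token at position t is predicted from logits at position t-1, so the
    # object tokens (positions |s|+|r| .. T-1) are predicted by logits at
    # positions |s|+|r|-1 .. T-2.
    start = s_ids.numel() + r_ids.numel()
    pred_logits = logits[0, start - 1 : start - 1 + o_ids.numel(), :]
    return F.cross_entropy(pred_logits, o_ids)
```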
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph
Construction
Adversarial Filtering Algorithm
• A significant challenge in creating datasets is avoiding annotation artifacts.
• Annotation artifacts: unintentional patterns in the data that leak information about the
target label.
In each iteration i:
• M_i: adversarial model
• 𝒯_i: a random subset for training
• 𝒱_i: validation set
• h_k^+, h_k^−: plausible and implausible hypotheses for an instance k.
• δ = Δ_{M_i}(h_k^+, h_k^−): the difference in the model's evaluation of h_k^+ and h_k^−.
With probability t_i, update each instance k that M_i gets correct with a pair (h+, h−) ∈ ℋ_k^+ × ℋ_k^− of hypotheses that reduces the value of δ, where ℋ_k^+ (resp. ℋ_k^−) is the pool of plausible (resp. implausible) hypotheses for instance k.
Adopted from [1]
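A high-level sketch of the adversarial filtering loop described above; the `train_model` and `delta` helpers, the pool field names, and the train/validation split are assumed stand-ins, not the paper's implementation.

```python
import random

def adversarial_filtering(instances, train_model, delta, num_iters, t_schedule):
    """Sketch of the adversarial filtering loop.

    instances: dicts with keys 'h_plus', 'h_minus', 'pool_plus', 'pool_minus'.
    train_model(train_set) -> M_i, a trained adversary.
    delta(M, h_plus, h_minus) -> difference in the model's evaluation of the pair.
    t_schedule[i] -> probability t_i of updating an instance at iteration i.
    """
    for i in range(num_iters):
        random.shuffle(instances)
        half = len(instances) // 2
        train_set, val_set = instances[:half], instances[half:]
        M_i = train_model(train_set)                 # adversarial model M_i
        for k in val_set:
            current = delta(M_i, k["h_plus"], k["h_minus"])
            if current > 0 and random.random() < t_schedule[i]:
                # M_i gets instance k correct: swap in a harder pair from the
                # pools, i.e. one that reduces delta.
                pairs = [(hp, hm) for hp in k["pool_plus"] for hm in k["pool_minus"]]
                hp, hm = min(pairs, key=lambda p: delta(M_i, p[0], p[1]))
                if delta(M_i, hp, hm) < current:
                    k["h_plus"], k["h_minus"] = hp, hm
    return instances
```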
Performance of baselines (𝛼𝑁𝐿𝐼)
Adopted from [1]
Performance of baselines (𝛼𝑁𝐿𝐼)
Adopted from [1]
Performance of baselines (𝛼𝑁𝐿𝐺)
Adopted from [1]
Adopted from [1]
BERT-Score
Evaluating Text Generation with BERT (analogous to the Inception Score, Fréchet Inception Distance, and Kernel Inception Distance used in image generation)
Adopted from [4]
BERT-Score
R_BERT = (1 / |x|) Σ_{x_i ∈ x} max_{x̂_j ∈ x̂} x_i^⊺ x̂_j
P_BERT = (1 / |x̂|) Σ_{x̂_j ∈ x̂} max_{x_i ∈ x} x_i^⊺ x̂_j
F_BERT = 2 · P_BERT · R_BERT / (P_BERT + R_BERT)
Importance weighting using inverse document frequency (idf):
idf(w) = − log( (1 / M) Σ_{i=1}^{M} 𝕀[w ∈ x^(i)] ), where 𝕀[⋅] is an indicator function.
R_BERT = ( Σ_{x_i ∈ x} idf(x_i) max_{x̂_j ∈ x̂} x_i^⊺ x̂_j ) / ( Σ_{x_i ∈ x} idf(x_i) )
Baseline rescaling: compute a baseline b using Common Crawl monolingual datasets.
R̂_BERT = (R_BERT − b) / (1 − b)
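A sketch of the greedy-matching recall/precision/F1 from the formulas above, operating on pre-computed, L2-normalized contextual token embeddings; producing those embeddings with BERT, the idf weighting, and the baseline rescaling are omitted.

```python
import numpy as np

def bert_score(ref_emb: np.ndarray, cand_emb: np.ndarray):
    """Greedy-matching BERTScore over L2-normalized token embeddings.

    ref_emb:  (|x|, d)     contextual embeddings of the reference tokens x
    cand_emb: (|x_hat|, d) contextual embeddings of the candidate tokens x_hat
    """
    sim = ref_emb @ cand_emb.T           # pairwise similarities x_i^T x_hat_j
    recall = sim.max(axis=1).mean()      # each reference token matched to its best candidate token
    precision = sim.max(axis=0).mean()   # each candidate token matched to its best reference token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```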
BERT-Score
Adopted from [4]
Abductive Commonsense Reasoning
1. O1 ↛ h−: h− is unlikely to follow after the first observation O1.
2. h− ↛ O2: h− is plausible after O1 but unlikely to precede the second observation O2.
3. Plausible: (O1, h−, O2) is a coherent narrative and forms a plausible alternative, but it is less plausible than (O1, h+, O2).
Adopted from [1]
BERT performance
Abductive Commonsense Reasoning
Adopted from [1]
Abductive Commonsense Reasoning
Adopted from [1]
Reference
[1] Chandra Bhagavatula et al., Abductive Commonsense Reasoning, ICLR 2020.
[2] Maarten Sap et al., ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning, arXiv:1811.00146.
[3] Antoine Bosselut et al., COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, ACL 2019.
[4] Tianyi Zhang et al., BERTScore: Evaluating Text Generation with BERT, ICLR 2020.
[5] Robyn Speer et al., ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, arXiv:1612.03975.
[6] MIT Media Lab, ConceptNet, URL: https://conceptnet.io/