9. Illustrative Example
■ Our Idea
1. Given the model 𝑓 and the point 𝑥 to be explained.
2. Fit the decision boundary from inside using a box.
3. Measure the side length as the irrelevance of each feature.
[Figure: the point 𝑥 and the decision boundary of 𝑓 in the (𝑥₁, 𝑥₂) plane, with a box fitted inside the boundary. The box side along 𝑥₁ is short (= 𝑥₁ is highly relevant); the side along 𝑥₂ is long (= 𝑥₂ is less relevant).]
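A minimal numerical sketch of steps 2–3 (my own illustrative code, not the paper's implementation): given a candidate box, check by sampling that the model's decision is invariant everywhere inside it.

```python
import numpy as np

def is_invariant(f, x, cls, u, v, n_samples=1000, seed=0):
    """Empirically check that class `cls` stays the argmax of f(x + r)
    for perturbations r inside the box [-u_1, v_1] x ... x [-u_d, v_d].
    A sampled check, not a certificate; `f` maps a vector to class scores."""
    rng = np.random.default_rng(seed)
    # draw each r_i uniformly from [-u_i, v_i]
    r = rng.uniform(-u, v, size=(n_samples, x.size))
    preds = np.argmax(np.array([f(x + ri) for ri in r]), axis=1)
    return bool(np.all(preds == cls))
```

Growing 𝑢 and 𝑣 as far as this check allows approximates "fit the boundary from inside"; the side lengths 𝑢ᵢ + 𝑣ᵢ then give each feature's irrelevance.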
10. [Mathematical Definition]
■ Consider the box $R(u, v)$.
• $R(u, v) := [-u_1, v_1] \times [-u_2, v_2] \times \cdots \times [-u_d, v_d]$
■ Def. Maximal Invariant Perturbation
$(u^*, v^*) = \arg\max_{u, v \ge 0} \sum_i (u_i + v_i)$
$\text{s.t. } c = \arg\max_j f_j(x + r), \; \forall r \in R(u, v)$
• Find the largest box that fits inside the decision boundary.
■ Def. Feature Relevance
Measure the relevance of feature $x_i$ by $-(u^*_i + v^*_i)$.
[Figure: the box $R(u, v)$ around 𝑥, extending $u_1, v_1$ along 𝑥₁ and $u_2, v_2$ along 𝑥₂.]
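For intuition, consider the special case of a linear model (an illustrative assumption, not the general setting): there the invariance constraint over the whole box collapses to a single corner condition, because a linear function over a box is minimized at a corner.

```latex
% Illustrative special case: linear scores f_j(z) = w_j^\top z + b_j.
% Let w = w_c - w_j and m_j = f_c(x) - f_j(x) (the margin at x).
% Then c = \arg\max_j f_j(x + r) for all r \in R(u, v) iff, for every j \ne c,
m_j + \min_{r \in R(u,v)} w^\top r \ge 0,
\quad\text{where}\quad
\min_{r \in R(u,v)} w^\top r
  = -\sum_i \bigl( \max(w_i, 0)\, u_i + \max(-w_i, 0)\, v_i \bigr).
```

The resulting condition is linear in $(u, v)$, which is the structure the LP on the next slide exploits after linearizing $f$.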
11. [Algorithm] LP with linear approximation.
■ Difficulty
• The constraint $c = \arg\max_j f_j(x + r)$ is highly complex.
■ Idea: Use linear approximation.
• $f_j(x + r) \approx f_j(x) + \nabla f_j(x)^\top r$
■ Approximate Problem
$(u^*, v^*) = \arg\max_{u, v \ge 0} \sum_i (u_i + v_i)$
$\text{s.t. } f_c(x) + \nabla f_c(x)^\top r \ge f_j(x) + \nabla f_j(x)^\top r,$
$\quad \forall j \ne c, \; \forall r \in R(u, v) \cap \{r : \|r\| \le \delta\}$
cf. $f_c(x + r) \ge f_j(x + r) \; \forall j \iff c = \arg\max_j f_j(x + r)$
• Several extensions of the LP formulation appear in the paper.
(Linear programming: linear objective + linear constraints.)
(The set $\{r : \|r\| \le \delta\}$ limits the perturbation size to at most $\delta$.)
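The approximate problem becomes an ordinary LP once each box constraint is reduced to its worst corner (a linear function over a box is minimized at a corner). A minimal sketch with `scipy.optimize.linprog`, assuming logits and gradients are precomputed, and approximating the $\delta$-ball by per-coordinate bounds (an $\ell_\infty$ ball); the function and variable names are my own, not the paper's:

```python
import numpy as np
from scipy.optimize import linprog

def maximal_box(logits, grads, cls, delta):
    """Sketch of the approximate LP.  For each competitor class j, the
    constraint  f_cls(x) + g_cls^T r >= f_j(x) + g_j^T r  over the box
    R(u, v) is tightest at one corner, which makes it linear in (u, v):
        sum_i max(w_i, 0)*u_i + max(-w_i, 0)*v_i <= f_cls(x) - f_j(x),
    where w = g_cls - g_j.  Variables are stacked as [u; v]."""
    d = grads.shape[1]
    A_ub, b_ub = [], []
    for j in range(len(logits)):
        if j == cls:
            continue
        w = grads[cls] - grads[j]
        A_ub.append(np.concatenate([np.maximum(w, 0.0), np.maximum(-w, 0.0)]))
        b_ub.append(logits[cls] - logits[j])
    # maximize sum(u) + sum(v)  <=>  minimize -(sum(u) + sum(v));
    # the delta-ball is approximated by bounds 0 <= u_i, v_i <= delta.
    res = linprog(-np.ones(2 * d), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, delta)] * (2 * d))
    u, v = res.x[:d], res.x[d:]
    return u, v, -(u + v)   # relevance: short side = highly relevant
```

Following the definition above, the returned score $-(u^*_i + v^*_i)$ is near zero (high) for a feature the box can barely extend along, and strongly negative (low) for a feature with a long side.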
12. [Experiment] Evaluation on VGG16
■ Model: VGG16 [Simonyan & Zisserman, ICLR'15]
■ Dataset: COCO-animals (cs231n.stanford.edu/coco-animals.zip)
• Images of eight species of animals.
• Use 200 images in the validation set.
■ Baseline methods
• saliency (https://github.com/PAIR-code/saliency)
- Gradient [Simonyan et al., arXiv’14]
- GuidedBP [Springenberg et al., arXiv’14]
- SmoothGrad [Smilkov et al., arXiv’17]
• DeepExplain (https://github.com/marcoancona/DeepExplain)
- LRP [Bach et al., PloS ONE’15]
- IntegratedGrad [Sundararajan et al., arXiv’17]
- DeepLIFT [Shrikumar et al., ICML’17]
- Occlusion
13. [Experiment] Evaluation on VGG16
■ Evaluation: identifying relevant image patches.
1. Compute the sum of scores within each 8 × 8 patch.
2. Flip the top-K most important patches to gray.
3. Observe whether the decision changes to another class.
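Steps 1–2 above can be sketched as follows (a hedged sketch: the array shapes, the gray value, and reading "8 × 8" as 8-pixel-square patches are my assumptions, not details from the slides):

```python
import numpy as np

def flip_top_k_patches(image, scores, k, patch=8, gray=0.5):
    """Evaluation sketch: (1) sum attribution scores within each
    patch x patch block, (2) flip the k highest-scoring blocks to a
    flat gray.  `image` and `scores` are (H, W) arrays; a real run
    would use RGB images and re-query the classifier afterwards."""
    H, W = scores.shape
    ph, pw = H // patch, W // patch
    # step 1: per-block score via a (ph, patch, pw, patch) reshape
    blocks = scores[:ph * patch, :pw * patch]
    patch_scores = blocks.reshape(ph, patch, pw, patch).sum(axis=(1, 3))
    # step 2: gray out the k highest-scoring blocks
    flipped = image.copy().astype(float)
    for idx in np.argsort(patch_scores, axis=None)[::-1][:k]:
        i, j = np.unravel_index(idx, patch_scores.shape)
        flipped[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = gray
    return flipped
```

Step 3 then re-runs the model on the flipped image and checks whether the predicted class has changed.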
■ Result
• The proposed methods were the most effective at identifying relevant image patches.
• More than half of the images changed class with only a few flips.
16. Summary
■ Let's define "the solution."
• Setting a goal is the driving force.
■ Proposed a new definition of feature scoring/attribution.
• Measure the robustness of the model's decision against input perturbations.
■ Proposed an LP-based algorithm.
• Uses linear approximation.
■ The algorithm is still primitive.
• It cannot fully handle the non-linearity of $f$.
• Better algorithms are in preparation.
Sample code on GitHub: sato9hara/PertMap