Explanation in Machine Learning and Its Reliability
1. NeurIPS Meetup Japan 2021, Satoshi Hara
Explanation in ML and Its Reliability
Satoshi Hara
Osaka University
2. NeurIPS Meetup Japan 2021, Satoshi Hara
“Explanation” in ML
◼ Most ML models are highly complex, or "black-box".
◼ "Explanation in ML": obtain some useful information
from the model, in addition to its prediction.
[Figure: two doctor-patient dialogues. With an explanation: "You are sick." "Why?" "Your XX score is too high." "Oh..." Without one: "You are sick." "Why?" "I don't know."]
3. NeurIPS Meetup Japan 2021, Satoshi Hara
[Typical Explanation 1] Saliency Map
◼ Generate heatmaps showing where the model focused
when making its prediction.
[Figure: a zebra image and its saliency map. The outline of the zebra seems to be relevant.]
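As a concrete illustration, here is a minimal sketch of one of the simplest saliency methods, vanilla gradient saliency [Simonyan+,2014]: the heatmap is the magnitude of the gradient of the predicted class score with respect to the input pixels. The model and input below are placeholders, not the classifier behind the zebra figure.

```python
import torch
import torch.nn as nn

def vanilla_saliency(model, x):
    """Saliency map: |d(max logit) / d(input)|, per pixel."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits.max(dim=1).values.sum().backward()
    return x.grad.abs()  # same shape as x; visualize as a heatmap

# placeholder model and input (stand-ins for a real classifier and image)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)
heatmap = vanilla_saliency(model, x)
print(heatmap.shape)  # torch.Size([1, 3, 32, 32])
```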
4. NeurIPS Meetup Japan 2021, Satoshi Hara
[Typical Explanation 2] Similar Examples
◼ Provide examples similar to the input of interest.
[Figure: an input image is predicted as "Lapwing"; similar lapwing images are retrieved from the database. "These images look similar, so the prediction 'Lapwing' is likely correct."]
5. NeurIPS Meetup Japan 2021, Satoshi Hara
History of “Explanation”
◼ History of Saliency Map
[Timeline figure, 2014-2020:]
• Dawn: Saliency [Simonyan+,2014], Occlusion [Zeiler+,2014], GuidedBP [Springenberg+,2014], LRP [Bach+,2015]
• Exponential growth of saliency map algorithms: CAM [Zhou+,2016], LIME [Ribeiro+,2016], Grad-CAM [Selvaraju+,2017], DeepLIFT [Shrikumar+,2017], IntGrad [Sundararajan+,2017], SHAP [Lundberg+,2017], SmoothGrad [Smilkov+,2017], DeepTaylor [Montavon+,2017]
• Evaluation methods: MoRF/Deletion Metric [Bach+,2015; Petsiuk+,2018], LeRF/Insertion Metric [Arras+,2017; Petsiuk+,2018], Sensitivity [Kindermans+,2017], ROAR [Hooker+,2019]
• Attack & manipulation: Sanity Check [Adebayo+,2018], Fairwashing [Aivodji+,2019], Manipulation [Dombrowski+,2019]
Papers on "Explanation" have increased exponentially.
[Figure: papers per year, 2008-2022, matching "Interpretable Machine Learning" and "Explainable AI" on Web of Science; counts grow from near zero to roughly 800 per year.]
6. NeurIPS Meetup Japan 2021, Satoshi Hara
History of “Explanation”
◼ History of Saliency Map (timeline repeated from the previous slide)
Reliability of "Explanation" has emerged as a crucial concern:
• Are the explanations truly valid?
• With explanations, how malicious can we be?
7. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability: "Is the explanation valid?"
What we care about:
• Do the algorithms output valid explanations?
Research Question:
• How can we evaluate the validity of explanations?
Social Reliability: "Does explanation harm society?"
What we care about:
• What happens if we introduce "Explanation" to society?
Research Question:
• Are there any malicious use cases of "Explanation"?
Technical Reliability
8. NeurIPS Meetup Japan 2021, Satoshi Hara
Faithfulness & Plausibility of “Explanation”
◼ Faithfulness [Lakkaraju+'19; Jacovi+'20]
• Does the "Explanation" reflect the model's reasoning process?
- Our interest is in how and why the model predicted that way.
• Any "Explanation" irrelevant to the reasoning process is invalid.
- e.g., an "Explanation" whose output is independent of the model.
◼ Plausibility [Lage+'19; Strout+'19]
• Does the "Explanation" make sense to the users?
• Any "Explanation" the users cannot accept is not ideal.
- e.g., an entire program's code; a very noisy saliency map.
9. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of “Explanation”
◼ Based on Faithfulness
• Sanity Checks for Saliency Maps, NeurIPS'18.
- Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
• An epoch-making paper from Google Brain.
• Evaluates the faithfulness of saliency maps.
◼ Based on Plausibility
• Evaluation of Similarity-based Explanations, ICLR'21.
- Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui
• Evaluates the plausibility of similarity-based explanations.
10. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Saliency Map
◼ Plausibility
• All the maps look more or less plausible.
• Gradient and IntegratedGrad are a bit noisy.
◼ Faithfulness?
[Figure: saliency maps for a zebra image from several methods. The outline of the zebra seems to be relevant.]
11. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Faithfulness is Not Possible.
◼ Faithfulness
• Does the "Explanation" reflect the model's reasoning process?
- The reasoning process is unknown → we cannot compare with a ground truth.
◼ Alternative: Sanity Check
• Check a necessary condition for faithful "Explanation".
- [Remark] Passing a sanity check alone does not guarantee faithfulness.
◼ Q. What is the necessary condition?
• The "Explanation" must be model-dependent.
- Any "Explanation" irrelevant to the reasoning process is invalid.
12. NeurIPS Meetup Japan 2021, Satoshi Hara
Model Parameter Randomization Test
◼ Compare the explanations of two models with different
reasoning processes.
• Faithful "Explanation" → the outputs are different.
• Non-faithful "Explanation" → the outputs can be identical.
[Figure: Model 1 (fully trained) vs. Model 2 (randomly initialized); the assumption is that these models have different reasoning processes. Explanations by Algo. 1 differ between the two models: the necessary condition is satisfied and the sanity check passes. Explanations by Algo. 2 are identical: the necessary condition is violated and the sanity check fails.]
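A minimal sketch of this test under the assumptions above: compute the same saliency map for a trained model and for a randomly re-initialized copy, then compare the two maps. The models, input, and the rank-correlation comparison are illustrative choices, not the paper's exact protocol.

```python
import torch
import torch.nn as nn
from scipy.stats import spearmanr

def vanilla_saliency(model, x):
    """Vanilla gradient saliency, flattened for comparison."""
    x = x.clone().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    return x.grad.abs().flatten()

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 128),
                         nn.ReLU(), nn.Linear(128, 10))

model1 = make_model()  # stand-in for the fully trained model (load real weights in practice)
model2 = make_model()  # Model 2: same architecture, randomly initialized

x = torch.randn(1, 1, 28, 28)
s1 = vanilla_saliency(model1, x).numpy()
s2 = vanilla_saliency(model2, x).numpy()

# A faithful explanation should change with the model's parameters;
# a rank correlation near 1 across the two models fails the sanity check.
rho, _ = spearmanr(s1, s2)
print(f"rank correlation between the two maps: {rho:.3f}")
```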
13. NeurIPS Meetup Japan 2021, Satoshi Hara
Model Parameter Randomization Test
◼ Model 2: a DNN with the last few layers randomized.
• The saliency maps of Guided Backprop and Guided Grad-CAM are
invariant under model randomization.
→ They violate the necessary condition for faithfulness.
[Figure: saliency maps for Model 1 vs. Model 2 across methods; from "Sanity Checks for Saliency Maps".]
14. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of “Explanation”
◼ Based on Faithfulness
• Sanity Checks for Saliency Maps, NeurIPS'18.
- Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
• An epoch-making paper from Google Brain.
• Evaluates the faithfulness of saliency maps.
◼ Based on Plausibility
• Evaluation of Similarity-based Explanations, ICLR'21.
- Kazuaki Hanawa, Sho Yokoi, Satoshi Hara, Kentaro Inui
• Evaluates the plausibility of similarity-based explanations.
15. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Similarity-based Explanation
◼ Faithfulness
• We can use the Model Parameter Randomization Test.
◼ Plausibility?
[Figure: the similarity-based explanation for the "Lapwing" example, as on the earlier slide.]
16. NeurIPS Meetup Japan 2021, Satoshi Hara
Plausibility in Similarity-based Explanation
◼ Example
• Explanation B won't be acceptable to the users.
- Plausibility of Explanation A > Plausibility of Explanation B
[Figure: an input image is predicted "frog". Explanation A retrieves a similar frog image from the database; Explanation B retrieves a truck image.]
17. NeurIPS Meetup Japan 2021, Satoshi Hara
Evaluation of Plausibility is Not Possible.
◼ There is no universal criterion that determines what users
will accept.
◼ Alternative: Sanity Check
• Check a necessary condition for plausible "Explanation".
◼ Q. What is the necessary condition?
• The retrieved similar instance should belong to the same class:
the Identical Class Test (a minimal sketch follows below).
- "This is a cat because a similar instance is a cat." → plausible.
- "This is a cat because a similar instance is a dog." → non-plausible.
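A minimal sketch of the Identical Class Test, assuming precomputed feature vectors and predicted labels; the similarity function is pluggable (dot product, cosine, gradient cosine, etc.), and the toy data below is a placeholder for real model representations.

```python
import numpy as np

def identical_class_test(similarity, train_feats, train_labels,
                         test_feats, test_preds):
    """Fraction of test instances whose most similar training
    instance belongs to the model's predicted class."""
    passed = 0
    for feat, pred in zip(test_feats, test_preds):
        sims = np.array([similarity(feat, g) for g in train_feats])
        nearest_label = train_labels[int(sims.argmax())]
        passed += int(nearest_label == pred)
    return passed / len(test_feats)

# example similarity: cosine in some representation space
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# toy usage with random features (replace with real representations)
rng = np.random.default_rng(0)
tr_f, tr_y = rng.normal(size=(100, 16)), rng.integers(0, 3, 100)
te_f, te_p = rng.normal(size=(20, 16)), rng.integers(0, 3, 20)
print(identical_class_test(cosine, tr_f, tr_y, te_f, te_p))
```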
18. NeurIPS Meetup Japan 2021, Satoshi Hara
Identical Class Test
[Figure: fraction of test instances passing the Identical Class Test (0 to 1) for various similarity measures: dot product, cosine, and L2 distance on the input, the last layer, and all layers; influence function; relative influence function; Fisher kernel; and dot product and cosine of the parameter gradients. Datasets: CIFAR10 + CNN (image classification) and AGNews + Bi-LSTM (text classification).]
Cosine similarity of the parameter gradients performed almost perfectly.
19. NeurIPS Meetup Japan 2021, Satoshi Hara
Cosine of Parameter Gradient
• $\mathrm{GC}(z, z') = \dfrac{\langle \nabla_\theta \ell(y, f_\theta(x)),\, \nabla_\theta \ell(y', f_\theta(x')) \rangle}{\|\nabla_\theta \ell(y, f_\theta(x))\| \, \|\nabla_\theta \ell(y', f_\theta(x'))\|}$ for $z = (x, y)$, $z' = (x', y')$.
[Figure: retrieved similar examples for inputs labeled Sussex spaniel, beer bottle, and mobile house.]
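A minimal sketch of GC for a PyTorch classifier; the model, loss, and example pair below are placeholders, not the setup used in the paper.

```python
import torch
import torch.nn as nn

def grad_cos(model, loss_fn, z, z_prime):
    """GC(z, z'): cosine similarity of the per-example loss
    gradients with respect to the model parameters."""
    def flat_grad(x, y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.flatten() for g in grads])
    g, g_prime = flat_grad(*z), flat_grad(*z_prime)
    return torch.nn.functional.cosine_similarity(g, g_prime, dim=0)

# toy usage (placeholders for a real classifier and labeled examples)
model = nn.Linear(8, 3)
loss_fn = nn.CrossEntropyLoss()
z = (torch.randn(8), torch.tensor(1))
z_prime = (torch.randn(8), torch.tensor(1))
print(grad_cos(model, loss_fn, z, z_prime).item())
```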
20. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability: "Is the explanation valid?"
What we care about:
• Do the algorithms output valid explanations?
Research Question:
• How can we evaluate the validity of explanations?
Social Reliability: "Does explanation harm society?"
What we care about:
• What happens if we introduce "Explanation" to society?
Research Question:
• Are there any malicious use cases of "Explanation"?
Social Reliability
21. NeurIPS Meetup Japan 2021, Satoshi Hara
Malicious Use Cases of “Explanation”
◼ Q. Are there malicious use cases of "Explanation"?
A. Some may try to deceive people by providing fake explanations.
◼ Q. When and why can fake explanations be used?
A. Fake explanations can make models look better,
e.g., by pretending that the models are fair.
◼ Q. Why do we need to research fake explanations? ("Are you evil?")
A. We need to know how malicious one can be with fake
explanations. Otherwise, we cannot defend against
possible malicious uses.
22. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake “Explanation” for Fairness
◼ Fairness in ML
• Models can be biased with respect to gender, race, etc.
• Ensuring the fairness of models is crucial nowadays.
◼ What if we cannot detect the use of unfair models?
• Some may use unfair models.
- Unfair models are typically more accurate than fair ones.
[A malicious pitch:] "Our model is the most accurate one in this business field." (thanks to an unfair yet accurate model) "Moreover, our model is fair, without any bias." (shown with a fake explanation)
23. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake “Explanation” for Fairness
◼ Fake “Explanation” by Surrogate Models
• Fairwashing: the risk of rationalization, ICML’19.
- Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp
• Characterizing the risk of fairwashing, NeurIPS’21.
- Ulrich Aïvodji, Hiromi Arai, Sébastien Gambs, Satoshi Hara
◼ Fake “Explanation” by Examples
• Faking Fairness via Stealthily Biased Sampling, AAAI’20.
- Kazuto Fukuchi, Satoshi Hara, Takanori Maehara
◼ Ref.
• It’s Too Easy to Hide Bias in Deep-Learning Systems,
IEEE Spectrum, 2021.
24. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness
[Figure: an unfair AI rejects applicants based on their gender. An honest explanation: "Your loan application is rejected because your gender is …"]
25. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness
[Figure: the same unfair AI. A dishonest explanation: "Your loan application is rejected because your income is low."]
26. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness (continued)
"Fairwashing": malicious decision-makers can disclose a fake
explanation to rationalize their unfair decisions.
27. NeurIPS Meetup Japan 2021, Satoshi Hara
The risk of “Fairwashing”
◼ Explaining fairness (continued)
This study: LaundryML
• It is possible to systematically generate fake explanations.
• Goal: raise awareness of the risk of "Fairwashing".
28. NeurIPS Meetup Japan 2021, Satoshi Hara
LaundryML: Systematically Generating Fake Explanations
◼ The idea
• Generate many explanations, and pick one that is useful for "Fairwashing".
◼ "Many explanations"
• Use Model Enumeration [Hara & Maehara'17; Hara & Ishihata'18]
to enumerate explanation models.
◼ "Pick one"
• Use fairness metrics such as demographic parity (DP).
• Pick the explanation most faithful to the model among those with
DP below a threshold (a sketch follows below).
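A hypothetical sketch of the enumerate-and-pick loop; `candidates` would come from model enumeration, and the names, `predict` interface, and threshold are illustrative, not the paper's implementation.

```python
import numpy as np

def dp_gap(preds, sensitive):
    """Demographic parity gap between the two sensitive groups."""
    return abs(preds[sensitive == 1].mean() - preds[sensitive == 0].mean())

def pick_fairwashing_explanation(candidates, X, sensitive,
                                 black_box_preds, dp_threshold=0.05):
    """Among enumerated surrogate models that look fair (low DP),
    return the one most faithful to the black box."""
    best, best_fidelity = None, -1.0
    for surrogate in candidates:              # e.g. enumerated rule lists
        preds = surrogate.predict(X)
        if dp_gap(preds, sensitive) > dp_threshold:
            continue                          # not "fair-looking" enough
        fidelity = (preds == black_box_preds).mean()  # agreement with the black box
        if fidelity > best_fidelity:
            best, best_fidelity = surrogate, fidelity
    return best
```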
29. NeurIPS Meetup Japan 2021, Satoshi Hara
Result
◼ "Fairwashing" for decisions on the Adult dataset
• The feature importance that FairML assigns to "gender" drops.
[Figure: FairML feature importances under a naive explanation vs. a fake explanation; "gender" is prominent in the former and negligible in the latter.]
30. NeurIPS Meetup Japan 2021, Satoshi Hara
Result
◼ "Fairwashing" for decisions on the Adult dataset
• The feature importance that FairML assigns to "gender" drops.
[Figure: the same comparison, together with the fake explanation below.]
Fake explanation (a rule list):
if capital gain > 7056 then high-income
else if marital = single then low-income
else if education = HS-grad then low-income
else if occupation = other then low-income
else if occupation = white-collar then high-income
else low-income
31. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake “Explanation” for Fairness
◼ Fake “Explanation” by Surrogate Models
• Fairwashing: the risk of rationalization, ICML’19.
- Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp
• Characterizing the risk of fairwashing, NeurIPS’21.
- Ulrich Aïvodji, Hiromi Arai, Sébastien Gambs, Satoshi Hara
◼ Fake “Explanation” by Examples
• Faking Fairness via Stealthily Biased Sampling, AAAI’20.
- Kazuto Fukuchi, Satoshi Hara, Takanori Maehara
◼ Ref.
• It’s Too Easy to Hide Bias in Deep-Learning Systems,
IEEE Spectrum, 2021.
32. NeurIPS Meetup Japan 2021, Satoshi Hara
Fairness Metrics
◼ Quantifying the fairness of models
• Several metrics and toolboxes exist:
- FairML, AI Fairness 360 [Bellamy+'19], Aequitas [Saleiro+'18]
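For concreteness, a minimal sketch of one widely used metric, demographic parity (DP): the gap between the groups' positive-prediction rates. The arrays below are toy data, not output of any of the toolboxes above.

```python
import numpy as np

def demographic_parity_gap(y_pred, sensitive):
    """|P(y_hat = 1 | group 1) - P(y_hat = 1 | group 0)|"""
    return abs(y_pred[sensitive == 1].mean() - y_pred[sensitive == 0].mean())

y_pred    = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # model decisions
sensitive = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # group membership
print(demographic_parity_gap(y_pred, sensitive))  # 0.75 - 0.25 = 0.5
```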
33. NeurIPS Meetup Japan 2021, Satoshi Hara
Fake Fairness Metrics
[Figure: a malicious party runs a service backed by an unfair model and presents a fairness metric as evidence. "Is this a fake metric?"]
• There is no guarantee that the metric was computed appropriately,
so it is impossible to determine whether it is fake or not.
→ A metric alone is not valid evidence of fairness.
34. NeurIPS Meetup Japan 2021, Satoshi Hara
Avoiding Fake Fairness Metrics
[Figure: the malicious party now presents benchmark data as evidence: "The fairness metric computed on the benchmark is fair!"]
• The metric is reproducible from the benchmark data,
so it seems we can avoid fakes.
35. NeurIPS Meetup Japan 2021, Satoshi Hara
(Failed) Avoiding Fake Fairness Metrics
[Figure: the same setting as above.]
• But the benchmark data itself can be fake.
36. NeurIPS Meetup Japan 2021, Satoshi Hara
Generating Fake Benchmark
◼ Subsample a benchmark dataset $S$ from the original dataset $D$.
◼ An "ideal" fake benchmark dataset $S$:
• Fairness: the fairness metric computed on $S$ looks fair
(a "fair" contingency table).
• Stealthiness: the distribution of $S$ is close to that of $D$.
[Figure: subsampling the benchmark from the original dataset while targeting a fair contingency table.]
37. NeurIPS Meetup Japan 2021, Satoshi Hara
◼ Optimization of $S$ as an LP (min-cost flow):
$$\min_S W(S, D) \quad \text{s.t.} \quad C(S) = C_T$$
• Stealthiness: $W(S, D)$, the distribution difference between $S$ and $D$, is minimized.
• Fairness: the contingency table $C(S)$ is constrained to the target "fair" table $C_T$.
◼ Detection of the fake benchmark with a statistical test against reference data:
• A small distribution difference implies a small detection probability:
the probability of rejecting $p(S) = p(D')$ with the KS test is at most
$O(|S|^{\alpha}) \times$ (distribution difference).
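A toy sketch of this setup on synthetic data: subsample so that the contingency table over (group, outcome) matches a fair target, then check stealthiness with a KS test on a feature. All data and names here are synthetic placeholders; for simplicity this samples uniformly within each cell (i.e., case-control style), whereas the paper picks the subsample by minimizing the distribution difference via min-cost flow.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# synthetic "original" dataset D: feature x, group a, unfair outcome y
n = 10_000
a = rng.integers(0, 2, n)
x = rng.normal(loc=0.3 * a, size=n)
y = (rng.random(n) < np.where(a == 1, 0.6, 0.3)).astype(int)  # DP gap ~ 0.3

# fake benchmark S: force a fair contingency table (equal cell counts,
# so the positive rate is 0.5 in both groups and the DP gap is ~0)
per_cell = 500
idx = np.concatenate([
    rng.choice(np.flatnonzero((a == g) & (y == o)), per_cell, replace=False)
    for g in (0, 1) for o in (0, 1)
])

dp = lambda yy, aa: abs(yy[aa == 1].mean() - yy[aa == 0].mean())
print(f"DP gap: D = {dp(y, a):.2f}, S = {dp(y[idx], a[idx]):.2f}")

# stealthiness: can a two-sample KS test on the feature tell S from D?
stat, pval = ks_2samp(x[idx], x)
print(f"KS p-value: {pval:.3f} (large p = S looks like D)")
```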
38. NeurIPS Meetup Japan 2021, Satoshi Hara
Undetectability of Fake Benchmark
[Figure: for COMPAS and Adult, the fairness metric (DP) and the distribution difference are plotted against the number of positive cases in the contingency table, comparing random sampling, case-control sampling, and the proposed sampling.]
• The proposed sampling resulted in a fairer metric
(= achieved fake fairness).
• The proposed sampling attained a distribution almost identical
to the original one (= undetectable).
39. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability: "Is the explanation valid?"
What we care about:
• Do the algorithms output valid explanations?
Research Question:
• How can we evaluate the validity of explanations?
Social Reliability: "Does explanation harm society?"
What we care about:
• What happens if we introduce "Explanation" to society?
Research Question:
• Are there any malicious use cases of "Explanation"?
Summary
40. NeurIPS Meetup Japan 2021, Satoshi Hara
Technical / Social Reliability of “Explanation”
Technical Reliability: "Is the explanation valid?"
• How can we evaluate the validity of explanations?
• Which evaluation is good for which type of explanation?
Social Reliability: "Does explanation harm society?"
• When can explanations be used maliciously?
• Can we detect malicious use cases?