These are the slides for a 15-minute talk given on November 29, 2019 at the 45th Casual Talk of 全脳アーキテクチャ若手の会 (the Whole Brain Architecture Young Researchers' Group).
No equations appear in the slides.
11. Accuracy vs. Safety
D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, “Robustness May Be at Odds with Accuracy,” 2018.
CNNs trained with adversarial training have lower accuracy than CNNs trained in the standard way.
Can accuracy and defense against adversarial examples really not be achieved at the same time!?
* CNN: convolutional neural network
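This trade-off comes from how adversarial training works: the model is fitted to worst-case perturbed inputs rather than to clean ones. Below is a minimal PyTorch sketch of FGSM-based adversarial training in the spirit of Goodfellow et al.'s attack; `model`, `loader`, and `optimizer` are assumed to exist, images are assumed to be scaled to [0, 1], and the snippet is an illustration rather than any paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """One-step FGSM: move each pixel by epsilon in the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then keep pixels valid.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    """One epoch of adversarial training: fit the model on perturbed inputs only."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, epsilon)  # craft attacks on the fly
        optimizer.zero_grad()  # also clears gradients left over from the attack
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Because every gradient step targets perturbed inputs, clean accuracy is no longer what is being optimized, which is one intuition for the drop reported by Tsipras et al.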
12. Accuracy vs. Safety
D. Su, H. Zhang, H. Chen, J. Yi, P. Y. Chen, and Y. Gao, “Is robustness the cost of accuracy? – A comprehensive study on the robustness of 18 deep image classification models,” Lect. Notes Comput. Sci., vol. 11216 LNCS, pp. 644–661, Aug. 2018.
A comparison of the robustness of CNN models.
The higher a model's standard accuracy, the more vulnerable it is to adversarial-example attacks.
A trade-off between accuracy and safety.
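Su et al.'s comparison boils down to measuring each model twice: accuracy on clean inputs and accuracy under attack. A hedged sketch of such an evaluation, reusing the hypothetical `fgsm_perturb` helper from the sketch above; the torchvision models listed are an illustrative subset rather than the paper's 18, and `loader` is assumed.

```python
import torch
import torchvision.models as models

def accuracy(model, loader, epsilon=None):
    """Top-1 accuracy; if epsilon is given, measure under an FGSM attack instead."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        if epsilon is not None:
            x = fgsm_perturb(model, x, y, epsilon)  # the attack needs gradients
        with torch.no_grad():
            correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# The trade-off pattern: as clean accuracy rises, adversarial accuracy tends to fall.
for name in ["alexnet", "vgg16", "resnet50"]:  # illustrative, not the paper's 18
    m = getattr(models, name)(weights="DEFAULT")
    print(name, accuracy(m, loader), accuracy(m, loader, epsilon=4 / 255))
```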
13. CNNs look at different features than humans in the first place
R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness,” Nov. 2018.
CNNs tend to base their decisions on texture and pay little attention to shape (spatial arrangement).
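One quick, informal way to probe this texture bias (much simpler than Geirhos et al.'s stylized-image protocol, which transplants one class's texture onto another class's shape) is patch shuffling: randomly permuting an image's patches destroys global shape while largely preserving local texture, yet a texture-biased classifier often keeps its prediction. A sketch under the assumption that `model` is a pretrained classifier and `img` a 3×224×224 tensor in [0, 1]:

```python
import torch

def shuffle_patches(img, patch=56):
    """Cut a CHW image into a grid of patches and permute them at random,
    destroying global shape while keeping local texture statistics."""
    c, h, w = img.shape
    grid = img.unfold(1, patch, patch).unfold(2, patch, patch)  # c, gh, gw, p, p
    gh, gw = grid.shape[1], grid.shape[2]
    tiles = grid.reshape(c, gh * gw, patch, patch)
    tiles = tiles[:, torch.randperm(gh * gw)]  # shuffle the grid cells
    return tiles.reshape(c, gh, gw, patch, patch).permute(0, 1, 3, 2, 4).reshape(c, h, w)

pred_clean = model(img.unsqueeze(0)).argmax(dim=1).item()
pred_shuffled = model(shuffle_patches(img).unsqueeze(0)).argmax(dim=1).item()
print(pred_clean, pred_shuffled)  # a texture-biased model often answers the same
```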
14. What are CNNs actually looking at?
A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial Examples Are Not Bugs, They Are Features,” May 2019.
Using an adversarially trained network, the authors separate features that are “robust” to adversarial examples from features that are “non-robust”.
Could it be that CNNs achieve their high accuracy by exploiting “weak” features that humans cannot perceive?
* There is also critical discussion of the “non-robust features” claim: L. Engstrom et al., “A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features,’” Distill, vol. 4, no. 8, p. e19, Aug. 2019.
[Figure (a) from Ilyas et al.: CIFAR-10 samples (“airplane”, “ship”, “dog”, “frog”, “truck”) from the original dataset D, the robust dataset D̂_R, and the non-robust dataset D̂_NR.]
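The construction behind D̂_R can be sketched as representation matching: starting from an unrelated seed image, gradient descent drives its penultimate-layer features under a robust (adversarially trained) model toward those of a target image, so the new image inherits only the features the robust model relies on. A minimal sketch of the idea; `feature_fn` (mapping an image batch to the robust model's penultimate features) is an assumed interface, not the authors' released code.

```python
import torch

def robustify(feature_fn, x_target, x_seed, steps=200, lr=0.1):
    """Optimize a seed image so its robust-model features match the target's;
    the result is paired with x_target's label in the new training set."""
    with torch.no_grad():
        target_feat = feature_fn(x_target)
    x = x_seed.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (feature_fn(x) - target_feat).pow(2).sum()  # feature-space distance
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)  # stay a valid image
    return x.detach()
```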
16. References
[1] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial Examples Are Not Bugs, They Are Features,” May 2019.
[2] D. Hendrycks and T. Dietterich, “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations,” 2019.
[3] D. Su, H. Zhang, H. Chen, J. Yi, P. Y. Chen, and Y. Gao, “Is robustness the cost of accuracy? – A comprehensive study on the robustness of 18 deep image classification models,” Lect. Notes Comput. Sci., vol. 11216 LNCS, pp. 644–661, Aug. 2018.
[4] M. A. Alcorn et al., “Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects,” Nov. 2018.
[5] S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillance cameras: adversarial patches to attack person detection,” 2019.
[6] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, “Robustness May Be at Odds with Accuracy,” 2018.
[7] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness,” Nov. 2018.
[8] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing Robust Adversarial Examples,” 2018.
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and Harnessing Adversarial Examples,” Dec. 2014.
[10] L. Engstrom, A. Ilyas, S. Santurkar, D. Tsipras, B. Tran, and A. Madry, “Learning Perceptually-Aligned Representations via Adversarial Robustness,” 2019.
[11] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical Black-Box Attacks against Machine Learning,” Feb. 2016.
[12] S. Santurkar, D. Tsipras, B. Tran, A. Ilyas, L. Engstrom, and A. Madry, “Computer Vision with a Single (Robust) Classifier,” Jun. 2019.
[13] L. Engstrom et al., “A Discussion of ‘Adversarial Examples Are Not Bugs, They Are Features,’” Distill, vol. 4, no. 8, p. e19, Aug. 2019.
20. Various adversarial examples
Crafting a patch that prevents an object detector from recognizing the wearer as a person.
S. Thys, W. Van Ranst, and T. Goedemé, “Fooling automated surveillance cameras: adversarial patches to attack person detection,” 2019.
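At its core the attack is an optimization over the patch's pixels: render the patch onto images of people and minimize the detector's “person” confidence. A heavily simplified sketch; `person_score` (a differentiable function returning the detector's maximum person confidence per image) and the fixed paste location are hypothetical stand-ins for the paper's pipeline, which also warps the patch onto detected bodies and adds printability constraints.

```python
import torch

def paste(images, patch, top=60, left=60):
    """Overwrite a fixed region of each image with the patch (the real attack
    instead warps the patch onto each detected person)."""
    out = images.clone()
    out[:, :, top:top + patch.shape[1], left:left + patch.shape[2]] = patch
    return out

def train_patch(person_score, loader, size=64, lr=0.03, epochs=5):
    patch = torch.rand(3, size, size, requires_grad=True)  # the patch being learned
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(epochs):
        for images, _ in loader:  # images of people, scaled to [0, 1]
            loss = person_score(paste(images, patch)).mean()  # suppress "person"
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                patch.clamp_(0, 1)  # keep printable pixel values
    return patch.detach()
```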
22. Various adversarial examples
M. A. Alcorn et al., “Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects,” Nov. 2018.
Generating 3D object poses that cause neural networks to misclassify familiar objects.