20. Kazuki Motohashi - Skymind K.K. Deep Learning Study Group for Practitioners, Session 4 - 19/June/2019
MVTec Anomaly Detection Dataset (MVTec AD)
Figure 2: Example images for all five textures and ten object categories of the MVTec AD dataset.
2.1.2 Segmentation of Anomalous Regions
For the evaluation of methods that segment anomalies in images, only very few public datasets are currently available. All of them focus on the inspection of textured surfaces and, to the best of our knowledge, there does not yet exist a comprehensive dataset that allows for the segmentation of anomalous regions in natural images.
Carrera et al. [6] provide NanoTWICE, a dataset of 45 gray-scale images that show a nanofibrous material acquired by a scanning electron microscope. Five defect-free images can be used for training. The remaining 40 images contain anomalous regions in the form of specks of dust or flattened areas. Since the dataset only provides a single kind of texture, it is unclear how well algorithms that are evaluated on this dataset generalize to other textures of different domains.
A dataset that is specifically designed for optical inspection of textured surfaces was proposed during a 2007 DAGM workshop by Wieler and Hahn [28]. They provide ten classes of artificially generated gray-scale textures with defects weakly annotated in the form of ellipses. Each class comprises 1000 defect-free texture patches for training and 150 defective patches for testing. However, their annotations are quite coarse and since the textures were generated by very similar texture models, the variance in appearance between the different textures is quite low. Furthermore, artificially generated datasets can only be seen as an approximation to real-world data.
MVTec is an image-processing software company based in Munich, Germany
https://www.mvtec.com/company/research/datasets/
They released an anomaly-detection image dataset of more than 5,000 images
It contains normal/anomalous data for 15 different objects and textures
The accompanying paper was accepted at CVPR 2019, a top computer-vision conference
MVTec Anomaly Detection Dataset (MVTec AD)
Category      AE (SSIM)  AE (L2)  AnoGAN  CNN Feature  Texture     Variation
                                          Dictionary   Inspection  Model
Textures
  Carpet        0.43      0.57     0.82      0.89         0.57         -
                0.90      0.42     0.16      0.36         0.61
  Grid          0.38      0.57     0.90      0.57         1.00         -
                1.00      0.98     0.12      0.33         0.05
  Leather       0.00      0.06     0.91      0.63         0.00         -
                0.92      0.82     0.12      0.71         0.99
  Tile          1.00      1.00     0.97      0.97         1.00         -
                0.04      0.54     0.05      0.44         0.43
  Wood          0.84      1.00     0.89      0.79         0.42         -
                0.82      0.47     0.47      0.88         1.00
Objects
  Bottle        0.85      0.70     0.95      1.00          -          1.00
                0.90      0.89     0.43      0.06                     0.13
  Cable         0.74      0.93     0.98      0.97          -           -
                0.48      0.18     0.07      0.24
  Capsule       0.78      1.00     0.96      0.78          -          1.00
                0.43      0.24     0.20      0.03                     0.03
  Hazelnut      1.00      0.93     0.83      0.90          -           -
                0.07      0.84     0.16      0.07
  Metal nut     1.00      0.68     0.86      0.55          -          0.32
                0.08      0.77     0.13      0.74                     0.83
  Pill          0.92      1.00     1.00      0.85          -          1.00
                0.28      0.23     0.24      0.06                     0.13
  Screw         0.95      0.98     0.41      0.73          -          1.00
                0.06      0.39     0.28      0.13                     0.10
  Toothbrush    0.75      1.00     1.00      1.00          -          1.00
                0.73      0.97     0.13      0.03                     0.60
  Transistor    1.00      0.97     0.98      1.00          -           -
                0.03      0.45     0.35      0.15
  Zipper        1.00      0.97     0.78      0.78          -           -
                0.60      0.63     0.40      0.29
Table 2: Results of the evaluated methods when applied to the classification of anomalous images. For each dataset category, the ratio of correctly classified samples of anomaly-free (top row) and anomalous images (bottom row) is given.
Figure 2: Example images for all five textures and ten object categories of the MVTec AD dataset. For each category, the top row shows an anomaly-free image.
Overall, the autoencoder-based models are strong
Top row: classification accuracy on normal data
Bottom row: classification accuracy on anomalous data
AE (L2) vs. AE (SSIM)
The loss function does not have to be the squared error
L2 loss (squared error) vs. SSIM (Structural SIMilarity) index [2004]
Dosovitskiy and Brox (2016). It increases the quality of the produced reconstructions by extracting features from both the input image x and its reconstruction x̂ and enforcing them to be equal. Consider F : ℝ^(k×h×w) → ℝ^f to be a feature extractor that obtains an f-dimensional feature vector from an input image. Then, a regularizer can be added to the loss function of the autoencoder, yielding the feature matching autoencoder (FM-AE) loss

L_FM(x, x̂) = L2(x, x̂) + λ ‖F(x) − F(x̂)‖₂² ,   (3)

where λ > 0 denotes the weighting factor between the two loss terms. F can be parameterized using the first layers of a CNN pretrained on an image classification task. During evaluation, a residual map R_FM is obtained by comparing the per-pixel ℓ2-distance of x and x̂. The hope is that sharper, more realistic reconstructions will lead to better residual maps compared to a standard ℓ2-autoencoder.
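As a minimal sketch (not the paper's implementation), Eq. (3) can be written down directly in NumPy; the toy feature extractor `F` below is a placeholder standing in for the first layers of a pretrained CNN:

```python
import numpy as np

def fm_loss(x, x_hat, F, lam=1.0):
    """Feature-matching AE loss, Eq. (3): L2 term plus the weighted
    squared l2-distance between extracted feature vectors."""
    l2 = np.sum((x - x_hat) ** 2)
    return float(l2 + lam * np.sum((F(x) - F(x_hat)) ** 2))

# Toy feature extractor standing in for the first layers of a pretrained CNN.
F = lambda img: np.array([img.mean(), img.std()])

x = np.ones((8, 8))
print(fm_loss(x, x, F))                       # perfect reconstruction -> 0.0
print(fm_loss(x, np.zeros((8, 8)), F) > 0.0)  # -> True
```

In a real setup, `F` would be the truncated pretrained CNN and `lam` would be tuned against the plain L2 term.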
3.1.4. SSIM Autoencoder. We show that employing more elaborate architectures such as VAEs or FM-AEs does not yield satisfactory improvements of the residual maps over deterministic ℓ2-autoencoders in the unsupervised defect segmentation task. They are all based on per-pixel evaluation metrics that assume an unrealistic independence between neighboring pixels. Therefore, they fail to detect structural differences between the inputs and their reconstructions.
l(p, q) = (2 μ_p μ_q + c1) / (μ_p² + μ_q² + c1)   (5)

c(p, q) = (2 σ_p σ_q + c2) / (σ_p² + σ_q² + c2)   (6)

s(p, q) = (2 σ_pq + c2) / (2 σ_p σ_q + c2) .      (7)
The constants c1 and c2 ensure numerical stability and are typically set to c1 = 0.01 and c2 = 0.03. By substituting (5)-(7) into (4), the SSIM is given by

SSIM(p, q) = (2 μ_p μ_q + c1)(2 σ_pq + c2) / ((μ_p² + μ_q² + c1)(σ_p² + σ_q² + c2)) .   (8)

It holds that SSIM(p, q) ∈ [−1, 1]. In particular, SSIM(p, q) = 1 if and only if p and q are identical (Wang et al., 2004). Figure 2 shows the different perceptions of the three similarity functions that form the SSIM index. Each of the patch pairs p and q has a constant ℓ2-residual of 0.25 per pixel, which hence assigns low defect scores to each of the three cases. SSIM, on the other hand, is sensitive to variations in the patches' mean, variance, and covariance in its respective residual map and assigns low similarity to each of the patch pairs in one of the comparison functions.
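A minimal NumPy sketch of Eq. (8): here the statistics are taken over the whole patch, whereas the full SSIM index computes them over small sliding windows; the constants follow the text.

```python
import numpy as np

def ssim(p, q, c1=0.01, c2=0.03):
    """SSIM of two equally sized patches, Eq. (8).

    Statistics are taken over the whole patch here; the full SSIM index
    computes them over small sliding windows. c1 and c2 follow the text.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mu_p, mu_q = p.mean(), q.mean()
    var_p, var_q = p.var(), q.var()
    cov_pq = ((p - mu_p) * (q - mu_q)).mean()
    num = (2 * mu_p * mu_q + c1) * (2 * cov_pq + c2)
    den = (mu_p ** 2 + mu_q ** 2 + c1) * (var_p + var_q + c2)
    return num / den

p = np.random.default_rng(0).random((11, 11))
print(round(ssim(p, p), 6))    # identical patches -> 1.0
print(ssim(p, 1.0 - p) < 1.0)  # inverted patch scores lower -> True
```

Unlike a per-pixel residual, a structurally altered patch (e.g. inverted contrast) scores low even when its mean intensity is unchanged.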
training them purely on defect-free image data. During testing, the autoencoder will fail to reconstruct defects that have not been observed during training, which can thus be segmented by comparing the original input to the reconstruction and computing a residual map R(x, x̂) ∈ ℝ^(w×h).
3.1.1. ℓ2-Autoencoder. To force the autoencoder to reconstruct its input, a loss function must be defined that guides it towards this behavior. For simplicity and computational speed, one often chooses a per-pixel error measure, such as the L2 loss

L2(x, x̂) = Σ_{r=0}^{h−1} Σ_{c=0}^{w−1} (x(r, c) − x̂(r, c))² ,   (2)

where x(r, c) denotes the intensity value of image x at the pixel (r, c). To obtain a residual map R_ℓ2(x, x̂) during evaluation, the per-pixel ℓ2-distance of x and x̂ is computed.
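Eq. (2) and the per-pixel residual map are straightforward in NumPy; a sketch in which `x_hat` plays the role of the autoencoder's reconstruction:

```python
import numpy as np

def l2_loss(x, x_hat):
    """L2 loss, Eq. (2): sum of squared per-pixel differences."""
    return float(np.sum((x - x_hat) ** 2))

def residual_map(x, x_hat):
    """Per-pixel squared l2-distance used as the residual map."""
    return (x - x_hat) ** 2

x = np.zeros((4, 4))       # "input" image
x_hat = np.zeros((4, 4))
x_hat[1, 1] = 0.5          # a defect the autoencoder failed to reproduce
R = residual_map(x, x_hat)
r, c = np.unravel_index(R.argmax(), R.shape)
print(l2_loss(x, x_hat))   # -> 0.25
print(int(r), int(c))      # largest residual at the defect: 1 1
```

Thresholding `R` then yields the segmented anomalous region.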
3.1.2. Variational Autoencoder. Various extensions to the deterministic autoencoder framework exist. VAEs (Kingma and Welling, 2014) impose constraints on the latent variables to follow a certain distribution z ∼ P(z). For simplicity, the distribution is typically chosen to be a unit Gaussian.
The differences as humans perceive them:
・changes in pixel (luminance) values
・changes in contrast
・changes in structure
An index of how much these change between the original and the encoded-decoded image
(computed by dividing the image into small windows)
http://visualize.hatenablog.com/entry/2016/02/20/144657
https://dftalk.jp/?p=18111
AnoGAN [2017]
To be published in the proceedings of IPMI 2017
Fig. 2. (a) Deep convolutional generative adversarial network. (b) t-SNE embedding
of normal (blue) and anomalous (red) images on the feature representation of the last
convolution layer (orange in (a)) of the discriminator.
Train a GAN on normal data to build a generative model (probability distribution) of the normal data
https://arxiv.org/abs/1703.05921
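AnoGAN's test-time idea of searching the latent space for the z whose generated image best matches the query can be sketched with a toy linear generator; the real method uses a trained DCGAN and adds a discriminator-feature loss term, both omitted here:

```python
import numpy as np

# Toy linear "generator" standing in for a trained GAN generator G(z).
def G(z, W):
    return W @ z

def anomaly_search(x, W, steps=200, lr=0.1, seed=0):
    """AnoGAN-style scoring sketch: gradient descent on z to minimize
    ||G(z) - x||^2 (the paper's discriminator-feature loss is omitted).
    The final residual serves as the anomaly score."""
    z = np.random.default_rng(seed).normal(size=W.shape[1])
    for _ in range(steps):
        z -= lr * 2 * W.T @ (G(z, W) - x)  # gradient of the residual loss
    return float(np.sum((G(z, W) - x) ** 2))

W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # z in R^2 -> "images" in R^3
normal = np.array([1.0, 2.0, 3.0])    # lies on the generator's manifold
anomal = np.array([1.0, 2.0, 10.0])   # off the manifold of normal data
print(anomaly_search(normal, W) < 1e-3)  # -> True
print(anomaly_search(anomal, W) > 1.0)   # -> True
```

Inputs the generator can reproduce get a near-zero residual; anomalies that lie off the learned manifold cannot be matched and score high.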
CNN Feature Dictionary [2018]
and outputs the best k clusters and corresponding k centroids. A centroid is the mean position of all
the elements of the cluster. For each cluster, we take the feature vector that is nearest to its centroid.
The set of these k feature vectors compose the dictionary W. Figure 4 shows the pipeline for dictionary
building. Figure 5 shows examples of dictionaries learned from images of the training set (anomaly
free) with different patch sizes and number of clusters. The figure shows the subregions corresponding
to each feature vector of the dictionary W.
Figure 4. Examples of dictionary achieved considering different patch sizes and different number
of subregions.
patch_size = 16 x 16
stride = 4
ResNet-18
512-dim avgpool layer
Dimensionality reduction retaining 95% of the information
k-means
(select the vector closest to each cluster centroid)
Vectors that are farther than a threshold from the dictionary of feature vectors are treated as anomalous regions
https://www.ncbi.nlm.nih.gov/pubmed/29329268
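A sketch of the dictionary-building and scoring steps under simplifying assumptions: the ResNet-18 feature extraction and the dimensionality-reduction step are omitted, and a plain NumPy k-means stands in for the clustering.

```python
import numpy as np

def build_dictionary(features, k, iters=20, seed=0):
    """Plain k-means over defect-free feature vectors; the dictionary W
    keeps, per cluster, the actual vector nearest to the centroid."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    return features[dist.argmin(axis=0)]  # nearest real vector per centroid

def anomaly_score(feature, W):
    """Distance to the nearest dictionary entry; above a threshold -> anomalous."""
    return float(np.linalg.norm(W - feature, axis=1).min())

rng = np.random.default_rng(1)
normal = np.vstack([rng.normal(0.0, 0.1, (50, 2)),   # two "normal" texture modes
                    rng.normal(5.0, 0.1, (50, 2))])
W = build_dictionary(normal, k=2)
print(anomaly_score(np.array([0.0, 0.05]), W) < 1.0)   # near a mode -> True
print(anomaly_score(np.array([10.0, 10.0]), W) > 1.0)  # far away -> True
```

At test time, each patch's feature vector is scored against W and the threshold decides whether the corresponding region is flagged.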
Summary
‣ An autoencoder learns an efficient "representation" of the data by using the same data as both the input and the target output
‣ A convolutional autoencoder can also be built using Max Pooling and Up Sampling layers
‣ Anomaly detection can be performed with unsupervised learning, using the reconstruction error as the criterion
‣ There are various ways to define the "reconstruction error"
‣ The MVTec AD dataset looks interesting for practical use
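The Max Pooling / Up Sampling building blocks mentioned above can be sketched in NumPy; convolutions are omitted, and this only shows the shape arithmetic of the encoder/decoder sides:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling: halves each spatial dimension (encoder side)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbor up-sampling: doubles each dimension (decoder side)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
z = max_pool2x2(x)           # 4x4 -> 2x2 compressed representation
x_rec = upsample2x2(z)       # 2x2 -> back to 4x4 resolution
print(z.shape, x_rec.shape)  # -> (2, 2) (4, 4)
```

Stacking such down-sampling stages (with convolutions in between) forms the encoder, and the mirrored up-sampling stages form the decoder of a convolutional autoencoder.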