20. Kazuki Motohashi - Skymind K.K. Deep Learning Study Group for Practitioners, Session 4 - 19/June/2019
MVTec Anomaly Detection Dataset (MVTec AD)
Figure 2: Example images for all five textures and ten object categories of the MVTec AD dataset.
2.1.2 Segmentation of Anomalous Regions
For the evaluation of methods that segment anomalies in images, only very few public datasets are currently available. All of them focus on the inspection of textured surfaces and, to the best of our knowledge, there does not yet exist a comprehensive dataset that allows for the segmentation of anomalous regions in natural images.
Carrera et al. [6] provide NanoTWICE, a dataset of 45 gray-scale images that show a nanofibrous material acquired by a scanning electron microscope. Five defect-free images can be used for training. The remaining 40 images contain anomalous regions in the form of specks of dust or flattened areas. Since the dataset only provides a single kind of texture, it is unclear how well algorithms that are evaluated on this dataset generalize to other textures of different domains.
A dataset that is specifically designed for optical inspection of textured surfaces was proposed during a 2007 DAGM workshop by Wieler and Hahn [28]. They provide ten classes of artificially generated gray-scale textures with defects weakly annotated in the form of ellipses. Each class comprises 1000 defect-free texture patches for training and 150 defective patches for testing. However, their annotations are quite coarse and since the textures were generated by very similar texture models, the variance in appearance between the different textures is quite low. Furthermore, artificially generated datasets can only be seen as an approximation to real-world data.
MVTec is an image-processing software company based in Munich, Germany
https://www.mvtec.com/company/research/datasets/
They released an anomaly-detection image dataset of more than 5,000 images
It contains normal/anomalous data for 15 different objects and textures
The accompanying paper was accepted at CVPR 2019, a top computer-vision conference
MVTec Anomaly Detection Dataset (MVTec AD)
Category      AE (SSIM)  AE (L2)  AnoGAN  CNN Feature  Texture     Variation
                                          Dictionary   Inspection  Model
Textures
  Carpet        0.43      0.57     0.82      0.89         0.57         -
                0.90      0.42     0.16      0.36         0.61
  Grid          0.38      0.57     0.90      0.57         1.00         -
                1.00      0.98     0.12      0.33         0.05
  Leather       0.00      0.06     0.91      0.63         0.00         -
                0.92      0.82     0.12      0.71         0.99
  Tile          1.00      1.00     0.97      0.97         1.00         -
                0.04      0.54     0.05      0.44         0.43
  Wood          0.84      1.00     0.89      0.79         0.42         -
                0.82      0.47     0.47      0.88         1.00
Objects
  Bottle        0.85      0.70     0.95      1.00          -          1.00
                0.90      0.89     0.43      0.06                     0.13
  Cable         0.74      0.93     0.98      0.97          -           -
                0.48      0.18     0.07      0.24
  Capsule       0.78      1.00     0.96      0.78          -          1.00
                0.43      0.24     0.20      0.03                     0.03
  Hazelnut      1.00      0.93     0.83      0.90          -           -
                0.07      0.84     0.16      0.07
  Metal nut     1.00      0.68     0.86      0.55          -          0.32
                0.08      0.77     0.13      0.74                     0.83
  Pill          0.92      1.00     1.00      0.85          -          1.00
                0.28      0.23     0.24      0.06                     0.13
  Screw         0.95      0.98     0.41      0.73          -          1.00
                0.06      0.39     0.28      0.13                     0.10
  Toothbrush    0.75      1.00     1.00      1.00          -          1.00
                0.73      0.97     0.13      0.03                     0.60
  Transistor    1.00      0.97     0.98      1.00          -           -
                0.03      0.45     0.35      0.15
  Zipper        1.00      0.97     0.78      0.78          -           -
                0.60      0.63     0.40      0.29
Table 2: Results of the evaluated methods when applied to the classification of anomalous images. For each dataset category, the ratio of correctly classified samples of anomaly-free (top row) and anomalous images (bottom row) is given.
Figure 2: Example images for all five textures and ten object categories of the MVTec AD dataset. For each category, the top row shows an anomaly-free image.
Overall, the autoencoder-based models are strong
Top row: classification accuracy on normal data
Bottom row: classification accuracy on anomalous data
AE (L2) vs. AE (SSIM)
The loss function does not have to be the squared error
L2 loss (squared error) vs. SSIM (Structural SIMilarity) index [2004]
Dosovitskiy and Brox (2016). It increases the quality of the produced reconstructions by extracting features from both the input image x and its reconstruction x̂ and enforcing them to be equal. Consider F : ℝ^(k×h×w) → ℝ^f to be a feature extractor that obtains an f-dimensional feature vector from an input image. Then, a regularizer can be added to the loss function of the autoencoder, yielding the feature matching autoencoder (FM-AE) loss

L_FM(x, x̂) = L2(x, x̂) + λ ‖F(x) − F(x̂)‖₂² ,   (3)

where λ > 0 denotes the weighting factor between the two loss terms. F can be parameterized using the first layers of a CNN pretrained on an image classification task. During evaluation, a residual map R_FM is obtained by comparing the per-pixel ℓ2-distance of x and x̂. The hope is that sharper, more realistic reconstructions will lead to better residual maps compared to a standard ℓ2-autoencoder.
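As a minimal sketch (not the paper's implementation), Eq. (3) can be written down directly in NumPy; the toy feature extractor `F` below is a placeholder standing in for the first layers of a pretrained CNN:

```python
import numpy as np

def fm_loss(x, x_hat, F, lam=1.0):
    """Feature-matching AE loss, Eq. (3): L2 term plus the weighted
    squared l2-distance between extracted feature vectors."""
    l2 = np.sum((x - x_hat) ** 2)
    return float(l2 + lam * np.sum((F(x) - F(x_hat)) ** 2))

# Toy feature extractor standing in for the first layers of a pretrained CNN.
F = lambda img: np.array([img.mean(), img.std()])

x = np.ones((8, 8))
print(fm_loss(x, x, F))                       # perfect reconstruction -> 0.0
print(fm_loss(x, np.zeros((8, 8)), F) > 0.0)  # -> True
```

In a real setup, `F` would be the truncated pretrained CNN and `lam` would be tuned against the plain L2 term.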
3.1.4. SSIM Autoencoder. We show that employing more elaborate architectures such as VAEs or FM-AEs does not yield satisfactory improvements of the residual maps over deterministic ℓ2-autoencoders in the unsupervised defect segmentation task. They are all based on per-pixel evaluation metrics that assume an unrealistic independence between neighboring pixels. Therefore, they fail to detect structural differences between the inputs and their reconstructions.
l(p, q) = (2 μ_p μ_q + c1) / (μ_p² + μ_q² + c1)   (5)

c(p, q) = (2 σ_p σ_q + c2) / (σ_p² + σ_q² + c2)   (6)

s(p, q) = (2 σ_pq + c2) / (2 σ_p σ_q + c2) .      (7)
The constants c1 and c2 ensure numerical stability and are typically set to c1 = 0.01 and c2 = 0.03. By substituting (5)-(7) into (4), the SSIM is given by

SSIM(p, q) = (2 μ_p μ_q + c1)(2 σ_pq + c2) / ((μ_p² + μ_q² + c1)(σ_p² + σ_q² + c2)) .   (8)

It holds that SSIM(p, q) ∈ [−1, 1]. In particular, SSIM(p, q) = 1 if and only if p and q are identical (Wang et al., 2004). Figure 2 shows the different perceptions of the three similarity functions that form the SSIM index. Each of the patch pairs p and q has a constant ℓ2-residual of 0.25 per pixel, which hence assigns low defect scores to each of the three cases. SSIM, on the other hand, is sensitive to variations in the patches' mean, variance, and covariance in its respective residual map and assigns low similarity to each of the patch pairs in one of the comparison functions.
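A minimal NumPy sketch of Eq. (8): here the statistics are taken over the whole patch, whereas the full SSIM index computes them over small sliding windows; the constants follow the text.

```python
import numpy as np

def ssim(p, q, c1=0.01, c2=0.03):
    """SSIM of two equally sized patches, Eq. (8).

    Statistics are taken over the whole patch here; the full SSIM index
    computes them over small sliding windows. c1 and c2 follow the text.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mu_p, mu_q = p.mean(), q.mean()
    var_p, var_q = p.var(), q.var()
    cov_pq = ((p - mu_p) * (q - mu_q)).mean()
    num = (2 * mu_p * mu_q + c1) * (2 * cov_pq + c2)
    den = (mu_p ** 2 + mu_q ** 2 + c1) * (var_p + var_q + c2)
    return num / den

p = np.random.default_rng(0).random((11, 11))
print(round(ssim(p, p), 6))    # identical patches -> 1.0
print(ssim(p, 1.0 - p) < 1.0)  # inverted patch scores lower -> True
```

Unlike a per-pixel residual, a structurally altered patch (e.g. inverted contrast) scores low even when its mean intensity is unchanged.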
training them purely on defect-free image data. During testing, the autoencoder will fail to reconstruct defects that have not been observed during training, which can thus be segmented by comparing the original input to the reconstruction and computing a residual map R(x, x̂) ∈ ℝ^(w×h).
3.1.1. ℓ2-Autoencoder. To force the autoencoder to reconstruct its input, a loss function must be defined that guides it towards this behavior. For simplicity and computational speed, one often chooses a per-pixel error measure, such as the L2 loss

L2(x, x̂) = Σ_{r=0}^{h−1} Σ_{c=0}^{w−1} (x(r, c) − x̂(r, c))² ,   (2)

where x(r, c) denotes the intensity value of image x at the pixel (r, c). To obtain a residual map R_ℓ2(x, x̂) during evaluation, the per-pixel ℓ2-distance of x and x̂ is computed.
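Eq. (2) and the per-pixel residual map are straightforward in NumPy; a sketch in which `x_hat` plays the role of the autoencoder's reconstruction:

```python
import numpy as np

def l2_loss(x, x_hat):
    """L2 loss, Eq. (2): sum of squared per-pixel differences."""
    return float(np.sum((x - x_hat) ** 2))

def residual_map(x, x_hat):
    """Per-pixel squared l2-distance used as the residual map."""
    return (x - x_hat) ** 2

x = np.zeros((4, 4))       # "input" image
x_hat = np.zeros((4, 4))
x_hat[1, 1] = 0.5          # a defect the autoencoder failed to reproduce
R = residual_map(x, x_hat)
r, c = np.unravel_index(R.argmax(), R.shape)
print(l2_loss(x, x_hat))   # -> 0.25
print(int(r), int(c))      # largest residual at the defect: 1 1
```

Thresholding `R` then yields the segmented anomalous region.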
3.1.2. Variational Autoencoder. Various extensions to the deterministic autoencoder framework exist. VAEs (Kingma and Welling, 2014) impose constraints on the latent variables to follow a certain distribution z ∼ P(z). For simplicity, the distribution is typically chosen to be a unit Gaussian.
The differences as humans perceive them:
・changes in pixel (luminance) values
・changes in contrast
・changes in structure
An index of how much these change between the original and the encoded-decoded image
(computed by dividing the image into small windows)
http://visualize.hatenablog.com/entry/2016/02/20/144657
https://dftalk.jp/?p=18111
AnoGAN [2017]
To be published in the proceedings of IPMI 2017
Fig. 2. (a) Deep convolutional generative adversarial network. (b) t-SNE embedding
of normal (blue) and anomalous (red) images on the feature representation of the last
convolution layer (orange in (a)) of the discriminator.
Train a GAN on normal data to build a generative model (probability distribution) of the normal data
https://arxiv.org/abs/1703.05921
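AnoGAN's test-time idea of searching the latent space for the z whose generated image best matches the query can be sketched with a toy linear generator; the real method uses a trained DCGAN and adds a discriminator-feature loss term, both omitted here:

```python
import numpy as np

# Toy linear "generator" standing in for a trained GAN generator G(z).
def G(z, W):
    return W @ z

def anomaly_search(x, W, steps=200, lr=0.1, seed=0):
    """AnoGAN-style scoring sketch: gradient descent on z to minimize
    ||G(z) - x||^2 (the paper's discriminator-feature loss is omitted).
    The final residual serves as the anomaly score."""
    z = np.random.default_rng(seed).normal(size=W.shape[1])
    for _ in range(steps):
        z -= lr * 2 * W.T @ (G(z, W) - x)  # gradient of the residual loss
    return float(np.sum((G(z, W) - x) ** 2))

W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # z in R^2 -> "images" in R^3
normal = np.array([1.0, 2.0, 3.0])    # lies on the generator's manifold
anomal = np.array([1.0, 2.0, 10.0])   # off the manifold of normal data
print(anomaly_search(normal, W) < 1e-3)  # -> True
print(anomaly_search(anomal, W) > 1.0)   # -> True
```

Inputs the generator can reproduce get a near-zero residual; anomalies that lie off the learned manifold cannot be matched and score high.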
CNN Feature Dictionary [2018]
and outputs the best k clusters and corresponding k centroids. A centroid is the mean position of all
the elements of the cluster. For each cluster, we take the feature vector that is nearest to its centroid.
The set of these k feature vectors compose the dictionary W. Figure 4 shows the pipeline for dictionary
building. Figure 5 shows examples of dictionaries learned from images of the training set (anomaly
free) with different patch sizes and number of clusters. The figure shows the subregions corresponding
to each feature vector of the dictionary W.
Figure 4. Examples of dictionary achieved considering different patch sizes and different number
of subregions.
patch_size = 16 x 16
stride = 4
ResNet-18
512-dim avgpool layer
Dimensionality reduction retaining 95% of the information
k-means
(select the vector closest to each cluster centroid)
Vectors that are farther than a threshold from the dictionary of feature vectors are treated as anomalous regions
https://www.ncbi.nlm.nih.gov/pubmed/29329268
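A sketch of the dictionary-building and scoring steps under simplifying assumptions: the ResNet-18 feature extraction and the dimensionality-reduction step are omitted, and a plain NumPy k-means stands in for the clustering.

```python
import numpy as np

def build_dictionary(features, k, iters=20, seed=0):
    """Plain k-means over defect-free feature vectors; the dictionary W
    keeps, per cluster, the actual vector nearest to the centroid."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    dist = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    return features[dist.argmin(axis=0)]  # nearest real vector per centroid

def anomaly_score(feature, W):
    """Distance to the nearest dictionary entry; above a threshold -> anomalous."""
    return float(np.linalg.norm(W - feature, axis=1).min())

rng = np.random.default_rng(1)
normal = np.vstack([rng.normal(0.0, 0.1, (50, 2)),   # two "normal" texture modes
                    rng.normal(5.0, 0.1, (50, 2))])
W = build_dictionary(normal, k=2)
print(anomaly_score(np.array([0.0, 0.05]), W) < 1.0)   # near a mode -> True
print(anomaly_score(np.array([10.0, 10.0]), W) > 1.0)  # far away -> True
```

At test time, each patch's feature vector is scored against W and the threshold decides whether the corresponding region is flagged.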
Summary
‣ An autoencoder learns an efficient "representation" of the data by using the same data as both the input and the target output
‣ A convolutional autoencoder can also be built using Max Pooling and Up Sampling layers
‣ Anomaly detection can be performed with unsupervised learning, using the reconstruction error as the criterion
‣ There are various ways to define the "reconstruction error"
‣ The MVTec AD dataset looks interesting for practical use
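The Max Pooling / Up Sampling building blocks mentioned above can be sketched in NumPy; convolutions are omitted, and this only shows the shape arithmetic of the encoder/decoder sides:

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling: halves each spatial dimension (encoder side)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbor up-sampling: doubles each dimension (decoder side)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
z = max_pool2x2(x)           # 4x4 -> 2x2 compressed representation
x_rec = upsample2x2(z)       # 2x2 -> back to 4x4 resolution
print(z.shape, x_rec.shape)  # -> (2, 2) (4, 4)
```

Stacking such down-sampling stages (with convolutions in between) forms the encoder, and the mirrored up-sampling stages form the decoder of a convolutional autoencoder.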