Anomaly Detection in Noisy Images
Xavier Gibert Serra, Ph.D. Examination, 2015
November 18, 2015
Advisory Committee:
Professor Rama Chellappa, Chair/Advisor
Professor Piya Pal
Professor Shuvra Bhattacharyya
Professor Vishal M. Patel
Professor Amitabh Varshney, Dean’s Representative
Outline
1 Introduction
2 Background
3 Anomaly Detection on Textured Images
4 Image Dictionaries for Anomaly Detection
5 Deep Learning Methods for Anomaly Detection
6 Extreme Value Theory for Adaptive Anomaly Detection
7 Conclusions and Future Work
8 References
Anomaly Detection Problem Formulation
Hypothesis testing problem:
H0 : Y ∼ P0 (normal)
H1 : Y ∼ P1 (anomalous)
Anomalies are usually rare (P(H0) ≫ P(H1)).
Training data is often unbalanced (limited number of examples of anomalies).
Both hypotheses are usually composite, due to the presence of nuisance parameters.
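As a concrete toy instance of this test, here is a minimal sketch of a log-likelihood-ratio detector with a rarity-weighted threshold. The Gaussian densities for P0 and P1 are illustrative assumptions, not the thesis model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: both hypotheses Gaussian with known unit variance,
# differing only in mean (mu0 = normal, mu1 = anomalous). Illustrative only.
mu0, mu1, sigma = 0.0, 3.0, 1.0

def log_likelihood_ratio(y):
    """log p1(y) - log p0(y); large values favor H1 (anomalous)."""
    return ((y - mu0) ** 2 - (y - mu1) ** 2) / (2 * sigma ** 2)

# Rarity (P(H0) >> P(H1)) pushes the Bayes threshold on the LLR up to
# log(P(H0)/P(H1)), so a detection demands strong evidence.
p1 = 0.01
tau = np.log((1 - p1) / p1)

y = rng.normal(mu0, sigma, size=1000)          # all-normal data
far = np.mean(log_likelihood_ratio(y) > tau)   # empirical false-alarm rate
print(f"false-alarm rate on H0 data: {far:.4f}")
```

Because the prior ratio enters the threshold, the same detector becomes more conservative as anomalies become rarer.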
Motivation
Anomalies are infrequent, but failure to detect them can lead
to disastrous consequences
Derailment of Canadian Pacific Railway Freight Train 292-16 and Subsequent Release of Anhydrous
Ammonia Near Minot, North Dakota
January 18, 2002
Source: http://www.ntsb.gov/doclib/reports/2004/RAR0401.pdf
Related Works
Automated visual railway component inspection methods:
Authors Year Components Defects Features Decision methods
Stella et al. 2002–09 Fasteners Missing DWT 3-layer NN
Singh et al. 2006 Fasteners Missing Edge density Threshold
Hsieh et al. 2007 Fasteners Broken DWT Threshold
Gibert et al. 2007–08 Joint Bars Cracks Edges SVM
Babenko 2008 Fasteners Missing/Def. Intensity OT-MACH corr.
Xia et al. 2010 Fasteners Broken Haar Adaboost
Yang et al. 2011 Fasteners Missing Direction Field Correlation
Resendiz et al. 2013 Ties/Turnouts – Gabor SVM
Li et al. 2014 Tie plates Missing spikes Lines/Haar Adaboost
Feng et al. 2014 Fasteners Missing/Def. Haar PGM
Gibert et al. 2014 Concrete ties Cracks DST Iter. shrinkage
Khan et al. 2014 Fasteners Defective Harris, Shi Matching errors
Gibert et al. 2015 Fasteners Missing/Def. HOG SVM
Gibert et al. 2015 Concrete ties Tie Condition Intensity Deep CNN
Iterative Shrinkage Algorithm
Input: the observed image.
Initialization: initialize the two representation vectors, the residual, and the shrinkage parameter.
repeat:
1. Update the estimates of the two representation vectors.
2. Update the residual.
3. Update the shrinkage parameter.
until: the stopping criterion is satisfied.
Output: the two representation vectors.
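A generic sketch of this loop, assuming a pair of dictionaries A and B and soft-thresholding as the shrinkage rule; these specifics are standard iterative-shrinkage choices, not necessarily the slide's exact updates:

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrinkage operator: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def iterative_shrinkage(y, A, B, lam=0.01, n_iter=500):
    """Separate y into two sparse parts, y ~ A @ u + B @ v, by iterating
    (1) estimate update, (2) residual update, (3) shrinkage."""
    D = np.hstack([A, B])                    # concatenated dictionaries
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # safe gradient step size
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        r = y - D @ x                        # update the residual
        x = soft_threshold(x + step * (D.T @ r), step * lam)
    k = A.shape[1]
    return x[:k], x[k:]                      # the two representation vectors

rng = np.random.default_rng(1)
A = rng.normal(size=(64, 32)) / 8            # toy random dictionaries
B = rng.normal(size=(64, 32)) / 8
u_true = np.zeros(32); u_true[3] = 2.0
v_true = np.zeros(32); v_true[7] = -1.5
y = A @ u_true + B @ v_true
u, v = iterative_shrinkage(y, A, B)
print(np.linalg.norm(y - A @ u - B @ v))
```

With incoherent dictionaries and sparse components, the loop recovers the support of both representation vectors.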
CUDA-based Implementation
The Discrete Shearlet Transform (DST) and the Discrete Wavelet Transform (DWT) are highly parallelizable.
DST steps:
Laplacian pyramid decomposition
Convolution with directional filters using the 2D FFT
DWT steps:
Convolution along rows: x → L, H
Convolution along columns: L, H → LL, LH, HL, HH
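The row/column factorization of the DWT can be sketched with Haar filters (an assumption for brevity; the actual pipeline may use longer wavelet filters):

```python
import numpy as np

def haar_rows(x):
    """One DWT level along rows: x -> (L, H)."""
    a, b = x[:, 0::2], x[:, 1::2]
    return (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)

def haar2d_level(x):
    """Separable 2D DWT level as in the slide: filtering rows gives
    L and H; filtering the columns of each gives LL, LH, HL, HH."""
    L, H = haar_rows(x)
    LL, LH = haar_rows(L.T)
    HL, HH = haar_rows(H.T)
    return LL.T, LH.T, HL.T, HH.T

x = np.arange(16.0).reshape(4, 4)
LL, LH, HL, HH = haar2d_level(x)
energy = sum(np.sum(s ** 2) for s in (LL, LH, HL, HH))
print(np.sum(x ** 2), energy)   # the orthonormal transform preserves energy
```

Each row/column pass is an independent batch of 1D convolutions, which is exactly what makes the transform GPU-friendly.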
GPU Acceleration Results
[Figure: 2D shearlet compute times (msec, log scale) vs. number of CPU cores (1–64) and GPU model (C1060, C2050, GTX480, GTX690, K20c), in single and double precision. Time to denoise a 512×512 image via shearlet shrinkage.]
[Figure: 3D shearlet compute times (seconds, log scale) on the same CPU/GPU configurations, in single and double precision. Time to denoise a 192×192×192 video via 3D shearlet shrinkage.]
Problem Formulation
Hypothesis testing problem.
H0: Normal component.
H1: Anomalous (broken/missing) component.
[Figure: taxonomy of fastener conditions over candidate ROIs. Level 1 splits defective (missing fastener/background, broken fastener) from non-defective; levels 2–4 refine the non-defective class into fastener types such as PR clip, e-clip, fastclip, c-clip, and j-clip, down to individual good-fastener configurations.]
Composite Hypothesis Testing
Hypothesis testing problem:
H0: x ∈ G
H1: x ∈ {B ∪ M}
where
G = {good (non-defective) configurations}
B = {broken configurations}
M = {background (missing) configurations}
However, data is highly unbalanced. For each candidate region,
P(x ∈ B) << P(x ∈ G) << P(x ∈ M)
Solution: Partition the configuration space into compact subsets.
3-Way Max-margin Formulation
For each class c ∈ C, C ≡ G ∪ B, we train a pair of binary classifiers: bc (c vs. M) and fc (c vs. the remaining classes in C).
Given a set of candidate regions X, we define the score for an image as
$S = \min(S_b, S_m) = \min\left( -\max_{c \in B} \max_{x \in X} f_c \cdot x,\ \max_{c \in G} \max_{x \in X} \left[ b_c \cdot x + \min(0, f_c \cdot x) \right] \right)$
Hypothesis testing:
H0 : S > τ
H1 : S ≤ τ
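A sketch of this score computation, with hypothetical class names and random toy weights standing in for the trained linear classifiers:

```python
import numpy as np

rng = np.random.default_rng(2)

def image_score(X, f, b, good, broken):
    """Image score S = min(S_b, S_m): no broken-class detector should
    fire on any region (S_b), and some good class should fire against
    the background model (S_m)."""
    s_b = -max(np.max(X @ f[c]) for c in broken)
    s_m = max(np.max(X @ b[c] + np.minimum(0.0, X @ f[c])) for c in good)
    return min(s_b, s_m)

d = 8                                         # toy feature dimension
good, broken = ["clip"], ["broken_clip"]      # hypothetical class names
f = {c: rng.normal(size=d) for c in good + broken}   # c vs. other classes
b = {c: rng.normal(size=d) for c in good}            # c vs. background M
X = rng.normal(size=(5, d))                   # candidate-region features
S = image_score(X, f, b, good, broken)
# Decide H0 (normal) if S > tau, H1 (anomalous) otherwise.
print(S)
```

The min/max structure means a single strongly-firing broken-class region, or the absence of any convincing good fastener, is enough to drive the image score down.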
Background
Deep learning has become mainstream.
State-of-the-art results in many applications.
applicable to many problems: classification, recognition,
regression, segmentation, ...
in many modalities: image, video, speech, biometrics, ...
Choice of frameworks for quick prototyping (Cuda-Convnet2,
Caffe, Torch7, Theano, TensorFlow, and more).
With enough training data, a carefully chosen deep architecture
can outperform a hand-engineered system.
Still, it is hard to debug when it does not work, and difficult to explain why it works.
Deep Multi-Task Learning
Why deep learning?
Shared representations: more efficient, and more compact.
Can learn arbitrarily complex transfer functions.
No need to design feature descriptors.
Why multi-task?
Limited number of examples of anomalies (one-shot learning).
Features learned for one task can be reused for other tasks, but
it is better to learn jointly to ensure acceptable performance on
both tasks.
Multi-task objective:
$\Phi = \sum_{t=1}^{T} \lambda_t \sum_{i=1}^{N_t} E_t\left(f(x_{ti}), y_{ti}\right)$
Single-task objective:
$\Phi_t = \sum_{i=1}^{N_t} E_t\left(f_t(x_{ti}), y_{ti}\right), \quad t \in \{1, \dots, T\}$
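A sketch of the multi-task objective, assuming toy squared-error losses E_t and a linear shared model f (both illustrative choices, not the thesis architecture):

```python
import numpy as np

def multitask_objective(shared_f, tasks, weights):
    """Phi = sum_t lambda_t sum_i E_t(f(x_ti), y_ti): a weighted sum of
    per-task losses evaluated through one shared model f."""
    total = 0.0
    for (X, y), lam in zip(tasks, weights):
        total += lam * np.sum((shared_f(X) - y) ** 2)   # E_t: squared error
    return total

rng = np.random.default_rng(3)
w_shared = rng.normal(size=4)                  # shared representation weights
shared_f = lambda X: X @ w_shared
task_a = (rng.normal(size=(10, 4)), rng.normal(size=10))
task_b = (rng.normal(size=(6, 4)), rng.normal(size=6))
phi = multitask_objective(shared_f, [task_a, task_b], weights=[1.0, 0.5])
print(phi)
```

The single-task objectives Φ_t are the special case of one task with λ_t = 1; joint training couples them through the shared parameters of f.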
Material Identification Task
We pose track inspection as a semantic segmentation problem over 10 relevant material classes:
ballast, wood, rough concrete, medium concrete, smooth concrete, crumbling concrete, chipped concrete, lubricator, rail, and fastener.
Training is done on representative image patches.
The deployed network runs on the whole image with the trained parameters.
Network Architecture
[Figure: network architectures.
Single task (material classification): input → conv1 (9×9, 48 filters, stride 2) + pooling → conv2 (5×5, 64) + relu + pooling → conv3 (5×5, 256) + relu + pooling → conv4 (1×1, 10 outputs).
Multi-task (fastener detection + material classification): the same conv1–conv3 trunk is shared (shared features); a material branch (conv4_t) keeps the 10-class output, while a fastener branch (conv4_f, 512 filters, with relu, dropout, and pooling) feeds conv5_f (multiclass fastener scores) and the conv5_fastVsBg / conv5_fastVsFast binary SVM layers (32 outputs). The branches are trained with batch sizes of 128, 16, and 32.]
Experimental Results: Material Identification
Patch size: 80 × 80 pixels.
Cross-validation set: 500,000 samples (5 splits).
Method Accuracy
Deep CNN MTL 3 95.02%
Deep CNN MTL 2 93.60%
Deep CNN STL 93.35%
LBP-HF with FLANN 82.05%
LBP^{u2}_{8,1} with FLANN 82.70%
Gabor with FLANN 75.63%
Tie Assessment Procedure
Compute scores at each site for each anomalous class b ∈ B:
$S_b(x, y) = \max_{i \notin B} \Phi_i(x, y) - \Phi_b(x, y) \quad (1)$
Image score calculation:
$S_b = \frac{1}{\beta - \alpha} \int_{\alpha}^{\beta} F^{-1}(t)\, dt \quad (2)$
where $F^{-1}$ is the sample quantile function computed from all scores $S_b(x, y)$ in the image.
Report alarm b if Sb > τb.
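Equation (2) is an interquantile (trimmed) mean of the per-site scores. A sketch on synthetic scores; the Gaussian scores and the α = 0.95 band are illustrative:

```python
import numpy as np

def interquantile_mean(scores, alpha, beta):
    """Eq. (2): average of the empirical quantile function F^{-1}(t)
    over t in [alpha, beta] - a trimmed mean of the per-site scores."""
    s = np.sort(scores.ravel())
    lo, hi = int(alpha * s.size), int(beta * s.size)
    return s[lo:hi].mean()

rng = np.random.default_rng(4)
site_scores = rng.normal(size=10_000)          # stand-in for S_b(x, y)
S_b = interquantile_mean(site_scores, alpha=0.95, beta=1.0)
# Report an alarm for class b when S_b exceeds its threshold tau_b.
print(S_b)
```

Averaging a quantile band rather than taking the single maximum makes the image score robust to isolated noisy sites.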
Extreme Value Theory for Adaptive Anomaly
Detection (I)
Theorem 1 (Fisher–Tippett–Gnedenko): Let X1, . . . , Xn be i.i.d. samples from an unknown distribution F and Mn = max(X1, . . . , Xn). If there exists a sequence of pairs of real numbers (an, bn) such that an > 0 for all n and a distribution Λ(x) such that
$\lim_{n \to \infty} P\!\left(\frac{M_n - b_n}{a_n} \le x\right) = \Lambda(x)$
for all x at which Λ(x) is continuous, then the limit distribution Λ(x) belongs to either the Gumbel, the Fréchet, or the Weibull family. These three families can be grouped into the Generalized Extreme Value Distribution (GEVD)
$\Lambda(x; \mu, \sigma, \xi) = \exp\left(-\left[1 + \xi\,\frac{x - \mu}{\sigma}\right]^{-1/\xi}\right).$
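The theorem can be checked numerically in a case where the normalizing sequences are known: block maxima of Exp(1) samples with an = 1, bn = log n converge to the Gumbel member (ξ → 0) of the GEVD:

```python
import numpy as np

def gev_cdf(x, mu, sigma, xi):
    """GEVD cdf Lambda(x; mu, sigma, xi); xi -> 0 is the Gumbel limit."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return np.exp(-np.exp(-z))       # Gumbel: exp(-exp(-z))
    return np.exp(-np.power(np.maximum(1.0 + xi * z, 0.0), -1.0 / xi))

# Block maxima of i.i.d. Exp(1) samples: with a_n = 1 and b_n = log n,
# M_n - b_n converges in distribution to a standard Gumbel.
rng = np.random.default_rng(5)
n, reps = 1000, 5000
M = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)
empirical = np.mean(M <= 1.0)
theoretical = gev_cdf(1.0, mu=0.0, sigma=1.0, xi=0.0)
print(empirical, theoretical)
```

The empirical fraction of normalized maxima below any point tracks the Gumbel cdf closely already at n = 1000.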
Extreme Value Theory for Adaptive Anomaly
Detection (II)
Theorem 2 (Pickands): Given an upper threshold u, we select the Nn samples that exceed the threshold and define the excesses Y1, . . . , YNn as Yi = Xj − u, where i is the excess index and j is the index of the original sample. The probability of exceeding the threshold is λ = 1 − F(u). For sufficiently large u, the upper-tail distribution function
$F_u(y) = \frac{F(u + y) - F(u)}{1 - F(u)}$
can be approximated by a Generalized Pareto Distribution (GPD)
$G(y; \sigma, \xi) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)_{+}^{-1/\xi}, \quad y > 0.$
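A case where the GPD approximation is exact makes the theorem concrete: by memorylessness, excesses of an Exp(1) variable over any threshold are again Exp(1), i.e. GPD with σ = 1, ξ = 0 (the sample size and the 95% threshold below are arbitrary choices):

```python
import numpy as np

def gpd_cdf(y, sigma, xi):
    """GPD cdf G(y; sigma, xi); xi -> 0 gives 1 - exp(-y/sigma)."""
    if abs(xi) < 1e-12:
        return 1.0 - np.exp(-y / sigma)
    return 1.0 - np.maximum(1.0 + xi * y / sigma, 0.0) ** (-1.0 / xi)

# Excesses of Exp(1) samples over the 95% quantile are exactly Exp(1).
rng = np.random.default_rng(6)
x = rng.exponential(size=200_000)
u = np.quantile(x, 0.95)
excess = x[x > u] - u
for y in (0.5, 1.0, 2.0):
    print(y, np.mean(excess <= y), gpd_cdf(y, sigma=1.0, xi=0.0))
```

For heavier- or lighter-tailed F, the same comparison holds only approximately, with ξ capturing the tail index.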
Extreme Value Theory for Adaptive Anomaly
Detection (III)
EVT-Based Adaptive Thresholding Algorithm (Broadwater and Chellappa, TSP 2010):
1 Set an initial threshold u (for example u = F_x^{-1}(0.95)).
2 Select all samples greater than u.
3 Fit a GPD by maximizing the log-likelihood:
$(\hat{\sigma}, \hat{\xi}) = \arg\max_{\sigma, \xi}\, g(\sigma, \xi; X) = \arg\max_{\sigma, \xi} \left[ -n \log \sigma - \frac{1 + \xi}{\xi} \sum_{i=1}^{n} \log\left(1 + \frac{\xi x_i}{\sigma}\right) \right]$
4 Find the threshold t_α > u for the desired false-alarm rate α0 as
$t_{\alpha} = u + \frac{\hat{\sigma}}{\hat{\xi}} \left[ \left( \frac{N \alpha_0}{n} \right)^{-\hat{\xi}} - 1 \right]$
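A sketch of steps 1–4 on synthetic Gaussian detector scores (an assumption), with a crude grid search standing in for the numerical likelihood maximization:

```python
import numpy as np

def gpd_nll(sigma, xi, x):
    """Negative of the GPD log-likelihood g(sigma, xi; X) above."""
    z = 1.0 + xi * x / sigma
    if np.any(z <= 0):
        return np.inf                      # outside the GPD support
    return x.size * np.log(sigma) + (1.0 + 1.0 / xi) * np.sum(np.log(z))

def fit_gpd(x):
    """Crude grid-search MLE over (sigma, xi): a stand-in for the
    numerical maximization in step 3."""
    grid = [(gpd_nll(s, k, x), s, k)
            for s in np.linspace(0.1, 3.0, 60)
            for k in np.linspace(-0.4, 0.9, 66) if abs(k) > 1e-9]
    return min(grid)[1:]

rng = np.random.default_rng(7)
scores = rng.normal(size=20_000)             # synthetic detector scores
u = np.quantile(scores, 0.95)                # step 1: initial threshold
exc = scores[scores > u] - u                 # step 2: exceedances
sigma_hat, xi_hat = fit_gpd(exc)             # step 3: fit the GPD
alpha0, N, n = 1e-3, scores.size, exc.size
t_alpha = u + sigma_hat / xi_hat * ((N * alpha0 / n) ** -xi_hat - 1)  # step 4
print(sigma_hat, xi_hat, t_alpha)
```

The fitted threshold lands near the true 0.999 Gaussian quantile (about 3.09) even though only the top 5% of scores were used.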
Statistical Model
Assumptions:
Under H0 (normal),
the conditions of Theorem 2 hold, so Fu(y) ≈ G(y; σ, ξ) for u = F−1(0.95),
ξ ≈ 0, i.e. $f_u(y) \approx \lambda e^{-\lambda y}$ (hypothesized on the basis of the sparsity-promoting prior induced by the ℓ1 hinge loss),
Fu is time-varying and λ is drawn from the Gamma conjugate prior
$\pi(\lambda; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda}$
with slowly varying α and β.
Under H1 (anomalous) this model does not hold.
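The Gamma prior is conjugate to the exponential tail model, which is what makes sequential updating cheap: observing n excesses with sum s maps Gamma(α, β) to Gamma(α + n, β + s). A sketch with illustrative prior values:

```python
import numpy as np

def gamma_posterior(alpha, beta, excesses):
    """Conjugate update: Exp(lambda) excesses with a Gamma(alpha, beta)
    prior on lambda give a Gamma(alpha + n, beta + s) posterior."""
    return alpha + excesses.size, beta + excesses.sum()

rng = np.random.default_rng(8)
true_lambda = 2.0
y = rng.exponential(1.0 / true_lambda, size=500)   # excesses under H0

alpha0, beta0 = 2.0, 1.0                # slowly varying prior (illustrative)
alpha1, beta1 = gamma_posterior(alpha0, beta0, y)
lam_map = (alpha1 - 1.0) / beta1        # MAP of the Gamma posterior
print(lam_map)
```

As the prior weight decays relative to the data, the MAP estimate tracks the current tail rate, which is exactly what the adaptive-thresholding procedure exploits.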
Training Procedure
Algorithm 1 EVT training algorithm
1: procedure Train(T, pu, w0)
2:   n ← 0, s ← 0                           ▷ Initialize sufficient statistics
3:   for all (x, y) ∈ T do                  ▷ Training set T contains score vectors x, label vectors y
4:     g ← {xi | yi = 0}                    ▷ Select negative samples
5:     u ← u such that #{gi > u} = #g · pu  ▷ Find upper threshold
6:     t ← {gi | gi > u} − u                ▷ Extract upper tail
7:     n ← n + #t                           ▷ Update counts
8:     s ← s + Σt                           ▷ Update sum
9:   end for
10:  α0 ← 1 + s
11:  β0 ← w0 · s / n
12:  return α0, β0                          ▷ Parameters of the Gamma prior
13: end procedure
Testing Procedure
Algorithm 2 EVT adaptive thresholding algorithm
1: procedure AdaptScores(x, α0, β0, pu, pf, w1, L, na)
2:   â0 ← β0 / (α0 − 1)                     ▷ MLE in training set
3:   y ← sort_desc(x)                       ▷ Sort scores in descending order
4:   k ← #y · pu
5:   for i ← 1, na do                       ▷ Scan na candidate outlier counts
6:     u ← yi+k                             ▷ Find upper threshold
7:     t ← {yi, . . . , yi+k} − u           ▷ Extract upper tail
8:     Dn,i ← supx |Ĝn(x) − G(x)|           ▷ Compute KS statistic
9:   end for
10:  î ← arg mini Dn,i                      ▷ Estimate number of outliers
11:  u0 ← yî                                ▷ Set outlier rejection threshold
12:  t ← {yî, . . . , yî+k} − u0            ▷ Extract upper tail
13:  α1 ← α0 + Σt
14:  β1 ← β0 + w1 · Σt / #t
15:  for i ← 1, n do
16:    w ← xi−(L−1)/2 : i+(L−1)/2           ▷ Window centered at sample xi
17:    u ← u such that #{wi > u} = #w · pu  ▷ Find upper threshold
18:    t ← {wi | wi > u} − u                ▷ Extract upper tail
19:    α ← α1 + #t                          ▷ Posterior
20:    β ← β1 + Σt                          ▷ Posterior
21:    â ← β / (α − 1)                      ▷ MAP estimate
22:    yi ← xi − u + â · log(pf / pu)       ▷ Adapt score
23:  end for
24:  return y                               ▷ Adapted scores
25: end procedure
Summary
Anomaly detection on images can be solved by modeling the
normal images, the anomalous ones, or both.
Enablers:
Availability of large amounts of training data.
Transfer learning techniques (e.g., multi-task learning).
Domain-specific modeling.
Approaches:
Analysis of image components (shearlets, wavelets,
dictionaries).
Learning features for normal/abnormal elements.
Statistical analysis (Extreme Value Theory).
Application domains:
Transportation (Railways, roads, bridges, signals, vehicles).
Medical (PET/SPECT/CT/ultrasound images).
Other (Industrial automation, security, remote sensing).
References
X. Gibert, V.M. Patel, R. Chellappa. Deep multi-task learning for railway track inspection. Submitted to IEEE Trans. on ITS (2015).
X. Gibert, V.M. Patel, R. Chellappa. Sequential score adaptation with extreme value theory for robust railway track inspection. IEEE ICCV Workshop on CVRSUAD (2015).
X. Gibert, V.M. Patel, R. Chellappa. Material classification and semantic segmentation of railway track images with deep convolutional neural networks. IEEE Int. Conf. on Image Processing (2015).
R. Chellappa, X. Gibert, V.M. Patel. Robust anomaly detection for vision-based inspection of railway components. DOT/FRA/ORD-15/23 Tech. Report (2015).
X. Gibert, V.M. Patel, R. Chellappa. Robust fastener detection for autonomous visual track inspection. IEEE Winter Conf. on Appl. of CV (2015).
X. Gibert, V.M. Patel, D. Labate, R. Chellappa. Discrete shearlet transform on GPU with applications in anomaly detection and denoising. EURASIP Journal on ASP (2014).
K. Chodnicki, X. Gibert, J. Tian, F. Arrate, R. Chellappa, T. Dickfeld, V. Dilsizian, M. Smith. Point-specific matching of cardiac electrophysiological voltage and SPECT perfusion measurements for myocardial tissue characterization. Journal of Nuclear Medicine 55 (suppl 1), 602 (2014).
M. Smith, X. Gibert, F. Arrate, R. Chellappa, K. Chodnicki, J. Tian, T. Dickfeld, V. Dilsizian. CardioViewer: A novel modular software tool for integrating cardiac electrophysiology voltage measurements and PET/SPECT data. IEEE Medical Imaging Conference, Seattle, Washington (2014).