Convolutional neural networks for image classification — evidence from Kaggle National Data Science Bowl
1. convolutional neural networks for image classification
Evidence from Kaggle National Data Science Bowl
Dmytro Mishkin, ducha.aiki at gmail com
March 25, 2015
Czech Technical University in Prague
2. kaggle national data science bowl overview
The image classification problem
130,400 test images
30,336 train images
1 channel (grayscale)
121 (imbalanced) classes
90% of images ≤ 100x100 px
logloss score $= -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log p_{ij}$
No external data
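As a reference, a minimal numpy sketch of this logloss (function and argument names are illustrative; clipping p to avoid log(0) follows common Kaggle practice):

import numpy as np

def multiclass_logloss(y_true, p_pred, eps=1e-15):
    """Kaggle-style multiclass logloss.

    y_true: (N,) integer class labels.
    p_pred: (N, M) predicted class probabilities.
    """
    p = np.clip(p_pred, eps, 1 - eps)     # avoid log(0)
    p /= p.sum(axis=1, keepdims=True)     # renormalize after clipping
    n = y_true.shape[0]
    return -np.log(p[np.arange(n), y_true]).mean()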
6. lunch time chat at kth’s computer vision group
A computer vision scientist: How long does it take to train these generic features on ImageNet?
Hossein: 2 weeks.
Ali: Almost 3 weeks, depending on the hardware.
The computer vision scientist: Hmmmm...
Stefan: Well, you have to compare the three weeks to the last 40 years of computer vision.²

² http://www.csc.kth.se/cvap/cvg/DL/ots/
7. convolutional networks
CNNs are the state of the art in many image recognition tasks:³
– Object Image Classification
– Scene Image Classification
– Action Image Classification
– Object Detection
– Semantic Segmentation
– Fine-grained Recognition
– Attribute Detection
– Metric Learning
– Instance Retrieval (almost).
³ CNNs beat classic computer vision methods on 19 out of 20 datasets: http://www.csc.kth.se/cvap/cvg/DL/ots/
8. contents
1. Basics of convolutional networks
2. Image preprocessing
3. Network architectures
4. Ensembling
5. What (seemingly) does and does not work
6. Winner's solution highlights
11. softmax classifier
Softmax (cross-entropy) loss: $L_i = -\log\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$
SVM (hinge) loss: $L_i = \sum_{j \neq y_i} \max(0, f(x_i, W)_j - f(x_i, W)_{y_i} + \Delta)$⁵

⁵ http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/
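A minimal numpy sketch of both losses for a single example, where f is the score vector f(x_i, W) (names are illustrative):

import numpy as np

def softmax_loss(f, yi):
    # cross-entropy: -log(e^{f_yi} / sum_j e^{f_j}), shifted for numeric stability
    f = f - f.max()
    return -f[yi] + np.log(np.exp(f).sum())

def hinge_loss(f, yi, delta=1.0):
    # multiclass SVM loss: sum over j != yi of max(0, f_j - f_yi + delta)
    margins = np.maximum(0, f - f[yi] + delta)
    margins[yi] = 0
    return margins.sum()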
12. lenet-5. no other layers are necessary
[Figure: LeNet-5 architecture⁶]

The idea was first proposed by LeCun⁷ in 1989 and recently revived by Springenberg et al. in "Striving for Simplicity: The All Convolutional Net".⁸

⁶ http://eblearn.sourceforge.net/beginner_tutorial2_train.html
⁷ https://www.facebook.com/yann.lecun/posts/10152766574417143
⁸ J. T. Springenberg et al. "Striving for Simplicity: The All Convolutional Net". In: ArXiv e-prints (2014). arXiv: 1412.6806 [cs.LG].
14. regularization - dropout, weight decay
[Figure: dropout illustration⁹]

⁹ Nitish Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". In: Journal of Machine Learning Research 15 (2014), pp. 1929-1958. URL: http://jmlr.org/papers/v15/srivastava14a.html.
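A minimal sketch of the two regularizers, assuming inverted dropout (scaling at train time so test time is a no-op) and an L2 weight-decay penalty added to the data loss; names and the decay value are illustrative:

import numpy as np

def dropout_forward(x, p=0.5, train=True):
    """Inverted dropout: drop units with probability p at train time."""
    if not train:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

def l2_penalty(weights, decay=5e-4):
    # weight decay term added to the data loss
    return 0.5 * decay * sum((W ** 2).sum() for W in weights)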
15. deep learning libraries
Table 1: Popular deep learning GPU libraries
Name          | URL                              | Languages  | Notes
caffe         | github.com/BVLC/caffe            | C++/Python | largest community
cxxnet        | github.com/dmlc/cxxnet           | C++        | good memory management
Theano        | github.com/Theano/Theano         | Python     | huge flexibility
Torch         | github.com/facebook/fbcunn       | Lua        | LeCun/Facebook library
cuda-convnet2 | code.google.com/p/cuda-convnet2/ | C++/Python |
SparseConvNet | http://tinyurl.com/pu65cfp       | C++/CUDA   | differs from others
21. regularization methods
Table 5: 5-layer network experiments, 64x64 input image, LeakyReLU
Name, augmentation                                      | Val logloss
h+v mirror, scale + rot, vanilla                        | 1.08
h+v mirror, scale + rot, PReLU (but slows down a lot)¹⁰ | 1.03
h+v mirror, scale + rot, BatchNorm¹¹                    | 1.10
h+v mirror, scale + rot, StochPool¹²                    | 0.98

¹⁰ K. He et al. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". In: ArXiv e-prints (2015). arXiv: 1502.01852 [cs.CV].
¹¹ S. Ioffe and C. Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". In: ArXiv e-prints (2015). arXiv: 1502.03167 [cs.LG].
¹² M. D. Zeiler and R. Fergus. "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks". In: ArXiv e-prints (2013). arXiv: 1301.3557 [cs.LG].
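Since StochPool is the winner above, here is a hedged numpy sketch of stochastic pooling at training time (Zeiler & Fergus sample an activation within each window with probability proportional to its value; at test time they use probability-weighted averaging instead):

import numpy as np

def stochastic_pool_2x2(fmap, rng=np.random):
    """Training-time stochastic 2x2 pooling.

    fmap: (H, W) non-negative activations, H and W assumed even.
    """
    H, W = fmap.shape
    out = np.empty((H // 2, W // 2))
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            win = fmap[i:i + 2, j:j + 2].ravel()
            s = win.sum()
            p = win / s if s > 0 else np.full(4, 0.25)
            out[i // 2, j // 2] = rng.choice(win, p=p)
    return out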
22. data augmentation - don't forget about it during test time
for rotation in {0, 90, 180, 270} degrees:
    for crop in 9 crops (N, NE, E, ..., center):
        get predictions for mirrored and non-mirrored versions
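A sketch of that test-time loop, assuming a predict(patch) function and square grayscale images; crop placement and the function names are illustrative:

import numpy as np

def tta_predict(img, predict, crop=64):
    """Average predictions over 4 rotations x 9 crops x 2 mirrors = 72 views."""
    h, w = img.shape
    ys = [0, (h - crop) // 2, h - crop]    # N / center / S offsets
    xs = [0, (w - crop) // 2, w - crop]    # W / center / E offsets
    preds = []
    for k in range(4):                     # 0, 90, 180, 270 degrees
        rot = np.rot90(img, k)
        for y in ys:
            for x in xs:
                patch = rot[y:y + crop, x:x + crop]
                preds.append(predict(patch))
                preds.append(predict(np.fliplr(patch)))
    return np.mean(preds, axis=0)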
24. cifar/lenet for testing
Pros
+ Training time: 20 min
+ Can be run in parallel
+ Therefore: lots of experiments
Cons
- Not complex enough to test some things (e.g. BatchNorm)
- So it can lead to wrong conclusions about "bad" things (e.g. random rotations hurt CifarNets but help VGGNets)
- Or about "good" things (e.g. stochastic pooling helps CifarNets but does nothing for VGGNets)
28. internal ensemble
Take the mean of all auxiliary classifiers instead of just throwing them away.¹⁴
Table 6: GoogLeNet, validation loss
Name         | Public LB
clf on inc3  | 0.722
clf on inc4a | 0.754
clf on inc4b | 0.757
clf on inc5b | 0.855
average      | 0.693

Table 7: VGGNet, validation loss
Name         | Public LB
clf on pool4 | 0.762
clf on pool5 | 0.657
clf on fc7   | 0.707
average      | 0.630

¹⁴ J. Xie, B. Xu, and Z. Chuang. "Horizontal and Vertical Ensemble with Deep Representation for Classification". In: ArXiv e-prints (2013). arXiv: 1306.2759 [cs.LG].
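In code, the internal ensemble is just an arithmetic mean over the per-head probabilities; a minimal sketch, assuming every head already outputs softmax probabilities:

import numpy as np

def internal_ensemble(head_probs):
    """Mean over auxiliary classifier outputs.

    head_probs: list of (N, M) probability arrays, one per classifier
    head (e.g. inc3, inc4a, inc4b, inc5b for GoogLeNet).
    """
    return np.mean(head_probs, axis=0)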
29. googlenet-results
Table 8: GoogLeNet, 64x64 input image, Leaky ReLU (unless stated otherwise), AlexNet-style oversampling
Name                                                              | Public LB
No inv, scale, ReLU, last-clf                                     | 0.910
No inv, scale, ReLU                                               | 0.859
No inv, scale                                                     | 0.816
No inv, scale, maxout-clf                                         | 0.785
Inv, scale, maxout-clf, retrain                                   | 0.703
96x96, inv, scale, maxout-clf, retrained, no-aug-ft¹⁵             | 0.684
112x112, inv, scale, maxout-clf, retrained, no-aug-ft             | 0.716
48x48, inv, scale, maxout-clf, retrained, no-aug-ft + test rot    | 0.749
96x96, inv, scale, maxout-clf, retrained, no-aug-ft + test rot    | 0.679
48x48+96x96+112x112, inv, scale, maxout-clf, retrained, no-aug-ft | 0.677

¹⁵ Ben Graham's trick: finetune the converged model for 1-5 epochs without data augmentation and with a small learning rate. http://blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmu
30. vggnet
VGGNet architectures¹⁶
Differences from the original: dropout in conv layers (0.3), SPP pooling for pool5, LeakyReLU, auxiliary classifiers.

¹⁶ K. Simonyan and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: ArXiv e-prints (Sept. 2014). arXiv: 1409.1556 [cs.CV].
31. spatial pyramid pooling
[Figure: spatial pyramid pooling¹⁷]

¹⁷ K. He et al. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition". In: ArXiv e-prints (2014). arXiv: 1406.4729 [cs.CV].
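A minimal numpy sketch of spatial pyramid max pooling over one feature map, producing a fixed-length vector regardless of input resolution; the pyramid levels here are illustrative, not the ones used in the competition:

import numpy as np

def spp_pool(fmap, levels=(1, 2, 4)):
    """fmap: (C, H, W) feature map -> fixed-length vector.

    For each pyramid level n, max-pool over an n x n grid of bins.
    Assumes H, W >= max(levels) so every bin is non-empty.
    """
    C, H, W = fmap.shape
    out = []
    for n in levels:
        ys = np.linspace(0, H, n + 1, dtype=int)
        xs = np.linspace(0, W, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                bin_ = fmap[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                out.append(bin_.max(axis=(1, 2)))
    return np.concatenate(out)   # length C * sum(n^2 for n in levels)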
32. vggnet-results
Table 9: VGGNet, 64x64 input image, Leaky ReLU (unless stated otherwise), AlexNet-style oversampling, no SPP
Name                           | Public LB
No inv, scale, ReLU, fc-maxout | 0.752
Inv, scale, single random crop | 0.773
Inv, scale, 50 random crops    | 0.751
Inv, scale                     | 0.729
Inv, scale, retrained          | 0.720
Inv, scale, fc-maxout          | 0.662
Inv, scale, fc-maxout, SPP     | 0.654
All VGGNets mix                | 0.650
36. batchnorm
Works for CIFAR.
But it made no big difference for VGGNet on KNDB for me. However, it works for other people, e.g. Jae Hyun Lim¹⁸ (22nd place).

¹⁸ https://github.com/lim0606/ndsb
37. what else seems to work here
– Retrain top layers with a different non-linearity (cheat diversity)
– Figure-skating average: throw away the max and min predictions, then average (0.003 LB score; see the sketch after this list)
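A minimal sketch of the figure-skating average, assuming at least three sets of predictions so something remains after trimming:

import numpy as np

def figure_skating_average(preds):
    """preds: (n_models, N, M) predicted probabilities.

    Drop the max and min prediction per entry and average the rest,
    like judges' scores in figure skating. Needs n_models >= 3.
    """
    preds = np.asarray(preds)
    total = preds.sum(axis=0) - preds.max(axis=0) - preds.min(axis=0)
    return total / (preds.shape[0] - 2)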
38. what seemingly does not work here
– Dense SIFT + BoW / Fisher Vector: ~60% accuracy
– Random forest on CNN features: ~65% accuracy
– Mixing hinge and cross-entropy losses
– Averaging with a mean other than the arithmetic one
– Image enhancement or preprocessing (histogram equalization, etc.)
42. thanks
This nice presentation theme is taken from
github.com/matze/mtheme
The theme itself is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.