SlideShare ist ein Scribd-Unternehmen logo
1 von 51
Downloaden Sie, um offline zu lesen
CNNS
FROM THE BASICS TO RECENT ADVANCES
Dmytro Mishkin
Center for Machine Perception
Czech Technical University in Prague
ducha.aiki@gmail.com
MY BACKGROUND
 PhD student of Czech Technical university in
Prague. Now fully working in Deep Learning,
recent paper “All you need is a good init”
added to Stanford CS231n course.
 Kaggler. 9th out of 1049 teams at National
Data Science Bowl
 CTO of Clear Research (clear.sx) Using deep
learning at work since 2014.
2
Short review of the CNN design
Architecture progress
AlexNet → VGGNet → ResNet
→ GoogLeNet
Initialization
Design choices
3
OUTLINE
IMAGENET WINNERS
4
AlexNet (original):Krizhevsky et.al., ImageNet Classification with Deep Convolutional Neural Networks, 2012.
CaffeNet: Jia et.al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014.
Image credit: Roberto Matheus Pinheiro Pereira, “Deep Learning Talk”.
Srinivas et.al “A Taxonomy of Deep Convolutional Neural Nets for Computer Vision”, 2016.
5
CAFFENET ARCHITECTURE
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
6
CAFFENET ARCHITECTURE
VGGNET ARCHITECTURE
7
All convolutions are 3x3
Good performance,
but slow
INCEPTION (GOOGLENET):
BUILDING BLOCK
Szegedy et.al. Going Deeper with Convolutions. CVPR, 2015
Image credit: https://www.udacity.com/course/deep-learning--ud730
8
DEPTH LIMIT NOT REACHED YET
IN DEEP LEARNING
9
DEPTH LIMIT NOT REACHED YET
IN DEEP LEARNING
10
DEPTH LIMIT NOT REACHED YET
IN DEEP LEARNING
11
VGGNet – 19 layers, 19.6 billion FLOPS. Simonyan et.al., 2014
ResNet – 152 layers, 11.3 billion FLOPS. He et.al., 2015
Slide credit Šulc et.al. Very Deep Residual Networks with MaxOut for Plant Identification in the Wild, 2016.
Stochastic ResNet – 1200 layers.
Huang et.al., Deep Networks with Stochastic Depth, 2016
RESNET: RESIDUAL BLOCK
He et.al. Deep Residual Learning for Image Recognition, ICCV 2015
12
RESIDUAL NETWORKS: HOT TOPIC
 Identity mapping https://arxiv.org/abs/1603.05027
 Wide ResNets https://arxiv.org/abs/1605.07146
 Stochastic depth https://arxiv.org/abs/1603.09382
 Residual Inception https://arxiv.org/abs/1602.07261
 ResNets + ELU http://arxiv.org/pdf/1604.04112.pdf
 ResNet in ResNet
http://arxiv.org/pdf/1608.02908v1.pdf
 DC Nets http://arxiv.org/abs/1608.06993
 Weighted ResNet
http://arxiv.org/pdf/1605.08831v1.pdf
13
DATASETS USED IN PRESENTATION:
IMAGENET AND CIFAR-10
CIFAR-10:
50k training images
(32x32 px )
10k validation images
10 classes
ImageNet:
1.2M training images
(~ 256 x 256px )
50k validation images
1000 classes
Russakovskiy et.al, ImageNet Large Scale Visual Recognition Challenge, 2015
Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009
14
Image credit: Hu et.al, 2015
Transferring Deep Convolutional Neural Networks for the Scene Classification
of High-Resolution Remote Sensing Imagery
15
CAFFENET ARCHITECTURE
LIST OF HYPER-PARAMETERS TESTED
16
REFERENCE METHODS: IMAGE SIZE SENSITIVE
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
17
CHOICE OF NON-LINEARITY
18
CHOICE OF NON-LINEARITY
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
19
NON-LINEARITIES ON CAFFENET
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
20
 Gaussian noise with variance.
 𝑣𝑎𝑟 𝜔𝑙 = 0.01 (AlexNet, Krizhevsky et.al, 2012)
 𝑣𝑎𝑟 𝜔𝑙 = 1/𝑛𝑖𝑛𝑝𝑢𝑡𝑠 (Glorot et.al. 2010)
 𝑣𝑎𝑟 𝜔𝑙 = 2/𝑛𝑖𝑛𝑝𝑢𝑡𝑠 (He et.al. 2015)
 Orthonormal: (Saxe et.al. 2013)
Glorot → SVD → 𝜔𝑙 = V
 Data-dependent: LSUV (Mishkin et.al, 2016)
Mishkin and Matas. All you need is a good init. ICLR, 2016
21
WEIGHT INITIALIZATION FOR A VERY DEEP NET
a) Layer gain 𝐺 𝐿 < 1→ vanishing activations variance
b) Layer gain 𝐺 𝐿 > 1 → exploding activation variance
Mishkin and Matas. All you need is a good init. ICLR, 2016
22
WEIGHT INITIALIZATION INFLUENCES ACTIVATIONS
ACTIVATIONS INFLUENCES MAGNITUDE OF
GRADIENT COMPONENTS
23
Mishkin and Matas. All you need is a good init. ICLR, 2016
Algorithm 1. Layer-sequential unit-variance orthogonal initialization.
𝑳 − convolution or fully-connected layer, 𝑊𝐿 − its weights, 𝑂 𝐿 − layer output,
𝜀 − variance tolerance,
𝑇𝑖 − iteration number, 𝑇 𝑚𝑎𝑥 − max number of iterations.
Pre-initialize network with orthonormal matrices as in Saxe et.al. (2013)
for each convolutional and fully-connected layer 𝑳 do
do forward pass with mini-batch
calculate v𝑎𝑟(𝑂 𝐿)
𝑊𝐿
𝑖+1
= ൗ𝑊𝐿
𝑖
𝑣𝑎𝑟(𝑂 𝐿)
until 𝑣𝑎𝑟 𝑂 𝐿 − 1.0 < 𝜀 or (𝑇𝑖 > 𝑇𝑚𝑎𝑥)
end for
*The LSUV algorithm does not deal with biases and initializes them with zeros
KEEPING PRODUCT OF PER-LAYER GAINS ~ 1:
LAYER-SEQUENTIAL UNIT-VARIANCE
ORTHOGONAL INITIALIZATION
Mishkin and Matas. All you need is a good init. ICLR, 2016
24
COMPARISON OF THE INITIALIZATIONS FOR
DIFFERENT ACTIVATIONS
 CIFAR-10 FitNet, accuracy [%]

 CIFAR-10 FitResNet, accuracy [%]
Mishkin and Matas. All you need is a good init. ICLR, 2016
25
LSUV INITIALIZATION IMPLEMENTATIONS
Mishkin and Matas. All you need is a good init. ICLR, 2016
 Caffe https://github.com/ducha-aiki/LSUVinit
 Keras https://github.com/ducha-aiki/LSUV-keras
 Torch https://github.com/yobibyte/torch-lsuv
26
BATCH NORMALIZATION
(AFTER EVERY CONVOLUTION LAYER)
Ioffe et.al, ICML 2015
27
BATCH NORMALIZATION: WHERE,
BEFORE OR AFTER NON-LINEARITY?
Mishkin and Matas. All you need is a good init. ICLR, 2016
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
ImageNet, top-1 accuracy
CIFAR-10, top-1 accuracy,
FitNet4 network
In short:
better to test
with
your architecture
and dataset :)
28
Network No BN Before ReLU After ReLU
CaffeNet128-FC2048 47.1 47.8 49.9
GoogLeNet128 61.9 60.3 59.6
Non-linearity BN Before BN After
TanH 88.1 89.2
ReLU 92.6 92.5
MaxOut 92.3 92.9
BATCH NORMALIZATION
SOMETIMES WORKS TOO GOOD AND HIDES PROBLEMS
29
Case:
CNN has less number outputs ( just typo),
than classes in dataset: 26 vs. 28
BatchNormed “learns well”
Plain CNN diverges
NON-LINEARITIES ON CAFFENET, WITH
BATCH NORMALIZATION
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
30
NON-LINEARITIES: TAKE AWAY MESSAGE
 Use ELU without batch normalization
 Or ReLU + BN
 Try maxout for the final layers
 Fallback solution (if something goes wrong) – ReLU
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
31
BUT IN SMALL DATA REGIME ( ~50K IMAGES)
TRY LEAKY OR RANDOMIZED RELU
 Accuracy [%], Network in Network architecture
 LogLoss, Plankton VGG architecture
Xu et.al. Empirical Evaluation of Rectified Activations in Convolutional Network ICLR 2015
ReLU VLReLU RReLU PReLU
CIFAR-10 87.55 88.80 88.81 88.20
CIFAR-100 57.10 59.60 59.80 58.4
ReLU VLReLU RReLU PReLU
KNDB 0.77 0.73 0.72 0.74
32
INPUT IMAGE SIZE
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
33
PADDING TYPES
Zero padding, stride = 2
Dumoulin and Visin. A guide to convolution arithmetic for deep learning. ArXiv 2016
No padding, stride = 2 Zero padding, stride = 1
34
PADDING
 Zero-padding:
 Preserving spatial size, not “washing out”
information
 Dropout-like augmentation by zeros
Caffenet128
 with conv padding: 47% top-1 acc
 w/o conv padding: 41% top-1 acc.
35
MAX POOLING:
PADDING AND KERNEL
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
36
POOLING METHODS
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
37
POOLING METHODS
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
38
LEARNING RATE POLICY
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
39
IMAGE PREPROCESSING
 Subtract mean pixel (training set), divide by std.
 RGB is the best (standard) colorspace for CNN
 Do nothing more…
 …unless you have specific dataset.
Subtract local mean pixel
B.Graham, 2015
Kaggle Diabetic Retinopathy Competition report
40
IMAGE PREPROCESSING:
WHAT DOESN`T WORK
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
41
IMAGE PREPROCESSING:
LET`S LEARN THE COLORSPACE
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
Image credit: https://www.udacity.com/course/deep-learning--ud730
42
DATASET QUALITY AND SIZE
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
43
NETWORK WIDTH: SATURATION AND SPEED
PROBLEM
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
44
BATCH SIZE AND LEARNING RATE
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
45
CLASSIFIER DESIGN
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
Ren et.al Object Detection Networks on Convolutional Feature Maps, arXiv 2016
Take home:
put
fully-connected
before final layer,
not earlier
46
APPLYING ALTOGETHER
Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016
47
>5 pp. additional
top-1 accuracy
for free.
THANK YOU FOR THE ATTENTION
 Any questions?
 All logs, graphs and network definitions:
https://github.com/ducha-aiki/caffenet-benchmark
Feel free to add your tests :)
 The paper is here: https://arxiv.org/abs/1606.02228
ducha.aiki@gmail.com
mishkdmy@cmp.felk.cvut.cz 48
ARCHITECTURE
 Use as small filters as possible
 3x3 + ReLU + 3x3 + ReLU > 5x5 + ReLU.
 3x1 + 1x3 > 3x3.
 2x2 + 2x2 > 3x3
 Exception: 1st layer. Too computationally
ineffective to use 3x3 there.
Convolutional Neural Networks at Constrained Time Cost. He and Sun, CVPR 2015
49
CAFFENET TRAINING
Mishkin and Matas. All you need is a good init. ICLR, 2016
50
GOOGLENET TRAINING
Mishkin and Matas. All you need is a good init. ICLR, 2016
51

Weitere ähnliche Inhalte

Was ist angesagt?

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKMd Rajib Bhuiyan
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNNPradnya Saval
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural NetworksYogendra Tamang
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningJörgen Sandig
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Suraj Aavula
 
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...Edureka!
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Basit Rafiq
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural NetworksAniket Maurya
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Simplilearn
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning akira-ai
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnSumeraHangi
 

Was ist angesagt? (20)

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORK
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNN
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 

Ähnlich wie CNNs: from the Basics to Recent Advances

ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksWilly Marroquin (WillyDevNET)
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxNoorUlHaq47
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSitakanta Mishra
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?Nilavra Bhattacharya
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Universitat Politècnica de Catalunya
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Alex Conway
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentationOwin Will
 
BLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATION
BLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATIONBLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATION
BLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATIONIRJET Journal
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION cscpconf
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequencesClaudio Gallicchio
 
ct_meeting_final_jcy (1).pdf
ct_meeting_final_jcy (1).pdfct_meeting_final_jcy (1).pdf
ct_meeting_final_jcy (1).pdfssuser2c7393
 
DeepFix: a fully convolutional neural network for predicting human fixations...
DeepFix:  a fully convolutional neural network for predicting human fixations...DeepFix:  a fully convolutional neural network for predicting human fixations...
DeepFix: a fully convolutional neural network for predicting human fixations...Universitat Politècnica de Catalunya
 
Surveillance scene classification using machine learning
Surveillance scene classification using machine learningSurveillance scene classification using machine learning
Surveillance scene classification using machine learningUtkarsh Contractor
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Pres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_engPres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_engDaniele Ciriello
 

Ähnlich wie CNNs: from the Basics to Recent Advances (20)

ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptx
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
CNN
CNNCNN
CNN
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
 
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
PointNet
PointNetPointNet
PointNet
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentation
 
BLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATION
BLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATIONBLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATION
BLOOD TISSUE IMAGE TO IDENTIFY MALARIA DISEASE CLASSIFICATION
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
 
ct_meeting_final_jcy (1).pdf
ct_meeting_final_jcy (1).pdfct_meeting_final_jcy (1).pdf
ct_meeting_final_jcy (1).pdf
 
DeepFix: a fully convolutional neural network for predicting human fixations...
DeepFix:  a fully convolutional neural network for predicting human fixations...DeepFix:  a fully convolutional neural network for predicting human fixations...
DeepFix: a fully convolutional neural network for predicting human fixations...
 
Surveillance scene classification using machine learning
Surveillance scene classification using machine learningSurveillance scene classification using machine learning
Surveillance scene classification using machine learning
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
 
Pres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_engPres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_eng
 

Kürzlich hochgeladen

Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxtuking87
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxzaydmeerab121
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosZachary Labe
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfAtiaGohar1
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsDobusch Leonhard
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfGABYFIORELAMALPARTID1
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and AnnovaMansi Rastogi
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterHanHyoKim
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxzeus70441
 
projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and momentdonamiaquintan2
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxpriyankatabhane
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 

Kürzlich hochgeladen (20)

Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenarios
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
projectile motion, impulse and moment
projectile  motion, impulse  and  momentprojectile  motion, impulse  and  moment
projectile motion, impulse and moment
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptx
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 

CNNs: from the Basics to Recent Advances

  • 1. CNNS FROM THE BASICS TO RECENT ADVANCES Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague ducha.aiki@gmail.com
  • 2. MY BACKGROUND  PhD student of Czech Technical university in Prague. Now fully working in Deep Learning, recent paper “All you need is a good init” added to Stanford CS231n course.  Kaggler. 9th out of 1049 teams at National Data Science Bowl  CTO of Clear Research (clear.sx) Using deep learning at work since 2014. 2
  • 3. Short review of the CNN design Architecture progress AlexNet → VGGNet → ResNet → GoogLeNet Initialization Design choices 3 OUTLINE
  • 5. AlexNet (original):Krizhevsky et.al., ImageNet Classification with Deep Convolutional Neural Networks, 2012. CaffeNet: Jia et.al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014. Image credit: Roberto Matheus Pinheiro Pereira, “Deep Learning Talk”. Srinivas et.al “A Taxonomy of Deep Convolutional Neural Nets for Computer Vision”, 2016. 5 CAFFENET ARCHITECTURE
  • 6. Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 6 CAFFENET ARCHITECTURE
  • 7. VGGNET ARCHITECTURE 7 All convolutions are 3x3 Good performance, but slow
  • 8. INCEPTION (GOOGLENET): BUILDING BLOCK Szegedy et.al. Going Deeper with Convolutions. CVPR, 2015 Image credit: https://www.udacity.com/course/deep-learning--ud730 8
  • 9. DEPTH LIMIT NOT REACHED YET IN DEEP LEARNING 9
  • 10. DEPTH LIMIT NOT REACHED YET IN DEEP LEARNING 10
  • 11. DEPTH LIMIT NOT REACHED YET IN DEEP LEARNING 11 VGGNet – 19 layers, 19.6 billion FLOPS. Simonyan et.al., 2014 ResNet – 152 layers, 11.3 billion FLOPS. He et.al., 2015 Slide credit Šulc et.al. Very Deep Residual Networks with MaxOut for Plant Identification in the Wild, 2016. Stochastic ResNet – 1200 layers. Huang et.al., Deep Networks with Stochastic Depth, 2016
  • 12. RESNET: RESIDUAL BLOCK He et.al. Deep Residual Learning for Image Recognition, ICCV 2015 12
  • 13. RESIDUAL NETWORKS: HOT TOPIC  Identity mapping https://arxiv.org/abs/1603.05027  Wide ResNets https://arxiv.org/abs/1605.07146  Stochastic depth https://arxiv.org/abs/1603.09382  Residual Inception https://arxiv.org/abs/1602.07261  ResNets + ELU http://arxiv.org/pdf/1604.04112.pdf  ResNet in ResNet http://arxiv.org/pdf/1608.02908v1.pdf  DC Nets http://arxiv.org/abs/1608.06993  Weighted ResNet http://arxiv.org/pdf/1605.08831v1.pdf 13
  • 14. DATASETS USED IN PRESENTATION: IMAGENET AND CIFAR-10 CIFAR-10: 50k training images (32x32 px ) 10k validation images 10 classes ImageNet: 1.2M training images (~ 256 x 256px ) 50k validation images 1000 classes Russakovskiy et.al, ImageNet Large Scale Visual Recognition Challenge, 2015 Krizhevsky, Learning Multiple Layers of Features from Tiny Images, 2009 14
  • 15. Image credit: Hu et.al, 2015 Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery 15 CAFFENET ARCHITECTURE
  • 17. REFERENCE METHODS: IMAGE SIZE SENSITIVE Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 17
  • 19. CHOICE OF NON-LINEARITY Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 19
  • 20. NON-LINEARITIES ON CAFFENET Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 20
  • 21.  Gaussian noise with variance.  𝑣𝑎𝑟 𝜔𝑙 = 0.01 (AlexNet, Krizhevsky et.al, 2012)  𝑣𝑎𝑟 𝜔𝑙 = 1/𝑛𝑖𝑛𝑝𝑢𝑡𝑠 (Glorot et.al. 2010)  𝑣𝑎𝑟 𝜔𝑙 = 2/𝑛𝑖𝑛𝑝𝑢𝑡𝑠 (He et.al. 2015)  Orthonormal: (Saxe et.al. 2013) Glorot → SVD → 𝜔𝑙 = V  Data-dependent: LSUV (Mishkin et.al, 2016) Mishkin and Matas. All you need is a good init. ICLR, 2016 21 WEIGHT INITIALIZATION FOR A VERY DEEP NET
  • 22. a) Layer gain 𝐺 𝐿 < 1→ vanishing activations variance b) Layer gain 𝐺 𝐿 > 1 → exploding activation variance Mishkin and Matas. All you need is a good init. ICLR, 2016 22 WEIGHT INITIALIZATION INFLUENCES ACTIVATIONS
  • 23. ACTIVATIONS INFLUENCES MAGNITUDE OF GRADIENT COMPONENTS 23 Mishkin and Matas. All you need is a good init. ICLR, 2016
  • 24. Algorithm 1. Layer-sequential unit-variance orthogonal initialization. 𝑳 − convolution or fully-connected layer, 𝑊𝐿 − its weights, 𝑂 𝐿 − layer output, 𝜀 − variance tolerance, 𝑇𝑖 − iteration number, 𝑇 𝑚𝑎𝑥 − max number of iterations. Pre-initialize network with orthonormal matrices as in Saxe et.al. (2013) for each convolutional and fully-connected layer 𝑳 do do forward pass with mini-batch calculate v𝑎𝑟(𝑂 𝐿) 𝑊𝐿 𝑖+1 = ൗ𝑊𝐿 𝑖 𝑣𝑎𝑟(𝑂 𝐿) until 𝑣𝑎𝑟 𝑂 𝐿 − 1.0 < 𝜀 or (𝑇𝑖 > 𝑇𝑚𝑎𝑥) end for *The LSUV algorithm does not deal with biases and initializes them with zeros KEEPING PRODUCT OF PER-LAYER GAINS ~ 1: LAYER-SEQUENTIAL UNIT-VARIANCE ORTHOGONAL INITIALIZATION Mishkin and Matas. All you need is a good init. ICLR, 2016 24
  • 25. COMPARISON OF THE INITIALIZATIONS FOR DIFFERENT ACTIVATIONS  CIFAR-10 FitNet, accuracy [%]   CIFAR-10 FitResNet, accuracy [%] Mishkin and Matas. All you need is a good init. ICLR, 2016 25
  • 26. LSUV INITIALIZATION IMPLEMENTATIONS Mishkin and Matas. All you need is a good init. ICLR, 2016  Caffe https://github.com/ducha-aiki/LSUVinit  Keras https://github.com/ducha-aiki/LSUV-keras  Torch https://github.com/yobibyte/torch-lsuv 26
  • 27. BATCH NORMALIZATION (AFTER EVERY CONVOLUTION LAYER) Ioffe et.al, ICML 2015 27
  • 28. BATCH NORMALIZATION: WHERE, BEFORE OR AFTER NON-LINEARITY? Mishkin and Matas. All you need is a good init. ICLR, 2016 Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 ImageNet, top-1 accuracy CIFAR-10, top-1 accuracy, FitNet4 network In short: better to test with your architecture and dataset :) 28 Network No BN Before ReLU After ReLU CaffeNet128-FC2048 47.1 47.8 49.9 GoogLeNet128 61.9 60.3 59.6 Non-linearity BN Before BN After TanH 88.1 89.2 ReLU 92.6 92.5 MaxOut 92.3 92.9
  • 29. BATCH NORMALIZATION SOMETIMES WORKS TOO GOOD AND HIDES PROBLEMS 29 Case: CNN has less number outputs ( just typo), than classes in dataset: 26 vs. 28 BatchNormed “learns well” Plain CNN diverges
  • 30. NON-LINEARITIES ON CAFFENET, WITH BATCH NORMALIZATION Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 30
  • 31. NON-LINEARITIES: TAKE AWAY MESSAGE  Use ELU without batch normalization  Or ReLU + BN  Try maxout for the final layers  Fallback solution (if something goes wrong) – ReLU Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 31
  • 32. BUT IN SMALL DATA REGIME ( ~50K IMAGES) TRY LEAKY OR RANDOMIZED RELU  Accuracy [%], Network in Network architecture  LogLoss, Plankton VGG architecture Xu et.al. Empirical Evaluation of Rectified Activations in Convolutional Network ICLR 2015 ReLU VLReLU RReLU PReLU CIFAR-10 87.55 88.80 88.81 88.20 CIFAR-100 57.10 59.60 59.80 58.4 ReLU VLReLU RReLU PReLU KNDB 0.77 0.73 0.72 0.74 32
  • 33. INPUT IMAGE SIZE Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 33
  • 34. PADDING TYPES Zero padding, stride = 2 Dumoulin and Visin. A guide to convolution arithmetic for deep learning. ArXiv 2016 No padding, stride = 2 Zero padding, stride = 1 34
  • 35. PADDING  Zero-padding:  Preserving spatial size, not “washing out” information  Dropout-like augmentation by zeros Caffenet128  with conv padding: 47% top-1 acc  w/o conv padding: 41% top-1 acc. 35
  • 36. MAX POOLING: PADDING AND KERNEL Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 36
  • 37. POOLING METHODS Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 37
  • 38. POOLING METHODS Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 38
  • 39. LEARNING RATE POLICY Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 39
  • 40. IMAGE PREPROCESSING  Subtract mean pixel (training set), divide by std.  RGB is the best (standard) colorspace for CNN  Do nothing more…  …unless you have specific dataset. Subtract local mean pixel B.Graham, 2015 Kaggle Diabetic Retinopathy Competition report 40
  • 41. IMAGE PREPROCESSING: WHAT DOESN`T WORK Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 41
  • 42. IMAGE PREPROCESSING: LET`S LEARN THE COLORSPACE Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 Image credit: https://www.udacity.com/course/deep-learning--ud730 42
  • 43. DATASET QUALITY AND SIZE Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 43
  • 44. NETWORK WIDTH: SATURATION AND SPEED PROBLEM Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 44
  • 45. BATCH SIZE AND LEARNING RATE Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 45
  • 46. CLASSIFIER DESIGN Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 Ren et.al Object Detection Networks on Convolutional Feature Maps, arXiv 2016 Take home: put fully-connected before final layer, not earlier 46
  • 47. APPLYING ALTOGETHER Mishkin et.al. Systematic evaluation of CNN advances on the ImageNet, arXiv 2016 47 >5 pp. additional top-1 accuracy for free.
  • 48. THANK YOU FOR THE ATTENTION  Any questions?  All logs, graphs and network definitions: https://github.com/ducha-aiki/caffenet-benchmark Feel free to add your tests :)  The paper is here: https://arxiv.org/abs/1606.02228 ducha.aiki@gmail.com mishkdmy@cmp.felk.cvut.cz 48
  • 49. ARCHITECTURE  Use as small filters as possible  3x3 + ReLU + 3x3 + ReLU > 5x5 + ReLU.  3x1 + 1x3 > 3x3.  2x2 + 2x2 > 3x3  Exception: 1st layer. Too computationally ineffective to use 3x3 there. Convolutional Neural Networks at Constrained Time Cost. He and Sun, CVPR 2015 49
  • 50. CAFFENET TRAINING Mishkin and Matas. All you need is a good init. ICLR, 2016 50
  • 51. GOOGLENET TRAINING Mishkin and Matas. All you need is a good init. ICLR, 2016 51