Modern Convolutional Neural Network
techniques for image segmentation
Deep Learning Journal Club
Gioele Ciaparrone
Michele Curci
November 30, 2016
University of Salerno
Index
1. Introduction
2. The Inception architecture
3. Fully convolutional networks
4. Hypercolumns
5. Conclusion
Introduction
CNN recap
• Sequence of convolutional and pooling layers
• Rectifier activation function
• Fully connected layers at the end
• Softmax function for classification (a minimal sketch follows)
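To make the recap concrete, below is a minimal sketch of such a network in PyTorch; the library choice and all layer sizes are illustrative assumptions, not taken from the works discussed here.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: conv/pool stack, rectifier activations, FC classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(),  # convolution + rectifier
            nn.MaxPool2d(2),                             # pooling
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

logits = TinyCNN()(torch.randn(1, 1, 28, 28))  # e.g. an MNIST-sized input
probs = torch.softmax(logits, dim=1)           # softmax for classification
```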
Convolution I
Convolution II
Valid padding (left) and same padding (right) convolutions
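The size effect of the two padding modes is easy to check; a small sketch, assuming PyTorch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 6, 6)                        # a 6x6 single-channel input
valid = nn.Conv2d(1, 1, kernel_size=3, padding=0)  # valid padding: output shrinks
same = nn.Conv2d(1, 1, kernel_size=3, padding=1)   # same padding: size preserved
print(valid(x).shape)  # torch.Size([1, 1, 4, 4])
print(same(x).shape)   # torch.Size([1, 1, 6, 6])
```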
LeNet-5 (1989-1998)
• First CNN (1989) proven to work well, used for handwritten ZIP code recognition [1]
• Refined through the years until the LeNet-5 version (1998) [2]
LeNet-5 interactive visualization [3]
It’s possible to interact with the network in 3D: draw a digit to be classified, click on the neurons to inspect their parameters and connected units, or rotate and zoom the network:
http://scs.ryerson.ca/~aharley/vis/conv/
AlexNet (2012) [5]
• After a long hiatus in which deep learning was ignored [4], neural networks received attention once again when Alex Krizhevsky overwhelmingly won the ILSVRC in 2012 with AlexNet
• Structure very similar to LeNet-5, but with some new key insights:
very efficient GPU implementation, ReLU neurons and dropout
The Inception architecture
Motivations
• Increasing model size tends to improve quality
• More computational resources are needed
• Computational efficiency and low parameter count are still important
• Mobile vision and embedded systems
• Big Data
Going Deeper with Convolutions [6]
• The Inception module solves this problem by making better use of the computing resources
• Proposed in 2014 by Christian Szegedy and other Google researchers
• Used in the GoogLeNet architecture that won both the ILSVRC 2014 classification and detection challenges
Inception module I
• Visual information is processed at various scales and then aggregated
• Since pooling operations are beneficial in CNNs, a parallel pooling
path has been added
• Problems:
• 3x3 and 5x5 convolutions can be very expensive on top of a layer
with lots of filters
• The number of filters substantially increases for each Inception layer added, leading to a computational blow-up
Inception module II
• Adding the 1x1 convolutions before the bigger convolutions reduces
dimensionality
• The same is done after the pooling layer (see the sketch below)
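A minimal sketch of such a module, assuming PyTorch; the per-branch filter counts are illustrative, not the values used in GoogLeNet.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pooling branches with 1x1 reductions."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, 1)        # plain 1x1 path
        self.branch3 = nn.Sequential(                 # 1x1 reduction, then 3x3
            nn.Conv2d(in_ch, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1))
        self.branch5 = nn.Sequential(                 # 1x1 reduction, then 5x5
            nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
            nn.Conv2d(16, 32, 5, padding=2))
        self.branch_pool = nn.Sequential(             # pooling, then 1x1 reduction
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        # Process at several scales in parallel, then concatenate the channels
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

y = InceptionModule(192)(torch.randn(1, 192, 28, 28))
print(y.shape)  # torch.Size([1, 192, 28, 28]) -- 64 + 64 + 32 + 32 channels
```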
GoogLeNet I
• GoogLeNet is a particular incarnation of the Inception architecture
• 22 convolutional layers (27 including pooling)
• 9 Inception modules
• 2 auxiliary classifiers, to counter the vanishing gradient problem and for regularization
• Designed with computational efficiency in mind
• Inference can be run on devices with limited computational
resources, especially memory
• 7 of these networks used in an ensemble for the ILSVRC 2014
classification task
GoogLeNet II
GoogLeNet III
GoogLeNet - Training
• Trained with the DistBelief distributed machine learning system
• Asynchronous stochastic gradient descent with 0.9 momentum
• Image sampling methods changed many times in the months leading up to the competition
• Already-converged models were further trained with other options
• Models were trained on crops of different sizes
• There is no definitive guidance on the single most effective way to train these networks
GoogLeNet - ILSVRC 2014 Results
Classification (above) and object detection (below) results.
DeepDream
Google’s DeepDream uses a GoogLeNet to produce “machine dreams”
Inception-v2 and Inception-v3
• The Inception module authors later presented new optimized
versions of the architecture, called Inception-v2 and Inception-v3 [7]
• They managed to significantly improve on the GoogLeNet ILSVRC 2014 results
• The improvements were based on various key principles:
• Avoid representational bottlenecks
• Spatial aggregation over lower-dimensional embeddings doesn’t usually cause significant losses in representational power
• Balance the width and depth of the network
Convolution factorization I
• Factorizing convolutions reduces the number of parameters without losing much expressiveness
• For example, 5x5 convolutions can be factorized into a pair of 3x3 convolutions
• It is also possible to factorize an NxN convolution into a 1xN convolution followed by an Nx1 convolution (worked parameter counts below)
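The savings can be checked with per-filter weight counts (biases and the channel multiplier, which affect both sides equally, are omitted); a small Python sketch:

```python
k = 5
print(k * k, "weights for one 5x5 convolution")      # 25
print(2 * 3 * 3, "weights for two stacked 3x3s")     # 18
n = 7
print(n * n, "weights for one 7x7 convolution")      # 49
print(2 * n, "weights for a 1x7 followed by a 7x1")  # 14
```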
Convolution factorization II
The original Inception module (left) and the new factorized module
(right).
Efficient grid size reduction - problem
• Suppose we want to pass from a d × d grid with k filters to a d/2 × d/2 grid with 2k filters
• We need to compute a stride-1 convolution and then a pooling
• Computational cost dominated by the convolution: 2d²k² operations
• Inverting the order reduces the number of operations to 2(d/2)²k², but violates the bottleneck principle
Efficient grid size reduction - solution
• The solution is an Inception module with parallel convolution and pooling blocks of stride 2 (sketched below)
• Computationally efficient, with no representational bottleneck introduced
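A minimal sketch of the idea, assuming PyTorch; here the convolution branch keeps k filters so that concatenating it with the pooling branch yields 2k filters on the halved grid:

```python
import torch
import torch.nn as nn

class ReductionModule(nn.Module):
    """Parallel stride-2 convolution and pooling branches, concatenated."""
    def __init__(self, k):
        super().__init__()
        self.conv = nn.Conv2d(k, k, 3, stride=2, padding=1)  # d x d -> d/2 x d/2
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)

    def forward(self, x):
        return torch.cat([self.conv(x), self.pool(x)], dim=1)  # 2k filters

y = ReductionModule(64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 16, 16])
```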
The new architecture
• Combining the various modified Inception modules yields the new Inception-v2 architecture
Inception-v2: modules used
The factorized NxN modules are used with n = 7.
Inception-v2: training and observations
• The network was trained on the ILSVRC 2012 images using
stochastic gradient descent and the TensorFlow library
• Experiments showed that the two auxiliary classifiers have less impact on training convergence than expected
• In the early training phases, the model performance was not affected
by the presence of the auxiliary classifiers: they only improved the
performance near the end of training
• Removing the lower auxiliary classifier didn’t have any effect
• The main classifier performs better if batch normalization or dropout
are added to the auxiliary ones
• The model was also trained and tested on smaller receptive fields, with only a small loss of top-1 accuracy (76.6% for a 299x299 receptive field vs. 75.2% for 79x79). This is important for the post-classification of detection results
Inception-v2 to Inception-v3 results (single model)
• Each row’s Inception-v2 model adds a feature with respect to the
previous row’s model
• The last row’s model is referred to as the Inception-v3 model
Inception-v3 vs other models (single and ensemble)
Single-model results (left) and ensemble results (right).
• On the ILSVRC 2012 dataset, there is a significant improvement
versus state-of-the-art models, both with a single model and with an
ensemble of models
• Note that the ensemble errors here are validation errors (except for the one marked with ’*’, which is a test error)
Fully convolutional networks
Semantic segmentation
• Image segmentation is the process of partitioning an image into multiple segments (sets of pixels or super-pixels)
• Semantic segmentation partitions an image into semantically meaningful parts and classifies each part into one of a set of predetermined classes
• It’s possible to achieve the same result with pixel-wise classification, i.e. assigning a class to each pixel
Fully convolutional networks
• Shelhamer et al. [8] showed that fully convolutional networks trained pixels-to-pixels exceed the state of the art in semantic segmentation
• The fully convolutional networks they proposed take input of
arbitrary size and produce same-sized output to make dense
predictions
Convolutionalization of a classic net I
• Typical recognition nets (AlexNet, GoogLeNet, etc.) take fixed-size inputs and produce non-spatial outputs
• The fully connected layers have fixed dimensions and drop the spatial coordinates
• However, we can view these fully connected layers as convolutions that cover their entire input regions (see the sketch below)
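The equivalence can be verified directly; a sketch assuming PyTorch, with the 512x7x7 feature-map size of VGG16 used as an illustrative example:

```python
import torch
import torch.nn as nn

# A fully connected layer over a 512x7x7 feature map...
fc = nn.Linear(512 * 7 * 7, 4096)
# ...is equivalent to a 7x7 convolution with 4096 output channels
conv = nn.Conv2d(512, 4096, kernel_size=7)
conv.weight.data = fc.weight.data.view(4096, 512, 7, 7)  # reuse the same weights
conv.bias.data = fc.bias.data

x = torch.randn(1, 512, 7, 7)
print(torch.allclose(fc(x.flatten(1)), conv(x).flatten(1), atol=1e-5))  # True

# On a larger input the convolution slides, producing a spatial output map
print(conv(torch.randn(1, 512, 14, 14)).shape)  # torch.Size([1, 4096, 8, 8])
```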
Convolutionalization of a classic net II
• These fully convolutional networks take input of any size and output classification maps
• The resulting maps are equivalent to the evaluation of the original
network on particular input patches
• The new network is more than 5 times faster than the original
network both at learning time and at inference time (considering a
10x10 output grid)
• Note that the output dimensions are typically reduced by
subsampling
• So output interpolation is needed to obtain dense predictions
• The interpolation is obtained through backwards convolutions
Backwards strided convolution
Upsampling from a 3x3 grid to a 5x5 grid (see the sketch below)
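A short sketch reproducing this example, assuming PyTorch; the kernel size and padding are one possible choice that maps 3x3 to 5x5 with stride 2:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 3, 3)
# A stride-2 backwards (transposed) convolution upsamples the grid
up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
print(up(x).shape)  # torch.Size([1, 1, 5, 5]) -- the 3x3 grid upsampled to 5x5
```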
Architecture I
• Coarse semantic and fine local information are fused by combining lower and higher layers
• Three network variants with different layers fused were tested
Architecture II
• Three proven classification architectures were transformed to fully convolutional networks: AlexNet, VGG16 and GoogLeNet
• Each net’s final classifier layer is discarded and all the fully connected layers are converted to convolutions
• A 1x1 convolution with 21 channels (the 20 classes of the PASCAL VOC 2011 dataset plus background) is added at the end, followed by a backwards convolution layer (sketched below)
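A sketch of such a head, assuming PyTorch; the 4096 input channels and the 64/32 transposed-convolution parameters follow the common FCN-32s configuration but are assumptions here, and the padding/cropping details of the original nets are omitted:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 21  # 20 PASCAL VOC classes plus background

head = nn.Sequential(
    nn.Conv2d(4096, NUM_CLASSES, kernel_size=1),  # per-location class scores
    nn.ConvTranspose2d(NUM_CLASSES, NUM_CLASSES,  # backwards convolution:
                       kernel_size=64, stride=32, padding=16),  # 32x upsampling
)
coarse = torch.randn(1, 4096, 10, 10)  # coarse feature map from the backbone
print(head(coarse).shape)  # torch.Size([1, 21, 320, 320])
```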
Architecture III
• The original nets were first pre-trained using image classification
• Then they were transformed to fully convolutional networks and fine-tuned on whole images (using SGD with momentum)
• The best results were obtained with FCN-VGG16
• Training on whole images proved to be as effective as sampling
patches
Architecture comparison
• The first model (FCN-32s) didn’t fuse different layers, and the resulting output is very coarse
• They then fused lower layers with the last one (as shown earlier) to
obtain better results (mean IU 62.7 for FCN-8s vs. 59.4 for
FCN-32s)
Results comparison I
• The model reaches state-of-the-art performance on semantic
segmentation
• The model is also much faster at inference time than previous architectures
Results comparison II
Hypercolumns
Hypercolumns I
• The last layer of a CNN captures general features of the image, but
is too coarse spatially to allow precise localization
• Earlier layers instead may be precise in localization but will not
capture semantics
• Hariharan et al. [9] presented the hypercolumn concept, which puts together the information from both higher and lower layers to obtain better results on three fine-grained localization tasks:
• Simultaneous detection and segmentation
• Keypoint localization
• Part labeling
Hypercolumns II
• The hypercolumn corresponding to a given input location is defined as the outputs of all units above that location at all layers of the CNN, stacked into one vector (see the sketch below)
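A minimal sketch of hypercolumn extraction, assuming PyTorch and bilinear upsampling; the three feature-map shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def hypercolumns(feature_maps, size):
    """Upsample each layer's feature map to a common size and stack along
    channels, so every location gets the responses of all layers above it."""
    up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
          for f in feature_maps]
    return torch.cat(up, dim=1)

maps = [torch.randn(1, 64, 50, 50),    # early layer: fine, low-level
        torch.randn(1, 128, 25, 25),   # middle layer
        torch.randn(1, 256, 13, 13)]   # late layer: coarse, semantic
hc = hypercolumns(maps, size=(50, 50))
print(hc.shape)                # torch.Size([1, 448, 50, 50])
print(hc[0, :, 10, 20].shape)  # torch.Size([448]) -- one hypercolumn vector
```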
Problem setting I
• Input: a set of detections (after non-maximum suppression), each with a bounding box, a category label and a score
• Depending on the task, for each detection we want to:
• segment out the object
• segment its parts
• predict its keypoints
• Whatever the task, the bounding boxes are slightly expanded and a 50x50 heatmap is predicted on each of them
Problem setting II
• The information encoded in each heatmap and the number of
heatmaps depend on the chosen task:
• For segmentation, the heatmap encodes the probability that a
particular location is inside the object
• For part labeling, a separate heatmap is predicted for each part, encoding the probability that a location belongs to that part
• For keypoint localization a separate heatmap is predicted for each
keypoint, with each heatmap encoding the probability that the
keypoint is at a particular location
• The heatmaps are finally resized to the size of the expanded
bounding boxes
• So all the tasks are solved by assigning a probability to each of the 50x50 locations
Problem setting III
• In principle, a classifier should be trained for each of the 50x50 locations and for each category
• But doing so has three problems:
• The amount of data that each classifier sees during training is heavily reduced
• Training so many classifiers is computationally expensive
• While the classifier should vary with location, adjacent pixels should still be classified similarly
• The solution is to train a coarse K × K (usually K = 5 or K = 10) grid of classifiers and interpolate between them (see the sketch below)
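A sketch of the blending idea, assuming PyTorch; deriving the per-location weights by bilinearly upsampling one-hot grid indicators is one way to realize the interpolation, not necessarily the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

K, S = 5, 50  # K x K grid of classifiers, S x S heatmap

# scores[g] = the score map produced by grid classifier g over the whole box
scores = torch.randn(K * K, S, S)

# Per-location blending weights: upsampling a one-hot K x K indicator for
# each classifier yields bilinear weights that sum to 1 at every location
eye = torch.eye(K * K).view(K * K, 1, K, K)
weights = F.interpolate(eye, size=(S, S), mode="bilinear",
                        align_corners=True).squeeze(1)  # shape (K*K, S, S)

final = (weights * scores).sum(dim=0)  # interpolated per-pixel scores
print(final.shape)  # torch.Size([50, 50])
```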
Network architecture
(Figure: each of the 3 combined layers passes through a convolution and an upsampling stage, feeding a sigmoid classifier with interpolation between the grid classifiers.)
Note: inverting the order of the upsampling and the convolutions (which compute the K × K grids), and computing them separately for each of the 3 combined layers, reduces the computational cost
Bounding box refining
• A dedicated technique, called rescoring, is used to improve the box selection
SDS results
Keypoint prediction results
Part labeling results
Conclusion
• We have seen how the Inception modules make it possible to train deeper and better networks in a computationally efficient manner
• We have then observed how to transform a classification CNN into a
fully convolutional network for pixel-wise classification
• We have learned the hypercolumn technique to combine high and
low level information to improve the accuracy on various fine-grained
localization tasks
Thank you for your patience! :)
References I
[1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard,
W. Hubbard, and L. D. Jackel, “Backpropagation applied to
handwritten zip code recognition,” Neural Computation, vol. 1(4),
pp. 541–551, 1989.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
learning applied to document recognition,” Proc. IEEE, vol. 86,
pp. 2278–2324, 1998.
[3] A. W. Harley, “An interactive node-link visualization of convolutional
neural networks,” in ISVC, pp. 867–877, 2015.
[4] A. Kurenkov, “A ’brief’ history of neural nets and deep learning, part 4.” http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/.
References II
[5] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1106–1114, 2012.
[6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov,
D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with
convolutions,” CoRR, vol. abs/1409.4842, 2014.
[7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna,
“Rethinking the inception architecture for computer vision,” CoRR,
vol. abs/1512.00567, 2015.
[8] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks
for semantic segmentation,” CoRR, vol. abs/1605.06211, 2016.
References III
[9] B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” CoRR, vol. abs/1411.5752, 2014.