Convolutional Neural Networks : Popular Architectures

A
CNN Popular Architectures
and Transfer Learning
Palacode Narayana Iyer Anantharaman
16th Oct 2018
Motivation : Why study these?
Understanding popular architectures help us in many ways:
• We develop a better understanding of the subject by studying high performant
architectures
• This helps perform our own research on newer models
• Learning their design philosophy help us to design our models more effectively.
• Use them as a backbone for transfer learning, selecting the right architecture
References
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/
Current trend: Deeper Models work better
• CNNs consistently outperform other
approaches for the core tasks of CV
• Deeper models work better
• Increasing the number of parameters in layers
of CNN without increasing their depth is not
effective at increasing test set performance.
• Shallow models overfit at around 20 million
parameters while deep ones can benefit from
having over 60 million.
• Key insight: Model performs better when it is
architected to reflect composition of simpler
functions than a single complex function. This
may also be explained off viewing the
computation as a chain of dependencies
Convolutional Neural Networks : Popular Architectures
VGG Net
VGG net
ResNet
Recent Results (Credits: CS231n Stanford)
Resnet Motivation
ResNet Approach
• Generally: Deeper the better - See the fig
• Issue: Hard to train deeper networks effectively :
Validation error goes up not just due to
overfitting but increase in training error
• Solution: Use skip connections to propagate the
activations to reduce the impact
• Rationale: Imagine how to get an identity output
through a deep network accurately.
ResNet Hypothesis : Rationale
• Deeper models should perform at least as well as shallower models
• More parameters, more degrees of freedom to get the training error down
• More depth, more ability to model abstractions
• Solution by construction is copying the learned layers of a shallower
model and setting the additional layers to identity mapping
ResNet
• Very deep network: Uses 152 layers
• Shallower versions (e.g. Resnet50) are
available
• The “go to” backbone network for many
applications such as Faster RCNN
• Pre trained weights are available for
Keras, TensorFlow
ResNet Details
• Default input size: (224, 224, 3)
• Each stage reduces the width, height dimensions by a factor of 2
• This property is leveraged in later implementations such as pyramidal networks
ImageNet Results summary table
GoogleNet
• Design choices of filter sizes: (3, 3), (5, 5) and so on – Which to choose for each
convolution layer?
• Why not try all of them and choose the best?
• Trying each possible value on every layer and experimenting manually is not a solution
• Inception layer allows multiple filter sizes and learn their contributions through
parameters automatically
Inception Layer Naïve Architecture
Inception Layer Naïve Architecture (Fig Udacity Deep Learning)
Inception Architecture with bottleneck layer
Bottleneck Layer
Example
• Consider an input volume (28, 28, 192) and an output volume (28, 28, 32)
• How many computations are needed if we use a 5 x 5 filter?
• Each filter will be 5 x 5 x 192, we will be moving this over a 28 x 28 surface and
we have 32 of them
• 28 x 28 x 32 x 5 x 5 x 192 = 120M
• If we need multiple such filters, we need to add up corresponding computations
for each of them
• On a very deep network these many computations are prohibitively large even
when we use powerful hardware
• By reducing the dimensionality of the input before final convolutions, we get a
manageable number of computations
Example with bottleneck layer
• In our example, we can transform the 28 x 28 x 192 in to same sized surface but
much reduced depth (say 16) using 1 x 1 convolutions
• The bottleneck layer has the shape (28 x 28 x 16)
• Perform the required convolutions (e.g. 5 x 5) on the bottleneck layer to generate
the final output volume
• Computations: #computations between input to bottleneck layer +
#computations between bottleneck to output. In our example this is 12M
GoogleNet architecture with Inception layer
State of the art : SENet
• ImageNet 2017 topper in multiple categories
• A novel technique to weight the contributions of the channels of a
convolutional layer
SeNet : Squeeze and Excitation Network
• SeNet is the winning architecture of ImageNet 2017 in multiple categories
• Error rate on image classification: 2.251%
• Key Idea:
• In authors’ words: “Improve the representational power of the network by explicitly
modelling interdependencies between channels of its convolutional features”
• Simple explanation: Add parameters to each channel of a convolutional block so that network
can adaptively adjust the weighting of each feature map
SeNet Rationale
• Deep CNN’s learn increasing levels of abstractions from lower to higher layers. Lower
layers have higher resolution and can extract basic elements of information
• Higher layers can detect faces or generate text etc and deal with abstract information
• All of this works by fusing the spatial and channel information of an image.
• The network weights each of its channels equally when creating the output feature maps.
• SENets change this by adding a content aware mechanism to weight each channel
adaptively. In it’s most basic form this could mean adding a single parameter to each
channel and giving it a linear scalar how relevant each one is.
SENet Architecture
• Get a global understanding of each channel by squeezing the feature maps to a
single numeric value. This results in a vector of size n, where n is equal to the
number of convolutional channels.
• Afterwards, it is fed through a two-layer neural network, which outputs a vector
of the same size. These n values can now be used as weights on the original
features maps, scaling each channel based on its importance.
Code Illustration of the key idea
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Prove
High level steps
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Prove
Adding squeeze excitation technique to ResNet
1 von 30

Recomendados

Convolutional Neural Networks (CNN) von
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
58.5K views70 Folien
Resnet von
ResnetResnet
Resnetashwinjoseph95
1.6K views16 Folien
Introduction to CNN von
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
8.8K views18 Folien
Cnn von
CnnCnn
CnnNirthika Rajendran
14K views31 Folien
Convolutional Neural Networks von
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural NetworksAshray Bhandare
16.2K views80 Folien
Convolutional Neural Network (CNN) - image recognition von
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognitionYUNG-KUEI CHEN
4.3K views64 Folien

Más contenido relacionado

Was ist angesagt?

Densely Connected Convolutional Networks von
Densely Connected Convolutional NetworksDensely Connected Convolutional Networks
Densely Connected Convolutional NetworksHosein Mohebbi
887 views27 Folien
Machine Learning - Convolutional Neural Network von
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
1K views36 Folien
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks von
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksJinwon Lee
4.3K views31 Folien
CNN Tutorial von
CNN TutorialCNN Tutorial
CNN TutorialSungjoon Choi
5.6K views37 Folien
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu... von
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...Simplilearn
2K views73 Folien
LeNet to ResNet von
LeNet to ResNetLeNet to ResNet
LeNet to ResNetSomnath Banerjee
7.6K views21 Folien

Was ist angesagt?(20)

Densely Connected Convolutional Networks von Hosein Mohebbi
Densely Connected Convolutional NetworksDensely Connected Convolutional Networks
Densely Connected Convolutional Networks
Hosein Mohebbi887 views
Machine Learning - Convolutional Neural Network von Richard Kuo
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
Richard Kuo1K views
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks von Jinwon Lee
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Jinwon Lee4.3K views
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu... von Simplilearn
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
Simplilearn2K views
Artificial Neural Network von Prakash K
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
Prakash K2K views
Convolutional neural network von MojammilHusain
Convolutional neural networkConvolutional neural network
Convolutional neural network
MojammilHusain1.1K views
Convolutional neural network from VGG to DenseNet von SungminYou
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
SungminYou992 views
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S... von Simplilearn
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Simplilearn13.6K views
Overview of Convolutional Neural Networks von ananth
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
ananth7.7K views
Introduction to deep learning von Junaid Bhat
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat1.4K views
MobileNet - PR044 von Jinwon Lee
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
Jinwon Lee8.5K views
CNNs: from the Basics to Recent Advances von Dmytro Mishkin
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent Advances
Dmytro Mishkin3.9K views
Handwritten Digit Recognition(Convolutional Neural Network) PPT von RishabhTyagi48
Handwritten Digit Recognition(Convolutional Neural Network) PPTHandwritten Digit Recognition(Convolutional Neural Network) PPT
Handwritten Digit Recognition(Convolutional Neural Network) PPT
RishabhTyagi4818.8K views
CNN and its applications by ketaki von Ketaki Patwari
CNN and its applications by ketakiCNN and its applications by ketaki
CNN and its applications by ketaki
Ketaki Patwari1.7K views
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural... von Simplilearn
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Simplilearn4.1K views

Similar a Convolutional Neural Networks : Popular Architectures

PR243: Designing Network Design Spaces von
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesJinwon Lee
875 views43 Folien
PR-183: MixNet: Mixed Depthwise Convolutional Kernels von
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsJinwon Lee
968 views33 Folien
TensorFlow.pptx von
TensorFlow.pptxTensorFlow.pptx
TensorFlow.pptxJayesh Patil
35 views24 Folien
Introduction to CNN Models: DenseNet & MobileNet von
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetKrishnakoumarC
5 views7 Folien
Cnn von
CnnCnn
CnnMehrnaz Faraz
630 views34 Folien
Convolutional Neural Networks von
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networksmilad abbasi
318 views34 Folien

Similar a Convolutional Neural Networks : Popular Architectures(20)

PR243: Designing Network Design Spaces von Jinwon Lee
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
Jinwon Lee875 views
PR-183: MixNet: Mixed Depthwise Convolutional Kernels von Jinwon Lee
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
Jinwon Lee968 views
Introduction to CNN Models: DenseNet & MobileNet von KrishnakoumarC
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
KrishnakoumarC5 views
Convolutional Neural Networks von milad abbasi
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
milad abbasi318 views
Introduction to computer vision von Marcin Jedyk
Introduction to computer visionIntroduction to computer vision
Introduction to computer vision
Marcin Jedyk56 views
Introduction to computer vision with Convoluted Neural Networks von MarcinJedyk
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
MarcinJedyk60 views
Handwritten Digit Recognition and performance of various modelsation[autosaved] von SubhradeepMaji
Handwritten Digit Recognition and performance of various modelsation[autosaved]Handwritten Digit Recognition and performance of various modelsation[autosaved]
Handwritten Digit Recognition and performance of various modelsation[autosaved]
SubhradeepMaji494 views
Once-for-All: Train One Network and Specialize it for Efficient Deployment von taeseon ryu
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
taeseon ryu360 views
intro-to-cnn-April_2020.pptx von ssuser3aa461
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
ssuser3aa46114 views
201907 AutoML and Neural Architecture Search von DaeJin Kim
201907 AutoML and Neural Architecture Search201907 AutoML and Neural Architecture Search
201907 AutoML and Neural Architecture Search
DaeJin Kim950 views
A Survey of Convolutional Neural Networks von Rimzim Thube
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
Rimzim Thube113 views
Cvpr 2018 papers review (efficient computing) von DonghyunKang12
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
DonghyunKang12400 views
PR-144: SqueezeNext: Hardware-Aware Neural Network Design von Jinwon Lee
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
Jinwon Lee1.2K views
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio... von Edge AI and Vision Alliance
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
Modern Convolutional Neural Network techniques for image segmentation von Gioele Ciaparrone
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
Gioele Ciaparrone2.9K views

Más de ananth

Generative Adversarial Networks : Basic architecture and variants von
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsananth
1.8K views20 Folien
Foundations: Artificial Neural Networks von
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networksananth
2.2K views50 Folien
Artificial Intelligence Course: Linear models von
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
1.4K views47 Folien
An Overview of Naïve Bayes Classifier von
An Overview of Naïve Bayes Classifier An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier ananth
3.9K views26 Folien
Mathematical Background for Artificial Intelligence von
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligenceananth
1.7K views68 Folien
Search problems in Artificial Intelligence von
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligenceananth
4.1K views36 Folien

Más de ananth(20)

Generative Adversarial Networks : Basic architecture and variants von ananth
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
ananth1.8K views
Foundations: Artificial Neural Networks von ananth
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
ananth2.2K views
Artificial Intelligence Course: Linear models von ananth
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
ananth1.4K views
An Overview of Naïve Bayes Classifier von ananth
An Overview of Naïve Bayes Classifier An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier
ananth3.9K views
Mathematical Background for Artificial Intelligence von ananth
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligence
ananth1.7K views
Search problems in Artificial Intelligence von ananth
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligence
ananth4.1K views
Introduction to Artificial Intelligence von ananth
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
ananth4.5K views
Word representation: SVD, LSA, Word2Vec von ananth
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
ananth11.4K views
Deep Learning For Speech Recognition von ananth
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
ananth3.3K views
Overview of TensorFlow For Natural Language Processing von ananth
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
ananth6K views
Convolutional Neural Networks: Part 1 von ananth
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
ananth1.5K views
Machine Learning Lecture 3 Decision Trees von ananth
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
ananth5.8K views
Machine Learning Lecture 2 Basics von ananth
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
ananth1.6K views
Introduction To Applied Machine Learning von ananth
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
ananth2.2K views
Recurrent Neural Networks, LSTM and GRU von ananth
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
ananth29.8K views
MaxEnt (Loglinear) Models - Overview von ananth
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overview
ananth2K views
An overview of Hidden Markov Models (HMM) von ananth
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
ananth8K views
L06 stemmer and edit distance von ananth
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distance
ananth2.2K views
L05 language model_part2 von ananth
L05 language model_part2L05 language model_part2
L05 language model_part2
ananth1.2K views
L05 word representation von ananth
L05 word representationL05 word representation
L05 word representation
ananth1.2K views

Último

Software evolution understanding: Automatic extraction of software identifier... von
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...Ra'Fat Al-Msie'deen
9 views33 Folien
Airline Booking Software von
Airline Booking SoftwareAirline Booking Software
Airline Booking SoftwareSharmiMehta
6 views26 Folien
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme... von
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...Deltares
5 views28 Folien
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J... von
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...Deltares
9 views24 Folien
Advanced API Mocking Techniques von
Advanced API Mocking TechniquesAdvanced API Mocking Techniques
Advanced API Mocking TechniquesDimpy Adhikary
19 views11 Folien
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... von
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...Marc Müller
38 views62 Folien

Último(20)

Software evolution understanding: Automatic extraction of software identifier... von Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
Airline Booking Software von SharmiMehta
Airline Booking SoftwareAirline Booking Software
Airline Booking Software
SharmiMehta6 views
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme... von Deltares
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
Deltares5 views
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J... von Deltares
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
Deltares9 views
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... von Marc Müller
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
Marc Müller38 views
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports von Ra'Fat Al-Msie'deen
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsBushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
FIMA 2023 Neo4j & FS - Entity Resolution.pptx von Neo4j
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptx
Neo4j7 views
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -... von Deltares
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...
Deltares6 views
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with... von sparkfabrik
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
sparkfabrik5 views
Myths and Facts About Hospice Care: Busting Common Misconceptions von Care Coordinations
Myths and Facts About Hospice Care: Busting Common MisconceptionsMyths and Facts About Hospice Care: Busting Common Misconceptions
Myths and Facts About Hospice Care: Busting Common Misconceptions
Copilot Prompting Toolkit_All Resources.pdf von Riccardo Zamana
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
Riccardo Zamana8 views
Fleet Management Software in India von Fleetable
Fleet Management Software in India Fleet Management Software in India
Fleet Management Software in India
Fleetable11 views
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ... von Donato Onofri
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...
Donato Onofri825 views
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI... von Marc Müller
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Marc Müller37 views

Convolutional Neural Networks : Popular Architectures

  • 1. CNN Popular Architectures and Transfer Learning Palacode Narayana Iyer Anantharaman 16th Oct 2018
  • 2. Motivation : Why study these? Understanding popular architectures help us in many ways: • We develop a better understanding of the subject by studying high performant architectures • This helps perform our own research on newer models • Learning their design philosophy help us to design our models more effectively. • Use them as a backbone for transfer learning, selecting the right architecture
  • 4. Current trend: Deeper Models work better • CNNs consistently outperform other approaches for the core tasks of CV • Deeper models work better • Increasing the number of parameters in layers of CNN without increasing their depth is not effective at increasing test set performance. • Shallow models overfit at around 20 million parameters while deep ones can benefit from having over 60 million. • Key insight: Model performs better when it is architected to reflect composition of simpler functions than a single complex function. This may also be explained off viewing the computation as a chain of dependencies
  • 9. Recent Results (Credits: CS231n Stanford)
  • 11. ResNet Approach • Generally: Deeper the better - See the fig • Issue: Hard to train deeper networks effectively : Validation error goes up not just due to overfitting but increase in training error • Solution: Use skip connections to propagate the activations to reduce the impact • Rationale: Imagine how to get an identity output through a deep network accurately.
  • 12. ResNet Hypothesis : Rationale • Deeper models should perform at least as well as shallower models • More parameters, more degrees of freedom to get the training error down • More depth, more ability to model abstractions • Solution by construction is copying the learned layers of a shallower model and setting the additional layers to identity mapping
  • 13. ResNet • Very deep network: Uses 152 layers • Shallower versions (e.g. Resnet50) are available • The “go to” backbone network for many applications such as Faster RCNN • Pre trained weights are available for Keras, TensorFlow
  • 14. ResNet Details • Default input size: (224, 224, 3) • Each stage reduces the width, height dimensions by a factor of 2 • This property is leveraged in later implementations such as pyramidal networks
  • 16. GoogleNet • Design choices of filter sizes: (3, 3), (5, 5) and so on – Which to choose for each convolution layer? • Why not try all of them and choose the best? • Trying each possible value on every layer and experimenting manually is not a solution • Inception layer allows multiple filter sizes and learn their contributions through parameters automatically
  • 17. Inception Layer Naïve Architecture
  • 18. Inception Layer Naïve Architecture (Fig Udacity Deep Learning)
  • 19. Inception Architecture with bottleneck layer
  • 21. Example • Consider an input volume (28, 28, 192) and an output volume (28, 28, 32) • How many computations are needed if we use a 5 x 5 filter? • Each filter will be 5 x 5 x 192, we will be moving this over a 28 x 28 surface and we have 32 of them • 28 x 28 x 32 x 5 x 5 x 192 = 120M • If we need multiple such filters, we need to add up corresponding computations for each of them • On a very deep network these many computations are prohibitively large even when we use powerful hardware • By reducing the dimensionality of the input before final convolutions, we get a manageable number of computations
  • 22. Example with bottleneck layer • In our example, we can transform the 28 x 28 x 192 in to same sized surface but much reduced depth (say 16) using 1 x 1 convolutions • The bottleneck layer has the shape (28 x 28 x 16) • Perform the required convolutions (e.g. 5 x 5) on the bottleneck layer to generate the final output volume • Computations: #computations between input to bottleneck layer + #computations between bottleneck to output. In our example this is 12M
  • 23. GoogleNet architecture with Inception layer
  • 24. State of the art : SENet • ImageNet 2017 topper in multiple categories • A novel technique to weight the contributions of the channels of a convolutional layer
  • 25. SeNet : Squeeze and Excitation Network • SeNet is the winning architecture of ImageNet 2017 in multiple categories • Error rate on image classification: 2.251% • Key Idea: • In authors’ words: “Improve the representational power of the network by explicitly modelling interdependencies between channels of its convolutional features” • Simple explanation: Add parameters to each channel of a convolutional block so that network can adaptively adjust the weighting of each feature map
  • 26. SeNet Rationale • Deep CNN’s learn increasing levels of abstractions from lower to higher layers. Lower layers have higher resolution and can extract basic elements of information • Higher layers can detect faces or generate text etc and deal with abstract information • All of this works by fusing the spatial and channel information of an image. • The network weights each of its channels equally when creating the output feature maps. • SENets change this by adding a content aware mechanism to weight each channel adaptively. In it’s most basic form this could mean adding a single parameter to each channel and giving it a linear scalar how relevant each one is.
  • 27. SENet Architecture • Get a global understanding of each channel by squeezing the feature maps to a single numeric value. This results in a vector of size n, where n is equal to the number of convolutional channels. • Afterwards, it is fed through a two-layer neural network, which outputs a vector of the same size. These n values can now be used as weights on the original features maps, scaling each channel based on its importance.
  • 28. Code Illustration of the key idea Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7 Credits: Paul-Louis Prove
  • 29. High level steps Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7 Credits: Paul-Louis Prove
  • 30. Adding squeeze excitation technique to ResNet