SlideShare a Scribd company logo
1 of 26
Download to read offline
ShuffleNet:
An Extremely Efficient Convolutional Neural
Network for Mobile Devices
10th December, 2017
PR12 Paper Review
Jinwon Lee
Samsung Electronics
Xiangyu Zhang, et al. “ShuffleNet: an extremely efficient
convolutional neural network for mobile devices.”, arXiv:1707.01083
BeforeWe Start…
• Xception
 PR-034 : reviewed by JaejunYoo
 https://youtu.be/V0dLhyg5_Dw
• MobileNet
 PR-044 : reviewed by Jinwon Lee
 https://youtu.be/7UoOFKcyIvM
Standard Convolution
Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
Inception v2 & v3
• Factorization of convolution filters
Xception
Slides from Jaejun Yoo’s PR-034: Xception
Depthwise Separable Convolution
• Depthwise Convolution + Pointwise Convolution(1x1 convolution)
Depthwise convolution Pointwise convolution
Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
Depthwise Separable Convolutions
MobileNetArchitecture
Toward More Efficient Network Architecture
• The 1x1 convolution accounts for most of the computation
• Can we reduce more?
A Secret of AlexNet
Grouped Convolution!
Grouped Convolution ofAlexNet
• AlexNet’s primary motivation was to allow the training of the
network over two Nvidia GTX580 GPUs with 1.5GB of memory each
• AlexNet without filter groups is not only less efficient(both in
parameters and compute), but also slightly less accurate!
Figure from https://blog.yani.io/filter-group-tutorial/
Main Ideas of ShuffleNet
• (Use depthwise separable convolution)
• Grouped convolution on 1x1 convolution layers – pointwise group
convolution
• Channel shuffle operation after pointwise group convolution
Grouped Convolution
Figure from https://blog.yani.io/filter-group-tutorial/
ResNext
• Equivalent building blocks of RexNext (cardinality=32)
1x1 Grouped Convolution with Channel Shuffling
• If multiple group convolutions stack together, there is one side effect(a)
 Outputs from a certain channel are only derived from a small fraction of input channels
• If we allow group convolution to obtain input data from different groups, the
input and outputchannels will be fully related.
Channel Shuffle Operation
• Suppose a convolutional layer with g groups whose output has g x n
channels; we first reshape the output channel dimension into (g, n),
transposing and then flattening it back as the input of next layer.
• Channel shuffle operation is also differentiable
Tensorflow Implementation from https://github.com/MG2033/ShuffleNet
ShuffleNet Units
ShuffleNet Units
• From (a), replace the first 1x1 layer with pointwise group convolution
followed by a channel shuffle operation
• ReLU is not applied to 3x3 DWConv
• As for the case where ShuffleNet is applied with stride, simply make
to modifications
 Add 3x3 average pooling on the shortcut path
 Replace element-wise addition with channel concatenation to enlarge
channel dimension with little extra computation
Complexity
• For example, given the input size c x h x w and the bottleneck channel m
ResNet ResNeXt ShuffleNet
ℎ𝑤(2𝑐𝑚 + 9𝑚2
) ℎ𝑤(2𝑐𝑚 + 9𝑚2
/𝑔) ℎ𝑤(2𝑐𝑚/𝑔 + 9𝑚)
<Number of Operations>
ShuffleNetArchitecture
Experimental Results
• To customize the network to a desired complexity, a scale factor s on
the number of channels is applied
 ShuffleNet s× means scaling the number of filters in ShuffleNet 1× by s
times thus overall complexity will be roughly s2 times of ShuffleNet 1×
• Grouped convolutions (g > 1) consistently perform better than the
counter parts without pointwise group convolutions (g=1). Smaller
models tend to benefit more from groups
Experimental Results
• It is clear that channel shuffle consistently boosts classification
scores for different settings
Experimental Results
• ShuffleNet models outperform most others by a significant margin
under different complexities
• Interestingly, they found an empirical relationship between feature
map channels and classification accuracy
 Under the complexity of 38 MFLOPs, output channels of Stage 4 forVGG-like,
ResNet, ResNeXt, Xception-like. ShuffleNet models are 50, 192, 192, 288,
576 respectively
• PVANet has 29.7% classification errors with 557MFLOPs / ShuffleNet
2x model(g=3) gets 26.3% with 524MFLOPs
Experimental Results
• It is clear that ShuffleNet models are superior to MobileNet for all the
complexities though ShuffleNet network is specially designed for small models
(< 150 MFLOPs)
• Results show that the shallower model is still significantly better than the
corresponding MobileNet, which implies that the effectiveness of ShuffleNet
mainly results from its efficient structure, not the depth.
Experimental Results
• Results show that with similar accuracy ShuffleNet is much more
efficient than others.
Experimental Results
• Due to memory access and other overheads, every 4x theoretical complexity
reduction usually result in ~2.6x actual speedup.Compared with AlexNet
achieving ~13x actual speedup(the theoretical speedup is 18x)

More Related Content

What's hot

"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Edge AI and Vision Alliance
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
NAVER Engineering
 

What's hot (20)

Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Batch normalization paper review
Batch normalization paper reviewBatch normalization paper review
Batch normalization paper review
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
Temporal Convolutional Networks - Dethroning RNN's for sequence modelling?
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
EfficientNet
EfficientNetEfficientNet
EfficientNet
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentation
 
ConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explainedConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explained
 
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic SegmentationSemantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
Video Classification Basic
Video Classification Basic Video Classification Basic
Video Classification Basic
 
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
EfficientNet
EfficientNetEfficientNet
EfficientNet
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice
 

Similar to ShuffleNet - PR054

PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
Jinwon Lee
 

Similar to ShuffleNet - PR054 (20)

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptxEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.pptx
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
1801.06434
1801.064341801.06434
1801.06434
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
PR243: Designing Network Design Spaces
PR243: Designing Network Design SpacesPR243: Designing Network Design Spaces
PR243: Designing Network Design Spaces
 
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Loop parallelization & pipelining
Loop parallelization & pipeliningLoop parallelization & pipelining
Loop parallelization & pipelining
 
CNN.pptx
CNN.pptxCNN.pptx
CNN.pptx
 
Semi orthogonal low-rank matrix factorization for deep neural networks
Semi orthogonal low-rank matrix factorization for deep neural networksSemi orthogonal low-rank matrix factorization for deep neural networks
Semi orthogonal low-rank matrix factorization for deep neural networks
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
A Survey of Convolutional Neural Networks
A Survey of Convolutional Neural NetworksA Survey of Convolutional Neural Networks
A Survey of Convolutional Neural Networks
 
WaveNet
WaveNetWaveNet
WaveNet
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]
 

More from Jinwon Lee

PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
Jinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
Jinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
Jinwon Lee
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
Jinwon Lee
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
Jinwon Lee
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
Jinwon Lee
 

More from Jinwon Lee (20)

PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
PR-197: One ticket to win them all: generalizing lottery ticket initializatio...
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning TasksPR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Efficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter SharingEfficient Neural Architecture Search via Parameter Sharing
Efficient Neural Architecture Search via Parameter Sharing
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

ShuffleNet - PR054

  • 1. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices 10th December, 2017 PR12 Paper Review Jinwon Lee Samsung Electronics Xiangyu Zhang, et al. “ShuffleNet: an extremely efficient convolutional neural network for mobile devices.”, arXiv:1707.01083
  • 2. BeforeWe Start… • Xception  PR-034 : reviewed by JaejunYoo  https://youtu.be/V0dLhyg5_Dw • MobileNet  PR-044 : reviewed by Jinwon Lee  https://youtu.be/7UoOFKcyIvM
  • 3. Standard Convolution Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
  • 4. Inception v2 & v3 • Factorization of convolution filters
  • 5. Xception Slides from Jaejun Yoo’s PR-034: Xception
  • 6. Depthwise Separable Convolution • Depthwise Convolution + Pointwise Convolution(1x1 convolution) Depthwise convolution Pointwise convolution Figures from http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/
  • 9. Toward More Efficient Network Architecture • The 1x1 convolution accounts for most of the computation • Can we reduce more?
  • 10. A Secret of AlexNet Grouped Convolution!
  • 11. Grouped Convolution ofAlexNet • AlexNet’s primary motivation was to allow the training of the network over two Nvidia GTX580 GPUs with 1.5GB of memory each • AlexNet without filter groups is not only less efficient(both in parameters and compute), but also slightly less accurate! Figure from https://blog.yani.io/filter-group-tutorial/
  • 12. Main Ideas of ShuffleNet • (Use depthwise separable convolution) • Grouped convolution on 1x1 convolution layers – pointwise group convolution • Channel shuffle operation after pointwise group convolution
  • 13. Grouped Convolution Figure from https://blog.yani.io/filter-group-tutorial/
  • 14. ResNext • Equivalent building blocks of RexNext (cardinality=32)
  • 15. 1x1 Grouped Convolution with Channel Shuffling • If multiple group convolutions stack together, there is one side effect(a)  Outputs from a certain channel are only derived from a small fraction of input channels • If we allow group convolution to obtain input data from different groups, the input and outputchannels will be fully related.
  • 16. Channel Shuffle Operation • Suppose a convolutional layer with g groups whose output has g x n channels; we first reshape the output channel dimension into (g, n), transposing and then flattening it back as the input of next layer. • Channel shuffle operation is also differentiable Tensorflow Implementation from https://github.com/MG2033/ShuffleNet
  • 18. ShuffleNet Units • From (a), replace the first 1x1 layer with pointwise group convolution followed by a channel shuffle operation • ReLU is not applied to 3x3 DWConv • As for the case where ShuffleNet is applied with stride, simply make to modifications  Add 3x3 average pooling on the shortcut path  Replace element-wise addition with channel concatenation to enlarge channel dimension with little extra computation
  • 19. Complexity • For example, given the input size c x h x w and the bottleneck channel m ResNet ResNeXt ShuffleNet ℎ𝑤(2𝑐𝑚 + 9𝑚2 ) ℎ𝑤(2𝑐𝑚 + 9𝑚2 /𝑔) ℎ𝑤(2𝑐𝑚/𝑔 + 9𝑚) <Number of Operations>
  • 21. Experimental Results • To customize the network to a desired complexity, a scale factor s on the number of channels is applied  ShuffleNet s× means scaling the number of filters in ShuffleNet 1× by s times thus overall complexity will be roughly s2 times of ShuffleNet 1× • Grouped convolutions (g > 1) consistently perform better than the counter parts without pointwise group convolutions (g=1). Smaller models tend to benefit more from groups
  • 22. Experimental Results • It is clear that channel shuffle consistently boosts classification scores for different settings
  • 23. Experimental Results • ShuffleNet models outperform most others by a significant margin under different complexities • Interestingly, they found an empirical relationship between feature map channels and classification accuracy  Under the complexity of 38 MFLOPs, output channels of Stage 4 forVGG-like, ResNet, ResNeXt, Xception-like. ShuffleNet models are 50, 192, 192, 288, 576 respectively • PVANet has 29.7% classification errors with 557MFLOPs / ShuffleNet 2x model(g=3) gets 26.3% with 524MFLOPs
  • 24. Experimental Results • It is clear that ShuffleNet models are superior to MobileNet for all the complexities though ShuffleNet network is specially designed for small models (< 150 MFLOPs) • Results show that the shallower model is still significantly better than the corresponding MobileNet, which implies that the effectiveness of ShuffleNet mainly results from its efficient structure, not the depth.
  • 25. Experimental Results • Results show that with similar accuracy ShuffleNet is much more efficient than others.
  • 26. Experimental Results • Due to memory access and other overheads, every 4x theoretical complexity reduction usually result in ~2.6x actual speedup.Compared with AlexNet achieving ~13x actual speedup(the theoretical speedup is 18x)