Pruning Convolutional Neural
Networks for resource
efficient inference
Presented by: Kaushalya Madhawa
27th January 2017
Molchanov, Pavlo, et al. "Pruning Convolutional Neural Networks for Resource Efficient
Transfer Learning." arXiv preprint arXiv:1611.06440 (2016).
The paper
● Will be presented at ICLR 2017 (24-26 April)
● Anonymous reviewer ratings
○ 9
○ 6
○ 6
https://openreview.net/forum?id=SJGCiw5gl
Optimizing neural networks
Goal: running trained neural networks on mobile devices
1. Designing optimized networks from scratch
2. Optimizing pre-trained networks
Deep Compression (Han et al.)
Optimizing pre-trained neural networks
Reasons for pruning pre-trained networks
Transfer learning: fine-tuning an existing deep neural network
previously trained on a larger related dataset results in
higher accuracies
Objectives of pruning:
Improving the speed of inference
Reducing the size of the trained model
Better generalization
Which parameters should be pruned?
Saliency: A measure of importance
Parameters with least saliency will be deleted
“Magnitude equals saliency”
Parameters with smaller magnitudes have low saliency
Criteria for pruning
Magnitude of weight
a convolutional kernel with low l2 norm detects less important
features than those with a high norm
Magnitude of activation
if an activation value is small, then this feature detector is not
important for the prediction task
Pruning the parameters which have the least effect on the trained
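
As a minimal sketch of the two magnitude-based criteria above, the snippet below scores each feature map by the l2 norm of its kernel weights and by its mean activation. The flat-list data layout, the toy numbers, and the helper names `kernel_l2_norm`, `mean_activation`, and `least_salient` are assumptions made for illustration; they do not come from the slides or from the paper's code.

```python
import math

def kernel_l2_norm(kernel):
    """l2 norm of one convolutional kernel given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in kernel))

def mean_activation(activations):
    """Mean of one feature map's activations given as a flat list."""
    return sum(activations) / len(activations)

def least_salient(scores):
    """Index of the feature map with the lowest saliency score."""
    return min(range(len(scores)), key=lambda i: scores[i])

# Hypothetical toy layer with three kernels (flattened weights).
kernels = [[0.5, -0.5, 0.5, -0.5], [0.01, 0.02, -0.01, 0.0], [3.0, 4.0, 0.0, 0.0]]
weight_scores = [kernel_l2_norm(k) for k in kernels]
prune_by_weight = least_salient(weight_scores)

# Hypothetical post-ReLU activations of the same three feature maps.
feature_maps = [[0.0, 0.2, 0.1, 0.0], [1.5, 2.0, 0.0, 0.5], [0.0, 0.0, 0.0, 0.4]]
activation_scores = [mean_activation(a) for a in feature_maps]
prune_by_activation = least_salient(activation_scores)
```

In this toy layer the two criteria select different feature maps, which illustrates why the choice of saliency measure matters.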
Contributions of this paper
New saliency measure based on the
first-order Taylor expansion
Significant reduction in floating-point
operations (FLOPs)
without a significant loss in accuracy
Oracle pruning as a general method to
compare network pruning models
Pruning as an optimization problem
Find a subset of parameters which preserves the accuracy of
the trained network
Impractical to solve this combinatorial optimization problem
for current networks
e.g., VGG-16 has 4,224 convolutional feature maps
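
To make the size of the search space concrete, the short sketch below adds up the output channels of the 13 convolutional layers of VGG-16 and counts the keep-or-prune subsets an exhaustive search would face. The channel list is the standard VGG-16 configuration; the variable names are illustrative.

```python
# Output channels of the 13 convolutional layers of VGG-16.
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

total_feature_maps = sum(VGG16_CONV_CHANNELS)

# Each feature map is either kept or pruned, so an exhaustive search
# would have to consider 2**N candidate subsets.
num_subsets = 2 ** total_feature_maps
num_digits = len(str(num_subsets))

print(f"feature maps: {total_feature_maps}")
print(f"candidate subsets: a number with {num_digits} decimal digits")
```

A count of that size rules out enumeration, which motivates greedy, saliency-based pruning.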
Taylor series approximation
A Taylor approximation is used to estimate the change in the
loss function caused by removing a particular parameter (h_i)
Parameters are assumed to be independent
First order Taylor polynomial
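
The equation on this slide did not survive extraction. As a sketch reconstructed from the cited paper [1] rather than from the slide itself, write $C(\mathcal{D}, h_i)$ for the loss on data $\mathcal{D}$ as a function of the pruned quantity $h_i$ (this notation is an assumption). The first-order criterion approximates the change in loss caused by setting $h_i$ to zero:

```latex
\begin{align}
\left|\Delta C(h_i)\right| &= \left|C(\mathcal{D}, h_i = 0) - C(\mathcal{D}, h_i)\right| \\
C(\mathcal{D}, h_i = 0) &= C(\mathcal{D}, h_i) - \frac{\partial C}{\partial h_i}\, h_i + R_1(h_i = 0) \\
\Theta_{TE}(h_i) &= \left|\frac{\partial C}{\partial h_i}\, h_i\right|
\end{align}
```

Dropping the remainder $R_1$ leaves a saliency that needs only the value of $h_i$ and the gradient of the loss with respect to it, both of which are available during backpropagation.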
Optimal Brain Damage (LeCun et al., 1990)
Change of loss function approximated by second order Taylor
polynomial
The effects of the parameters are assumed to be independent
Parameter pruning is performed once training has converged
OBD is 30 times slower than the proposed Taylor method for
saliency calculation
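
For comparison, a sketch of the OBD saliency under the two assumptions listed above (independent parameters, converged training). The symbols $w_k$, $g_k$, and $h_{kk}$ are notation chosen here for illustration, and removing a parameter corresponds to $\delta w_k = -w_k$:

```latex
\begin{align}
\Delta C &\approx \sum_k g_k\,\delta w_k + \frac{1}{2}\sum_k h_{kk}\,\delta w_k^2,
  \qquad g_k = \frac{\partial C}{\partial w_k},\quad
  h_{kk} = \frac{\partial^2 C}{\partial w_k^2} \\
s_k &= \frac{1}{2}\, h_{kk}\, w_k^2
  \qquad \text{(at a converged minimum, where } g_k \approx 0\text{)}
\end{align}
```

The cross terms of the Hessian vanish under the independence assumption, and the first-order term vanishes at convergence, so only the diagonal second derivatives remain. Computing them is the extra cost relative to the first-order criterion.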
Experiments
Data sets
Flowers-102
Birds-200
ImageNet
Implemented using Theano
Layerwise l2-normalization
FLOPs regularization
Feature maps from different layers require different amounts of
computation due to the number of input feature maps and kernels
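
The two post-processing steps named on this slide can be sketched as follows: each layer's raw saliency scores are rescaled to unit l2 norm so that layers become comparable, and a penalty proportional to the per-feature-map cost of the layer is subtracted. This is an illustrative reading of the cited paper [1]; the function names, the toy scores, the cost values, and the strength `LAMBDA` are assumptions.

```python
import math

def l2_normalize(layer_scores):
    """Rescale one layer's saliency scores to unit l2 norm."""
    norm = math.sqrt(sum(s * s for s in layer_scores))
    if norm == 0.0:
        return list(layer_scores)
    return [s / norm for s in layer_scores]

def flops_regularize(layer_scores, layer_flops, lam):
    """Subtract a penalty proportional to the per-feature-map FLOPs of the layer."""
    return [s - lam * layer_flops for s in layer_scores]

# Hypothetical raw scores for two layers on very different scales.
raw_scores = {"conv1": [0.002, 0.004, 0.004], "conv2": [30.0, 40.0]}
# Hypothetical relative cost of one feature map in each layer.
flops_per_map = {"conv1": 10.0, "conv2": 1.0}
LAMBDA = 1e-3  # assumed regularization strength

final_scores = {}
for name, scores in raw_scores.items():
    normalized = l2_normalize(scores)
    final_scores[name] = flops_regularize(normalized, flops_per_map[name], LAMBDA)
```

After normalization the two layers are on the same scale, and the more expensive layer receives the larger penalty, so its feature maps are pruned slightly earlier.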
Experiments...
Compared against
Oracle pruning: computing the effect of removal of each parameter
and the one which has the least effect on the cost function is
pruned at each iteration
Optimal Brain Damage (OBD)
Minimum weight
Magnitude of activation
Mean
Standard deviation
Average Percentage of Zeros (APoZ): neurons with a low average
percentage of positive activations are pruned (Hu et al., 2016)
Feature maps at the first few layers have similar APoZ regardless of the
network’s target
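
The oracle baseline from the list above can be sketched on a toy problem and compared with the first-order Taylor criterion. The model here (a linear readout with squared-error loss), its numbers, and the function names are hypothetical and chosen only so that both saliencies can be computed by hand; this is not the paper's experimental setup.

```python
def loss(h, v, y):
    """Squared-error loss of a linear readout over activations h."""
    pred = sum(vi * hi for vi, hi in zip(v, h))
    return 0.5 * (pred - y) ** 2

def oracle_saliency(h, v, y):
    """|C(h_i = 0) - C(h)| for every unit, computed by actually removing it."""
    base = loss(h, v, y)
    scores = []
    for i in range(len(h)):
        pruned = list(h)
        pruned[i] = 0.0
        scores.append(abs(loss(pruned, v, y) - base))
    return scores

def taylor_saliency(h, v, y):
    """|dC/dh_i * h_i| for every unit, using the analytic gradient."""
    residual = sum(vi * hi for vi, hi in zip(v, h)) - y
    return [abs(residual * vi * hi) for vi, hi in zip(v, h)]

# Hypothetical activations, readout weights, and target.
h = [0.1, 2.0, 0.5, 1.0]
v = [1.0, 0.3, -0.4, 0.8]
y = 2.0

oracle = oracle_saliency(h, v, y)
taylor = taylor_saliency(h, v, y)
oracle_choice = min(range(len(h)), key=lambda i: oracle[i])
taylor_choice = min(range(len(h)), key=lambda i: taylor[i])
```

The oracle needs one extra loss evaluation per unit, whereas the Taylor estimate needs a single gradient. On this toy example both select the same unit and produce the same ordering.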
Results
Spearman rank correlation with the oracle ranking, calculated for each
criterion
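
A minimal sketch of this evaluation: Spearman's rank correlation is the Pearson correlation of the two rank vectors, so a criterion that orders feature maps exactly as the oracle does scores 1 and one that reverses the order scores -1. The score lists below are hypothetical, and the helpers `ranks` and `spearman` are written out here instead of relying on a statistics library.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result

def spearman(a, b):
    """Spearman rank correlation (undefined if either input is constant)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

oracle_scores = [0.075, 0.60, 0.12, 0.88]     # hypothetical oracle saliencies
criterion_scores = [0.07, 0.42, 0.14, 0.56]   # same ordering as the oracle
reversed_scores = [0.9, 0.2, 0.7, 0.1]        # opposite ordering

agreement = spearman(oracle_scores, criterion_scores)
disagreement = spearman(oracle_scores, reversed_scores)
```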
Layerwise contribution to the loss
Oracle pruning on VGG-16 trained on Birds-200 dataset
Layers followed by max-pooling (layers 2, 4, 7, 10, and 13) tend to be
more important than those without
Importance of normalization across layers
Pruning VGG-16 (Simonyan & Zisserman, 2015)
[Figure: pruning curves plotted against Parameters and against FLOPs; reference points: OBD, and a network with 50% of the original parameters trained from scratch]
● Pruning of feature maps in VGG-16 trained on the Birds-200
dataset (30 mini-batch SGD updates after pruning a feature
map)
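
The schedule in the bullet above (prune one feature map, then run a fixed number of mini-batch updates) can be sketched as a greedy loop. The callbacks `score_fn` and `finetune_fn`, the static toy scores, and the function name `iterative_prune` are placeholders assumed for illustration; a real run would recompute saliencies from the network and perform actual SGD updates.

```python
def iterative_prune(active, score_fn, finetune_fn, target_count, updates_per_prune):
    """Greedy loop: remove the least salient feature map, then fine-tune briefly."""
    active = set(active)
    history = []
    while len(active) > target_count:
        scores = score_fn(active)
        # sorted() keeps the choice deterministic if two scores are equal
        victim = min(sorted(active), key=lambda k: scores[k])
        active.remove(victim)
        history.append(victim)
        finetune_fn(active, updates_per_prune)
    return active, history

# Hypothetical stand-ins: fixed scores and a counter instead of real training.
static_scores = {0: 0.3, 1: 0.9, 2: 0.1, 3: 0.5}
update_log = []

def toy_score_fn(active):
    return {k: static_scores[k] for k in active}

def toy_finetune_fn(active, num_updates):
    update_log.append(num_updates)

remaining, pruned_order = iterative_prune(
    active=[0, 1, 2, 3],
    score_fn=toy_score_fn,
    finetune_fn=toy_finetune_fn,
    target_count=2,
    updates_per_prune=30,
)
```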
Pruning AlexNet (Krizhevsky et al., 2012)
● Pruning of feature maps in AlexNet trained on the Flowers-102
dataset (10 mini-batch SGD updates after pruning a feature
map)
Speedup of networks pruned by Taylor criterion
● All experiments performed in Theano with cuDNN v5.1.0
Conclusion
An efficient saliency measure to decide which
parameters can be pruned without a significant loss
of accuracy
Provides a thorough evaluation of many aspects of
network pruning
A theoretical explanation of how the gradient
carries information about the magnitude of the
activations is still needed
References
[1] Molchanov, Pavlo, et al. "Pruning Convolutional Neural Networks for Resource Efficient Transfer
Learning." arXiv preprint arXiv:1611.06440 (2016).
[2] Hu, Hengyuan, Rui Peng, Yu-Wing Tai, and Chi-Keung Tang. "Network Trimming: A Data-Driven Neuron
Pruning Approach towards Efficient Deep Architectures." arXiv preprint arXiv:1607.03250 (2016).
[3] Han, Song, Huizi Mao, and William J. Dally. "Deep Compression: Compressing Deep Neural Networks with
Pruning, Trained Quantization and Huffman Coding." International Conference on Learning Representations, pp. 1–13 (2016).
[4] Simonyan, Karen, and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image
Recognition." International Conference on Learning Representations (2015).
[5] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep
Convolutional Neural Networks." Advances in Neural Information Processing Systems (2012).


Editor's Notes

  1. 6000 training images, 5700 test images, 200 species. Learning rate 0.0001, epochs 60