Introduction to DNN Model Compression Techniques
Sabina Pokhrel
Xailient
Outline of the Talk
• Need for model compression
• Introduction to model compression
• Different model compression techniques:
• Pruning
• Quantization
• Knowledge distillation
• Low-rank factorization
History Of Deep Neural Networks
AlexNet achieved a large jump in accuracy by using a much deeper network than earlier entries, and the number of layers has kept increasing as accuracy improves.
ResNet broke the barrier of training networks with more than 100 layers, far deeper than previous winners.
The ImageNet ILSVRC challenge results (top-5 error (%)) from
2010 to 2015.
Need for Model Compression
• Many real-world applications demand real-time, on-device processing capabilities.
• Deep learning models that perform well are large.
• Inference often has to run on edge devices.
• Large models are difficult to deploy on resource-constrained devices.
Introduction to Model Compression
• Techniques for reducing the size of a neural network without compromising too much on accuracy.
• Resulting models are both memory and energy efficient.
• Different model compression techniques:
• Pruning
• Quantization
• Knowledge distillation
• Low-rank factorization
Pruning
• Compresses the model to a smaller size
with zero or marginal loss of accuracy.
• Reduces the number of parameters by removing redundant and unimportant connections that do not affect model accuracy.
• Results in compressed neural networks
that run faster.
• Reduces the computational cost
involved in network training.
• Reduces the overall model size and saves computation time and energy.
Pruning - Types
Weight Pruning
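Weight pruning is built into PyTorch's torch.nn.utils.prune module. Below is a minimal sketch, assuming a stand-in model with standard Linear layers, that zeroes out the 30% of weights with the smallest absolute value in each layer (the model and the 30% ratio are illustrative choices):

import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(      # stand-in for a pre-trained network
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Zero out the 30% of weights with the smallest L1 magnitude
        prune.l1_unstructured(module, name="weight", amount=0.3)
        # Make the pruning permanent: drop the mask, keep the zeros
        prune.remove(module, "weight")

Note that unstructured weight pruning only introduces zeros into dense tensors; realizing actual speed-ups generally requires sparse-aware kernels or hardware.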
Pruning - Types
Neuron Pruning
Pruning - Types
Filter Pruning
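Filter pruning removes entire convolutional filters rather than individual weights, so the resulting model stays dense and runs faster on ordinary hardware. A minimal sketch with PyTorch's structured pruning utility, removing half the filters of one convolution by L2 norm (the layer shape and 50% ratio are illustrative):

import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)

# Zero out the 50% of filters with the smallest L2 norm;
# dim=0 selects the output-channel (filter) axis
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)
prune.remove(conv, "weight")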
Pruning - Types
Layer Pruning
Pruning – Speed and Size Tradeoffs
Size and speed vs accuracy
trade-offs for different
pruning methods and
families of architectures.
Pruned models sometimes
outperform the original
architecture, but rarely
outperform a better
architecture.
Pruning - Summary
• Can be applied during or after training.
• Can be applied to both convolutional and fully connected layers.
• Both the original and pruned models share the same architecture, with the pruned model being sparser (low-magnitude weights set to zero).
• Pruned models sometimes outperform the original architecture, but
rarely outperform a better architecture.
Quantization
• Compresses the original network by reducing the number of bits required to represent each weight.
• Weights can be quantized to 16-bit, 8-bit, 4-bit, or even 1-bit representations.
• Weights can be quantized per channel and/or per layer.
• Reducing the number of bits used can significantly reduce the size of the deep neural network.
• Reduces inference time on hardware platforms that support lower-precision arithmetic.
Quantization - Types
• There are two forms of quantization:
• Post-training quantization (see the sketch below)
• Quantization aware training
Comparison of quantized and non-quantized models on ImageNet.
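As an example of the post-training path, PyTorch's dynamic quantization converts the weights of the listed layer types to 8-bit integers after training, with no retraining required. A minimal sketch (the model is a stand-in):

import torch

model = torch.nn.Sequential(      # stand-in for a trained float32 model
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Store Linear weights as int8; activations are quantized
# dynamically at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

TensorFlow offers both post-training quantization and quantization-aware training through its Model Optimization toolkit (see the TensorFlow link on the resource slides).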
Quantization - Summary
• Quantization can be applied both during and after training.
• Can be applied to both convolutional and fully connected layers.
• Quantized weights can make neural networks harder to converge; a smaller learning rate may be needed to achieve good performance.
• Quantized weights can make back-propagation infeasible, since gradients cannot propagate through discrete neurons; approximation methods are needed to estimate the gradients of the loss function with respect to the inputs of the discrete neurons.
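A common such approximation is the straight-through estimator (STE): apply the discrete quantizer in the forward pass but let gradients pass through unchanged in the backward pass. A minimal PyTorch sketch for 1-bit (sign) quantization:

import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Forward: hard 1-bit quantization of the input
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: treat the quantizer as the identity so gradients
        # can still reach the underlying float weights
        return grad_output

weights = torch.randn(256, 784, requires_grad=True)
binary_weights = BinarizeSTE.apply(weights)   # used in place of weights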
Comparison of Compression Techniques
Table: Comparison of different DNN compression techniques on the ImageNet dataset and standard pre-trained models.
Knowledge Distillation
• In knowledge distillation, a large, complex model is trained on a large dataset.
• Once this large model can generalize and perform well on unseen data, its knowledge is transferred to a smaller network.
• The larger model is known as the teacher model and the smaller network as the student network.
• The knowledge types, distillation strategies, and teacher-student architectures play a crucial role in student learning (a minimal loss sketch follows below).
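The classic response-based recipe (Hinton et al., cited on the resource slides) trains the student on a weighted sum of two terms: a soft-target loss against the teacher's temperature-softened outputs and an ordinary hard-label loss. A minimal sketch; the temperature T and weight alpha are illustrative hyperparameters:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                    # rescale, as in Hinton et al.
    # Hard targets: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard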
Knowledge Distillation - Knowledge
• The activations, neurons, or features of intermediate layers can also be used as knowledge to guide the learning of the student model.
• Different forms of knowledge: response-based knowledge, feature-based knowledge, and relation-based knowledge.
Knowledge Distillation - Strategies
Different knowledge
distillation strategies are:
• Offline distillation
• Online distillation
• Self-distillation
Low-rank Factorization
Low-rank factorization identifies redundant parameters of deep neural networks by employing matrix and tensor decomposition.
A weight matrix A of dimension m x n and rank r is replaced by smaller matrices: for example, A ≈ UV, where U is m x r and V is r x n, cutting the parameter count from mn to r(m + n).
This technique helps by factorizing a large matrix into smaller matrices.
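For a fully connected layer this can be sketched with a truncated SVD: replace one m x n layer with an n -> r layer followed by an r -> m layer. The helper below is illustrative; the rank is a tuning knob, and a short fine-tuning pass is usually run afterwards to recover accuracy:

import torch

def factorize_linear(layer: torch.nn.Linear, rank: int) -> torch.nn.Sequential:
    # Truncated SVD of the weight matrix W (shape m x n)
    W = layer.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]          # (m, r), singular values folded in
    V_r = Vh[:rank, :]                    # (r, n)
    # Two smaller layers whose composition approximates the original
    first = torch.nn.Linear(W.shape[1], rank, bias=False)
    second = torch.nn.Linear(rank, W.shape[0], bias=layer.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return torch.nn.Sequential(first, second)

The factorization pays off whenever r(m + n) < mn, i.e., for sufficiently small ranks.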
Low-rank Factorization - Summary
• Can be applied during or after training.
• Can be applied to both convolutional and fully connected layers.
• When applied during training, can reduce training time.
• The factorization of the dense layer matrices reduces model size.
• Improves speed by up to 30-50% compared to the full-rank matrix representation.
Model Compression – Summary
• Model compression techniques are complementary to each other (see the sketch after this list).
• Can be applied to pre-trained models as a post-processing step to reduce model size and increase inference speed.
• Can be applied during training as well.
• Pruning and quantization have now been baked into machine learning frameworks such as TensorFlow and PyTorch.
• There are many more compression approaches beyond the four common ones covered in this talk, such as weight sharing-based model compression, structural matrices, transferred filters, and compact filters.
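To illustrate how the techniques compose as a post-processing step, here is a hypothetical sketch that prunes a pre-trained model with the PyTorch utilities shown earlier and then applies dynamic quantization on top:

import torch
import torch.nn.utils.prune as prune

def compress(model: torch.nn.Module, sparsity: float = 0.3) -> torch.nn.Module:
    # Step 1: prune low-magnitude weights in every Linear layer
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            prune.remove(module, "weight")
    # Step 2: quantize the (now sparse) weights to int8
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )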
Additional Compression Techniques - Resources
Transferred filters
• S. Zhai, Y. Cheng, and Z. M. Zhang, “Doubly convolutional neural
networks,” in Proc. Advances Neural Information Processing
Systems, 2016, pp. 1082–1090.
• W. Shang, K. Sohn, D. Almeida, and H. Lee, “Understanding and
improving convolutional neural networks via concatenated
rectified linear units,” arXiv Preprint, arXiv:1603.05201, 2016.
Compact filters
• S. Dieleman, J. De Fauw, and K. Kavukcuoglu, “Exploiting cyclic symmetry in convolutional neural networks,” in Proc. 33rd Int. Conf. Machine Learning, 2016, vol. 48, pp. 1889–1898.
Additional Compression Techniques - Resources
Structural matrix
• Y. Cheng, F. X. Yu, R. Feris, S. Kumar, A. Choudhary, and S.-F. Chang,
“An exploration of parameter redundancy in deep networks with
circulant projections,” in Proc. Int. Conf. Computer Vision, 2015, pp.
2857–2865.
• Z. Yang, M. Moczulski, M. Denil, N. de Freitas, A. Smola, L. Song, and Z. Wang, “Deep fried convnets,” in Proc. Int. Conf. Computer Vision, 2015, pp. 1476–1483.
• V. Sindhwani, T. Sainath, and S. Kumar, “Structured transforms for small-footprint deep learning,” Advances in Neural Information Processing Systems, 28, 2015, pp. 3088–3096. [Online]. Available: http://papers.nips.cc/paper/5869-structured-transforms-for-small-footprint-deep-learning.pdf
Additional Compression Techniques - Resources
Weight-sharing based compression
• K. Ullrich, E. Meeds, and M. Welling, “Soft weight-sharing for neural
network compression,” Computing Res. Repository, vol.
abs/1702.04008, 2017. [Online]. Available:
https://arxiv.org/abs/1702.04008
Resource Slide - 1
• Cheng, Yu, et al. "A survey of model compression and acceleration for
deep neural networks." arXiv preprint arXiv:1710.09282 (2017).
• Choudhary, Tejalal, et al. "A comprehensive survey on model compression and acceleration." Artificial Intelligence Review (2020): 1-43.
• He, Kaiming, et al. "Deep residual learning for image recognition."
Proceedings of the IEEE conference on computer vision and pattern
recognition. 2016.
• Zhu, Michael, and Suyog Gupta. "To prune, or not to prune: exploring
the efficacy of pruning for model compression." arXiv preprint
arXiv:1710.01878 (2017).
Resource Slide - 2
• Blalock, D., Ortiz, J. J. G., Frankle, J., & Guttag, J. (2020). What is the state of neural network pruning? arXiv preprint arXiv:2003.03033.
• Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the
knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
• Gou, Jianping, et al. "Knowledge distillation: A survey." International
Journal of Computer Vision (2021): 1-31.
• Yaguchi, Atsushi, et al. "Scalable deep neural networks via low-rank
matrix factorization." (2019).
Resource Slide - 3
• https://towardsdatascience.com/model-compression-via-pruning-ac9b730a7c7b
• https://www.tensorflow.org/model_optimization/guide/quantization/training
• https://towardsdatascience.com/distilling-knowledge-in-neural-network-d8991faa2cdc
• https://intellabs.github.io/distiller/knowledge_distillation.html
• https://medium.com/@RaghavPrabhu/cnn-architectures-lenet-alexnet-vgg-googlenet-and-resnet-7c81c017b848