Paper review: "NISP: Pruning Networks using Neuron Importance Score Propagation"
Presented at the TensorFlow-KR paper review forum (#PR12) by Taesu Kim
Paper link: https://arxiv.org/abs/1711.05908
Video link: https://youtu.be/3KoqN_yYhmI (in Korean)
PR12-193 NISP: Pruning Networks using Neuron Importance Score Propagation
1. PR-12 presentation
NISP: Pruning Networks using Neuron Importance Score Propagation
CVPR 2018
Authors: Ruichi Yu et al.
Presented by Taesu Kim
2. Motivation
• Pruning
• Previous approaches
• Focus on the statistics of a single layer or two layers
• Greedy pruning
• But the entire network works as a whole
• Pruning errors propagate, especially when the network is deep
3. Motivation
• Entire CNN is a set of feature extractors
• The final responses are the extracted features
• We measured the importance of the neurons across the entire CNN based on
a unified goal
• Minimizing the reconstruction errors of (important) final responses
4. Approach
• Feature ranking on the final response layer
• NISP: Neuron Importance Score Propagation
• Pruning network using NISP
• Fine-tune the pruned network
5. NISP: Objective function
• s^{(F)}: importance scores of the final response layer, obtained by feature ranking
• q_l: the number of neurons to keep in the l-th layer
• a binary vector s^{(l)}: neuron prune indicator for the l-th layer (1 = keep, 0 = prune)
• f_j(x): the j-th final response of the original network
• \tilde{f}_j(x; s^{(l)}): the same response when the l-th layer is pruned according to s^{(l)}
• Objective: minimize the weighted l1 distance between the original and the pruned final responses, subject to \|s^{(l)}\|_0 = q_l
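Putting the definitions together, the pruning objective can be written roughly as follows. This is my own transcription of the formulation with slightly simplified notation, so the exact symbols may differ from the paper:

```latex
\[
\min_{s^{(l)} \in \{0,1\}^{N_l}}
  \sum_{x \in \mathcal{X}} \sum_{j} s^{(F)}_{j}
  \left| f_{j}(x) - \tilde{f}_{j}\!\left(x;\, s^{(l)}\right) \right|
\qquad \text{s.t.} \quad \left\| s^{(l)} \right\|_{0} = q_{l}
\]
```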
6. Solution
• The network pruning problem can be formulated as a binary integer program
• Finding the optimal neuron prune indicator s
• It is hard to obtain efficient analytical solutions by directly optimizing the objective
• a sub-optimal solution can be obtained by minimizing the upper bound
• Minimizing the upper bound gives a closed-form propagation rule: s^{(l)} = |W^{(l+1)}|^{\top} s^{(l+1)}
• Assume the activation function \sigma is Lipschitz continuous: identity, ReLU, sigmoid, tanh, etc.
• Lipschitz continuous if there is a constant C \ge 0 such that |\sigma(x) - \sigma(y)| \le C |x - y| for all x, y
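To make the propagation rule concrete, here is a small NumPy sketch. It is my own toy illustration (random weights, random final-response scores), not the authors' code; the 15%/25% ratios only echo the per-layer settings used later in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected network: 16 -> 12 -> 8 (final response layer).
# Each W maps one layer to the next and has shape (out, in).
W1 = rng.standard_normal((12, 16))
W2 = rng.standard_normal((8, 12))

# Importance scores of the final response layer, normally obtained by
# feature ranking (the paper uses Inf-FS [34]); random placeholders here.
s_final = rng.random(8)

# NISP propagation: s^(l) = |W^(l+1)|^T s^(l+1).
# A neuron scores high if it feeds high-scoring neurons in the next layer,
# weighted by the absolute connection weights.
s_hidden = np.abs(W2).T @ s_final   # scores for the 12 hidden neurons
s_input = np.abs(W1).T @ s_hidden   # scores for the 16 input neurons

def keep_mask(scores, prune_ratio):
    """Binary prune indicator: keep the (1 - prune_ratio) highest-scoring neurons."""
    keep = max(1, int(round(len(scores) * (1.0 - prune_ratio))))
    order = np.argsort(scores)[::-1]
    mask = np.zeros(len(scores), dtype=bool)
    mask[order[:keep]] = True
    return mask

# e.g. pruning 15% or 25% of the neurons in the hidden layer
print(keep_mask(s_hidden, prune_ratio=0.15))
print(keep_mask(s_hidden, prune_ratio=0.25))
```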
10. Experiments
• Comparison with random pruning and training-from-scratch baseline
• randomly pruning the pre-trained CNN and then fine-tuning
• training a small CNN with the same number of neurons/filters per layer as our pruned model
11. Experiments
• Feature selection vs. Magnitude of weights
• NISP-FS: using the feature selection method in [34]
• NISP-Mag: considering only the magnitude of weights (see the sketch below)
[34] Infinite feature selection, G. Roffo et al., ICCV 2015
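For contrast with NISP-FS, the NISP-Mag baseline ranks filters purely by the magnitude of their own weights, with no signal propagated back from the final responses. Below is a minimal sketch of that kind of magnitude scoring; it is my own illustration, and the exact norm used in the paper may differ.

```python
import numpy as np

def magnitude_scores(W):
    """Score each output neuron/filter by the L1 norm of its incoming weights.
    W has shape (out, in) for a fully connected layer, or
    (out_channels, in_channels, kH, kW) for a convolutional layer."""
    return np.abs(W).reshape(W.shape[0], -1).sum(axis=1)

rng = np.random.default_rng(0)
conv_w = rng.standard_normal((8, 4, 3, 3))  # toy conv layer with 8 filters
print(magnitude_scores(conv_w))             # one score per filter
```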
13. Experiments
• Comparison with existing methods
[11] Acceleration through elimination of redundant convolutions, M. Figurnov et al., NIPS 2016
[20] Compression of deep convolutional neural networks for fast and low power mobile applications, Y. Kim et al., ICLR 2016
[36] Learning the architecture of deep neural networks, S. Srinivas et al., BMVC 2016
[25] Pruning filters for efficient convnets, H. Li et al., ICLR 2017
[29] ThiNet: A filter level pruning method for deep neural network compression, J.-H. Luo et al., ICCV 2017
NISP-A: pruning all conv layers
NISP-B: pruning all conv layers except conv5
NISP-C: pruning all conv layers except conv5, conv4
NISP-D: pruning all conv layers except conv2, conv3, FC6
NISP-x-A: prune 15% of the filters in each layer
NISP-x-B: prune 25% of the filters in each layer
14. Conclusion
• Generic framework for network compression and acceleration based on identifying
the importance levels of neurons
• Neuron importance scores in the layer of interest are obtained by feature ranking
• Formulate the network pruning problem as a binary integer program
• Obtain a closed-form solution to a relaxed version of the formulation
• The NISP algorithm propagates the importance scores to every neuron in the whole network
• It efficiently reduces CNN redundancy and achieves full-network acceleration and
compression