PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
1. MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
Sunghoon Joo, Samsung SDS AI Advanced Research Lab.
2019. 8. 11.
PR-187
Ariel Gordon¹, Elad Eban¹, Ofir Nachum², Bo Chen¹, Hao Wu¹, Tien-Ju Yang¹,³, and Edward Choi¹,⁴
1 Google Research
2 Google Brain
3 Energy-efficient multimedia systems group, MIT
4 Georgia Institute of Technology
CVPR 2018
3. 1. Research Background
Introduction
2/20
• What & Why - Automated architecture search
- Automating structure design in DNNs is an active research field that is gaining significance as DNNs become
more ubiquitous in a variety of applications and platforms.
• Previous works
- Sparsifying regularizers (pruning, L1 regularization on weight matrices, specially designed regularizers):
they do not target reduction of a particular resource (e.g., the number of floating-point operations, or FLOPs, per inference).
- Optimizing every aspect of network structure through trial-and-error (RL, evolutionary algorithms):
but these methods require months or years of GPU time.
• Automatic neural network architecture design is currently effective only under limited
conditions and given knowledge of the right tool to use.
Authors propose a simple and general technique for resource-constrained optimization of DNN architectures.
4. 1. Research Background 3/20
Three advantages:
• (1) it is scalable to large models and large datasets
• (2) it can optimize a DNN structure targeting a specific resource (FLOPs, model size)
• (3) it can learn a structure that improves performance while reducing the targeted resource usage
Objective
MorphNet takes an existing neural network as input and produces a new neural network that is smaller, faster, and yields better performance tailored to a new problem.
6. 2. Methods
Problem setup
5/20
• Optimize over the output widths O_1, …, O_M of all layers:
minimize L(θ, O_1:M) over the weights θ and the widths O_1:M, subject to F(O_1:M) ≤ ζ
- Seed network: the starting architecture whose layer widths are re-learned
- Constraint: F is the targeted resource (FLOPs per inference or model size) and ζ is its budget
- O_i: number of output channels of layer i
- L: a loss measuring a combination of how well the neural network fits the data and any additional regularization terms
7. 2. Methods
A cycle of shrinking and expanding phases
6/20
Steps 1-2: Shrinking (sparsifying regularizer)
- Resource consumption ↓, performance ↓
Step 3: Expansion (width multiplier)
- Uniformly re-grow the shrunk network until the resource budget is reached
(A minimal sketch of the full cycle follows below.)
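Below is a minimal, self-contained Python sketch of this shrink-expand cycle on a toy conv stack. flop_cost, shrink, and expand are illustrative stand-ins: in particular, the real shrinking phase is training with the sparsifying regularizer described on the following slides, not a fixed keep ratio.

def flop_cost(widths, spatial=14 * 14, ksize=3 * 3, in_ch=3):
    # FLOPs of a toy conv stack: 2 * (output H*W) * (kH*kW) * C_in * C_out per layer
    cost, prev = 0, in_ch
    for o in widths:
        cost += 2 * spatial * ksize * prev * o
        prev = o
    return cost

def shrink(widths, keep=0.7):
    # Stand-in for the shrinking phase: in MorphNet a neuron is dropped when the
    # sparsifying regularizer drives its batch-norm scale (gamma) to ~0; here we
    # simply keep a fixed fraction of each layer so the example stays runnable.
    return [max(1, int(o * keep)) for o in widths]

def expand(widths, budget):
    # Expansion phase: largest uniform width multiplier w such that F(w * O) <= budget
    w = 1.0
    while flop_cost([max(1, round((w + 0.01) * o)) for o in widths]) <= budget:
        w += 0.01
    return [max(1, round(w * o)) for o in widths]

seed = [64, 128, 256, 512]      # output widths of a hypothetical 4-layer seed network
budget = flop_cost(seed)        # constraint: stay within the seed network's FLOP count
widths = seed
for _ in range(2):              # two shrink-expand iterations, as in the experiments
    widths = expand(shrink(widths), budget)
print(widths)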
8. 2. Methods
Two types of constraints
7/20
• Model size constraint vs. FLOPs constraint, both defined per layer from:
- Input spatial dimensions: w, x
- Output spatial dimensions: y, z
- Filter dimensions: f, g
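For reference, the standard per-layer cost accounting these constraints sum over, written with the slide's symbols (a sketch assuming a plain convolution with I_L input and O_L output channels; the paper's regularizer additionally counts only the neurons that remain alive):

def conv_model_size(f, g, in_ch, out_ch):
    # weights of an f x g convolution mapping in_ch channels to out_ch channels
    return f * g * in_ch * out_ch

def conv_flops(y, z, f, g, in_ch, out_ch):
    # multiply-adds per inference over a y x z output feature map (x2: one mul + one add)
    return 2 * y * z * f * g * in_ch * out_ch

# Example: a 3x3 convolution with a 56x56 output map, 64 -> 128 channels
print(conv_model_size(3, 3, 64, 128))     # 73,728 weights
print(conv_flops(56, 56, 3, 3, 64, 128))  # 462,422,016 (~0.46 GFLOPs)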
9. 2. Methods
1) Shrinking phase - Sparsifying regularizer (pruning)
8/20
• When shrinking a network, we wish to minimize the loss of the DNN subject to the resource constraint.
- L0 norm: counts the number of nonzero elements in a vector, i.e., the number of alive neurons.
- Since the L0 norm is discontinuous, it must be replaced with a continuous proxy: an L1 penalty on the batch-normalization scale factors (γ), which is differentiable almost everywhere. (See the sketch below.)
"Batch normalization: Accelerating deep network training by reducing internal covariate shift." (2015).
10. 2. Methods
2) Expanding phase - Width multiplier
9/20
• Uniformly expand all layer sizes with a width multiplier
- "MobileNets: Efficient convolutional neural networks for mobile vision applications." (2017)
- Applying a width multiplier is essentially free: no additional search or training is needed to pick the new widths.
- On its own, however, the approach preserves the relative widths of the seed, so it inherits any weaknesses of the initial network design. (A small example follows below.)
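A small illustration of the width multiplier (the values are illustrative, not from the paper): every layer's output width is scaled by the same factor, so the proportions of the seed design are kept.

def scale_widths(widths, w):
    # MobileNet-style width multiplier: scale every layer's output width by w
    return [max(1, round(w * o)) for o in widths]

print(scale_widths([64, 128, 256, 512], 0.75))   # -> [48, 96, 192, 384]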
12. 3. Experimental Results
The effect of the regularizer on the actual FLOPs during training
11/20
• Inception V2 on ImageNet
Seed network : Inception V2
A learning rate of 10⁻³ (constant in time)
13. 3. Experimental Results
Applying constraint with different strengths (different values of λ)
12/20
λ ∈ {0.7×10⁻⁹, 1.0×10⁻⁹, 1.3×10⁻⁹, 2.0×10⁻⁹, 3.0×10⁻⁹}
• The accuracy of the DNN can be improved while maintaining a constrained resource usage (FLOPs in
this case).
Seed network : Inception V2, trained on ImageNet
A learning rate of 10⁻³ (constant in time)
Baseline method : Naïve width multiplier
14. 3. Experimental Results
Improving accuracy by shrink-expansion
13/20
λ = 1.3×10⁻⁹
• Although the number of FLOPs is constant, our method is capable of and chooses to increase the
number of weights in the model.
15. 3. Experimental Results
Improved Performance at No Cost
14/20
• Evaluation of MAP (mean average precision) on various seed networks and datasets.
* JFT dataset: 350M images and about 20K labels
* Two shrink-expand iterations
* AudioSet: 20M labelled audio segments
1) "CNN architectures for large-scale audio classification." 2017
• Each result requires up to three training runs.
• The first training run only needs to run until the FLOP cost converges, which happens approximately 20 times faster than the convergence of the performance metric (MAP).
16. 3. Experimental Results
The regularizer methodically targets a particular resource
15/20
[Figure: learned per-layer widths for AudioResNet on AudioSet and ResNet-101 on JFT, under the FLOP regularizer vs. the model size regularizer]
17. 3. Experimental Results
The regularizer methodically targets a particular resource
16/20
• The FLOP regularizer tends to remove neurons from the lower layers near the input, whereas the model size regularizer tends to remove neurons from the upper layers near the output.
[Figure: network structures with MAP 0.405, 0.428, and 0.421]
18. 3. Experimental Results
Morphing Networks
17/20
https://ai.googleblog.com/2019/04/morphnet-towards-faster-and-smaller.html
• MorphNet can remove residual connections in ResNet-style networks,
and parallel towers in Inception-style networks.
19. 3. Experimental Results
MorphNet generates stable DNN architectures across repeated runs.
18/20
The relative standard deviations of the FLOPs and the test accuracy across 10 runs are 1.12% and 0.208%, respectively.
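The relative standard deviation quoted here is simply the standard deviation divided by the mean; a quick sketch of the computation (the numbers below are placeholders, not the paper's per-run measurements):

import numpy as np

# Placeholder per-run FLOP counts, just to show the computation.
flops_per_run = np.array([2.51e9, 2.48e9, 2.53e9, 2.49e9, 2.50e9])
rel_std_percent = 100.0 * flops_per_run.std() / flops_per_run.mean()
print(f"relative std dev: {rel_std_percent:.2f}%")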
21. 4. Conclusions 20/20
• MorphNet can successfully navigate the tradeoff between performance and resource usage when targeting either FLOPs or model size.
• Furthermore, we have applied MorphNet to large-scale problems to achieve improvements over human-designed DNN structures, with little extra training cost compared to training the DNN once.
• As future work, the iterative process of shrinking and expanding easily lends itself to optimizing over other aspects of network design (not only the widths of layer outputs).
Thank you.