PERFORMANCE PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS
1. POLITECNICO DI MILANO – Slide 1
Eugenio Gianniti¹, Li Zhang², and Danilo Ardagna¹
¹Politecnico di Milano, ²IBM Research
2. Introduction & Motivation
Deep learning is widely applied across industries
GPU as a service: $200-million market in 2016 with a 30% CAGR
Model learning greatly benefits from GPUs
Goal: performance models with good generality
5. Layer Computational Complexity
Derived from architecture and hyper-parameters
Layer | Forward              | Backward
Conv  | Hf·Wf·Cin·Cout       | (2·Hf·Wf·Cin + 1)·Cout
FC    | Hin·Win·Cin·Cout     | 2·Hin·Win·Cin·Cout
Loss  | 4·Cout - 1           | Cout + 1
Norm  | 5·Cout + Cn - 2      | 8·Cout + Cn - 1
Pool  | Hf·Wf·Cout           | (Hf·Wf + 1)·Cout
ReLU  | 3·Cout               | 4·Cout
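The operation counts above could be computed directly from each layer's hyper-parameters. A minimal sketch, assuming the symbols from the table (Hf, Wf = filter height/width; Hin, Win = input height/width; Cin, Cout = input/output channels); the function names are illustrative, not from the paper:

```python
# Per-layer operation counts (forward, backward), following the table above.
# All names here are illustrative assumptions, not from the original work.

def conv_ops(Hf, Wf, Cin, Cout):
    """Convolutional layer: forward and backward operation counts."""
    return Hf * Wf * Cin * Cout, (2 * Hf * Wf * Cin + 1) * Cout

def fc_ops(Hin, Win, Cin, Cout):
    """Fully connected layer."""
    return Hin * Win * Cin * Cout, 2 * Hin * Win * Cin * Cout

def pool_ops(Hf, Wf, Cout):
    """Pooling layer."""
    return Hf * Wf * Cout, (Hf * Wf + 1) * Cout

def relu_ops(Cout):
    """ReLU activation layer."""
    return 3 * Cout, 4 * Cout
```

Summing such counts over a network's layers gives the complexity terms the regression models below consume.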
6. Per Layer Models
Learn a linear regression model for each layer category
Sum the fitted per-layer terms according to the CNN architecture
t_l = β_{0,l} + β_{1,l} · c_l + ε_l
t̂_CNN = I · Σ_{l ∈ L} t̂_l
where c_l is the operation count of layer l, L the set of layers, and I the number of iterations
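The per-layer model t_l = β0 + β1·c_l, fitted by least squares and summed over the architecture, could be sketched as follows. The fitting method and all names are illustrative assumptions:

```python
import numpy as np

# Sketch of the per-layer model: t_l = b0_l + b1_l * c_l + noise, fitted
# by ordinary least squares on (op count, measured time) pairs; the CNN
# prediction is I times the sum of fitted per-layer times.

def fit_layer_model(ops, times):
    """Fit t = b0 + b1 * c by least squares; returns (b0, b1)."""
    ops = np.asarray(ops, dtype=float)
    X = np.column_stack([np.ones_like(ops), ops])
    coef, *_ = np.linalg.lstsq(X, np.asarray(times, dtype=float), rcond=None)
    return coef[0], coef[1]

def predict_cnn_time(layers, models, iterations):
    """Sum predicted per-layer times over the architecture, scaled by I."""
    total = 0.0
    for category, c in layers:      # (layer category, op count c_l)
        b0, b1 = models[category]
        total += b0 + b1 * c
    return iterations * total
```

One model per layer category keeps the approach architecture-agnostic: any CNN built from known layer types can be predicted without retraining the regressions.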
7. End-to-End Models
Extract the execution time of the whole network from historical data
Apply linear regression
Vary batch size and number of iterations
Affine dependency of execution time on these parameters
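An end-to-end fit along these lines could look like the sketch below. Using batch_size · iterations as the single regressor is an assumption made here for illustration; the slide only states an affine dependency:

```python
import numpy as np

# Sketch of the end-to-end model: total training time assumed affine in
# the workload, here taken as batch_size * iterations. This feature
# choice is an illustrative assumption, not the paper's exact model.

def fit_end_to_end(batch_sizes, iterations, times):
    """Fit t = b0 + b1 * (batch * iters) by least squares."""
    work = np.asarray(batch_sizes, float) * np.asarray(iterations, float)
    X = np.column_stack([np.ones_like(work), work])
    coef, *_ = np.linalg.lstsq(X, np.asarray(times, float), rcond=None)
    return coef  # array([b0, b1])

def predict_end_to_end(coef, batch_size, iterations):
    """Predicted total time for a given batch size and iteration count."""
    b0, b1 = coef
    return b0 + b1 * batch_size * iterations
```

Unlike the per-layer models, this fit is tied to one specific network, trading generality for simplicity.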
8. Experimental Settings
Several well-known convolutional neural networks
ImageNet dataset (ILSVRC12)
AlexNet
GoogLeNet
VGG-16
Intel Xeon 10-core processor and NVIDIA Quadro M6000
3,072 CUDA cores
12 GB dedicated RAM, 317 GB/s bandwidth
7.0 TFLOPS in single precision
9. GoogLeNet Pooling Layers
Large errors with only one model for pooling layers
[Scatter plot: execution time [ms] vs. millions of operations for pooling layers]
10. GoogLeNet Pooling Layers
[Two scatter plots: execution time [ms] vs. millions of operations for pooling layers, with stride S = 1 (left) and S > 1 (right)]
11. Linear Regression Coefficients
Models obtained from GoogLeNet timing data
Category   | β̂0^fw [ms] | β̂1^fw [ms/op] | β̂0^bw [ms] | β̂1^bw [ms/op]
Conv/FC    | 1.83e-1    | 3.43e-10      | 3.15e-1    | 3.65e-10
Norm       | 1.64e-2    | 7.11e-9       | 1.01e-1    | 6.87e-9
Pool S = 1 | 2.23e-2    | 1.27e-8       | 1.52e-1    | 1.93e-8
Pool S > 1 | 1.44e-2    | 1.45e-8       | 6.11e-3    | 5.67e-8
ReLU/Drop  | 8.91e-3    | 1.17e-8       | 1.18e-2    | 1.33e-8
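Applying fitted coefficients of this kind could be sketched as below, with the tabulated GoogLeNet values copied in; the dictionary keys and function name are illustrative assumptions:

```python
# Sketch: using per-category fitted coefficients. Each entry maps a layer
# category to (b0_fw [ms], b1_fw [ms/op], b0_bw [ms], b1_bw [ms/op]),
# with values taken from the GoogLeNet table above. Names are illustrative.
GOOGLENET_COEFS = {
    "conv_fc":   (1.83e-1, 3.43e-10, 3.15e-1, 3.65e-10),
    "norm":      (1.64e-2, 7.11e-9,  1.01e-1, 6.87e-9),
    "pool_s1":   (2.23e-2, 1.27e-8,  1.52e-1, 1.93e-8),
    "pool_sgt1": (1.44e-2, 1.45e-8,  6.11e-3, 5.67e-8),
    "relu_drop": (8.91e-3, 1.17e-8,  1.18e-2, 1.33e-8),
}

def layer_time_ms(category, ops):
    """Predicted forward + backward time [ms] for `ops` operations."""
    b0f, b1f, b0b, b1b = GOOGLENET_COEFS[category]
    return (b0f + b1f * ops) + (b0b + b1b * ops)
```

Note that the backward slope for Pool S > 1 is roughly three times the S = 1 slope, which is why a single pooling model yields large errors.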
12. Actual vs Fitted Plot – Per Layer
AlexNet model, AlexNet prediction
[Scatter plot: actual [s] vs. predicted [s]]
13. Actual vs Fitted Plot – Per Layer
GoogLeNet model, AlexNet prediction
[Scatter plot: actual [s] vs. predicted [s]]
14. Actual vs Fitted Plot – End-to-End
VGG model, VGG prediction
[Scatter plot: actual [s] vs. predicted [s] for VGG-16]
16. Conclusions & Future Work
Capacity planning based on performance prediction
Accurate per layer model with good generalization capability
Optimal job scheduling relying on performance models
End-to-end models are tied to one specific CNN, yielding a different accuracy/generality trade-off
Multi-GPU, multi-framework modeling
18. Multi-GPU Results
[Plot: multi-GPU prediction results under extrapolation on batch size, number of GPUs, computational power, and number of inner modules]
• Performance prediction goals:
• Extrapolation on batch size
• Exploitation of new hardware
• Training of new versions of a CNN