PERFORMANCE PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS
1. POLITECNICO DI MILANO – Slide 1
Eugenio Gianniti¹, Li Zhang², and Danilo Ardagna¹
¹Politecnico di Milano, ²IBM Research
2. Introduction & Motivation
Deep learning is widely applied across industries
GPU as a service: $200-million market in 2016 with a 30% CAGR
Model learning greatly benefits from GPUs
Goal: performance models with good generality
5. Layer Computational Complexity
Derived from architecture and hyper-parameters
Layer | Forward              | Backward
Conv  | Hf·Wf·Cin·Cout       | (2·Hf·Wf·Cin + 1)·Cout
FC    | Hin·Win·Cin·Cout     | 2·Hin·Win·Cin·Cout
Loss  | 4·Cout - 1           | Cout + 1
Norm  | 5·Cout + Cn - 2      | 8·Cout + Cn - 1
Pool  | Hf·Wf·Cout           | (Hf·Wf + 1)·Cout
ReLU  | 3·Cout               | 4·Cout
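The operation counts above could be computed directly from each layer's hyper-parameters. A minimal sketch, assuming the symbols from the table (Hf, Wf = filter height/width; Hin, Win = input height/width; Cin, Cout = input/output channels); the function names are illustrative, not from the paper:

```python
# Per-layer operation counts (forward, backward), following the table above.
# All names here are illustrative assumptions, not from the original work.

def conv_ops(Hf, Wf, Cin, Cout):
    """Convolutional layer: forward and backward operation counts."""
    return Hf * Wf * Cin * Cout, (2 * Hf * Wf * Cin + 1) * Cout

def fc_ops(Hin, Win, Cin, Cout):
    """Fully connected layer."""
    return Hin * Win * Cin * Cout, 2 * Hin * Win * Cin * Cout

def pool_ops(Hf, Wf, Cout):
    """Pooling layer."""
    return Hf * Wf * Cout, (Hf * Wf + 1) * Cout

def relu_ops(Cout):
    """ReLU activation layer."""
    return 3 * Cout, 4 * Cout
```

Summing such counts over a network's layers gives the complexity terms the regression models below consume.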
6. Per Layer Models
Learn a linear regression model for each layer category
Sum the fitted per-layer terms according to the CNN architecture
t_l = β_{0,l} + β_{1,l} · c_l + ε_l
t̂_CNN = I · Σ_{l ∈ L} t̂_l
where c_l is the operation count of layer l, L the set of layers, and I the number of iterations
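The per-layer model t_l = β0 + β1·c_l, fitted by least squares and summed over the architecture, could be sketched as follows. The fitting method and all names are illustrative assumptions:

```python
import numpy as np

# Sketch of the per-layer model: t_l = b0_l + b1_l * c_l + noise, fitted
# by ordinary least squares on (op count, measured time) pairs; the CNN
# prediction is I times the sum of fitted per-layer times.

def fit_layer_model(ops, times):
    """Fit t = b0 + b1 * c by least squares; returns (b0, b1)."""
    ops = np.asarray(ops, dtype=float)
    X = np.column_stack([np.ones_like(ops), ops])
    coef, *_ = np.linalg.lstsq(X, np.asarray(times, dtype=float), rcond=None)
    return coef[0], coef[1]

def predict_cnn_time(layers, models, iterations):
    """Sum predicted per-layer times over the architecture, scaled by I."""
    total = 0.0
    for category, c in layers:      # (layer category, op count c_l)
        b0, b1 = models[category]
        total += b0 + b1 * c
    return iterations * total
```

One model per layer category keeps the approach architecture-agnostic: any CNN built from known layer types can be predicted without retraining the regressions.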
7. End-to-End Models
Extract the execution time of the whole network from historical data
Apply linear regression
Vary batch size and number of iterations
Affine dependency of execution time on these parameters
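An end-to-end fit along these lines could look like the sketch below. Using batch_size · iterations as the single regressor is an assumption made here for illustration; the slide only states an affine dependency:

```python
import numpy as np

# Sketch of the end-to-end model: total training time assumed affine in
# the workload, here taken as batch_size * iterations. This feature
# choice is an illustrative assumption, not the paper's exact model.

def fit_end_to_end(batch_sizes, iterations, times):
    """Fit t = b0 + b1 * (batch * iters) by least squares."""
    work = np.asarray(batch_sizes, float) * np.asarray(iterations, float)
    X = np.column_stack([np.ones_like(work), work])
    coef, *_ = np.linalg.lstsq(X, np.asarray(times, float), rcond=None)
    return coef  # array([b0, b1])

def predict_end_to_end(coef, batch_size, iterations):
    """Predicted total time for a given batch size and iteration count."""
    b0, b1 = coef
    return b0 + b1 * batch_size * iterations
```

Unlike the per-layer models, this fit is tied to one specific network, trading generality for simplicity.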
8. Experimental Settings
Several well-known convolutional neural networks
ImageNet dataset (ILSVRC12)
AlexNet
GoogLeNet
VGG-16
Intel Xeon 10-core processor and NVIDIA Quadro M6000
3,072 CUDA cores
12 GB dedicated RAM, 317 GB/s bandwidth
7.0 TFLOPS in single precision
9. GoogLeNet Pooling Layers
Large errors with only one model for pooling layers
[Scatter plot: execution time [ms] vs. millions of operations for pooling layers]
10. GoogLeNet Pooling Layers
[Two scatter plots: execution time [ms] vs. millions of operations for pooling layers, with stride S = 1 (left) and S > 1 (right)]
11. Linear Regression Coefficients
Models obtained from GoogLeNet timing data
Category   | β̂0^fw [ms] | β̂1^fw [ms/op] | β̂0^bw [ms] | β̂1^bw [ms/op]
Conv/FC    | 1.83e-1    | 3.43e-10      | 3.15e-1    | 3.65e-10
Norm       | 1.64e-2    | 7.11e-9       | 1.01e-1    | 6.87e-9
Pool S = 1 | 2.23e-2    | 1.27e-8       | 1.52e-1    | 1.93e-8
Pool S > 1 | 1.44e-2    | 1.45e-8       | 6.11e-3    | 5.67e-8
ReLU/Drop  | 8.91e-3    | 1.17e-8       | 1.18e-2    | 1.33e-8
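Applying fitted coefficients of this kind could be sketched as below, with the tabulated GoogLeNet values copied in; the dictionary keys and function name are illustrative assumptions:

```python
# Sketch: using per-category fitted coefficients. Each entry maps a layer
# category to (b0_fw [ms], b1_fw [ms/op], b0_bw [ms], b1_bw [ms/op]),
# with values taken from the GoogLeNet table above. Names are illustrative.
GOOGLENET_COEFS = {
    "conv_fc":   (1.83e-1, 3.43e-10, 3.15e-1, 3.65e-10),
    "norm":      (1.64e-2, 7.11e-9,  1.01e-1, 6.87e-9),
    "pool_s1":   (2.23e-2, 1.27e-8,  1.52e-1, 1.93e-8),
    "pool_sgt1": (1.44e-2, 1.45e-8,  6.11e-3, 5.67e-8),
    "relu_drop": (8.91e-3, 1.17e-8,  1.18e-2, 1.33e-8),
}

def layer_time_ms(category, ops):
    """Predicted forward + backward time [ms] for `ops` operations."""
    b0f, b1f, b0b, b1b = GOOGLENET_COEFS[category]
    return (b0f + b1f * ops) + (b0b + b1b * ops)
```

Note that the backward slope for Pool S > 1 is roughly three times the S = 1 slope, which is why a single pooling model yields large errors.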
12. Actual vs Fitted Plot – Per Layer
AlexNet model, AlexNet prediction
[Scatter plot: actual [s] vs. predicted [s]]
13. Actual vs Fitted Plot – Per Layer
GoogLeNet model, AlexNet prediction
[Scatter plot: actual [s] vs. predicted [s]]
14. Actual vs Fitted Plot – End-to-End
VGG model, VGG prediction
[Scatter plot: actual [s] vs. predicted [s] for VGG-16]
16. Conclusions & Future Work
Capacity planning based on performance prediction
Accurate per layer model with good generalization capability
Optimal job scheduling relying on performance models
End-to-end models are tied to one specific CNN, yielding a different accuracy/generality trade-off
Multi-GPU, multi-framework modeling
18. Multi-GPU Results
[Plot: multi-GPU prediction results under extrapolation on batch size, number of GPUs, computational power, and number of inner modules]
• Performance prediction goals:
• Extrapolation on batch size
• Exploitation of new hardware
• Training of new versions of a CNN