SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
POLITECNICO DI MILANOSlide 1
PERFORMANCE PREDICTION OF
GPU-BASED DEEP LEARNING
APPLICATIONS
Eugenio Gianniti1, Li Zhang2, and
Danilo Ardagna1
1Politecnico di Milano, 2IBM Research
Introduction & Motivation
POLITECNICO DI MILANOSlide 2
Deep learning is widely applied across industries
GPU as a service: $200-million market in 2016 with a 30% CAGR
Model learning greatly benefits from GPUs
Goal: performance models with good generality
Convolutional Neural Networks
POLITECNICO DI MILANOSlide 3
CNNs are characterized by their architecture and hyper-parameters
Contributions
POLITECNICO DI MILANOSlide 4
Per Layer Models
End to End Models
Layer Computational Complexity
POLITECNICO DI MILANOSlide 5
Derived from architecture and hyper-parameters
Layer Forward Backward
Conv HfWfCinCout (2HfWfCin + 1) Cout
FC HinWinCinCout 2HinWinCinCout
Loss 4Cout 1 Cout + 1
Norm 5Cout + Cn 2 8Cout + Cn 1
Pool HfWfCout (HfWf + 1) Cout
ReLU 3Cout 4Cout
Per Layer Models
POLITECNICO DI MILANOSlide 6
Learn linear regression models per layer
Sum terms according to CNN architecture
tl = 0l + 1lcl + "l
ˆtcnn
= I
X
l2L
ˆtl
End to End Models
POLITECNICO DI MILANOSlide 7
Extract from historical data the execution time of the network in its
entirety
Apply linear regression
Vary batch sizes and number of iterations
Affine dependency
Experimental Settings
POLITECNICO DI MILANOSlide 8
Several well-known convolutional neural networks
ImageNet dataset (ILSVRC12)
AlexNet
GoogLeNet
VGG-16
Intel Xeon 10-core processor and NVIDIA Quadro M6000
3,072 CUDA cores
12 GB dedicated RAM, 317 GB/s bandwidth
7.0 TFLOPS in single precision
GoogLeNet Pooling Layers
POLITECNICO DI MILANOSlide 9
Large errors with only one model for pooling layers
50 100 150 200 250 300
0
0
5
10
15
20
Millions of operations
Time[ms]
GoogLeNet Pooling Layers
POLITECNICO DI MILANOSlide 10
S = 1
50 100 150 200 250 300
0
0
5
10
15
20
Millions of operations
Time[ms]
50 100 150 200 250 300
0
0
5
10
15
20
Millions of operations
Time[ms]
S > 1
Linear Regression Coefficients
POLITECNICO DI MILANOSlide 11
Models obtained from GoogLeNet timing data
Category ˆfw
0 [ms] ˆfw
1
h
ms
op
i
ˆbw
0 [ms] ˆbw
1
h
ms
op
i
Conv/FC 1.83e 1 3.43e 10 3.15e 1 3.65e 10
Norm 1.64e 2 7.11e 9 1.01e 1 6.87e 9
Pool S = 1 2.23e 2 1.27e 8 1.52e 1 1.93e 8
Pool S > 1 1.44e 2 1.45e 8 6.11e 3 5.67e 8
ReLU/Drop 8.91e 3 1.17e 8 1.18e 2 1.33e 8
Actual vs Fitted Plot – Per Layer
POLITECNICO DI MILANOSlide 12
AlexNet model, AlexNet prediction
0 5 10 15 20
0
5
10
15
20
Predicted [s]
Actual[s]
Actual vs Fitted Plot – Per Layer
POLITECNICO DI MILANOSlide 13 Eugenio Gianniti
GoogLeNet model, AlexNet prediction
0 5 10 15 20 25
0
5
10
15
20
25
Predicted [s]
Actual[s]
Actual vs Fitted Plot – End to End
POLITECNICO DI MILANOSlide 14 Eugenio Gianniti
VGG model, VGG prediction
0
0
500
500
1000
1000
1500
1500
2000
2000
2500
2500
Predicted [s]
Actual[s]
VGG-16
End to End Models Extrapolation Capabilities
POLITECNICO DI MILANOSlide 15 Eugenio Gianniti
Iterations Batch size
Test
Train
Pred.
0
20
40
60
Executiontime[s]
80
100
120
140
2000
2500
70
500
1000
1500
0
00
0
10
20
20
30
40
40
50
60
60
AlexNet — ntrain
= 6
Conclusions & Future Work
POLITECNICO DI MILANOSlide 16
Capacity planning based on performance prediction
Accurate per layer model with good generalization capability
Optimal job scheduling relying on performance models
End to end model linked to the specific CNN, resulting in different
accuracy/generality trade-offs
Multi-GPU, multi-framework modeling
The End
POLITECNICO DI MILANOSlide 17
Thanks
for your attention!
18
Batch Size Extr.
GPUs number Extr.
Comput.
Power Extr.
Extr. Inner Modules number
Multi-GPU results
Slide 18
• Performance prediction goals:
• Extrapolation on batch size
• Exploitation of new hardware
• Training of new versions of a CNN

Weitere ähnliche Inhalte

Was ist angesagt?

A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...
A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...
A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...
NECST Lab @ Politecnico di Milano
 

Was ist angesagt? (10)

A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...
A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...
A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA...
 
3D mapping of a quarry
3D mapping of a quarry3D mapping of a quarry
3D mapping of a quarry
 
Magnetron sputtering equipment simulation
Magnetron sputtering equipment simulationMagnetron sputtering equipment simulation
Magnetron sputtering equipment simulation
 
ICP H2/NF3 remote-plasma simulation
ICP H2/NF3 remote-plasma simulationICP H2/NF3 remote-plasma simulation
ICP H2/NF3 remote-plasma simulation
 
Loader and Tester Swarming Drones for Cellular Phone Network Loading and Fiel...
Loader and Tester Swarming Drones for Cellular Phone Network Loading and Fiel...Loader and Tester Swarming Drones for Cellular Phone Network Loading and Fiel...
Loader and Tester Swarming Drones for Cellular Phone Network Loading and Fiel...
 
DC Hollow Discharge
DC Hollow DischargeDC Hollow Discharge
DC Hollow Discharge
 
Scale surface reconstruction
Scale surface reconstructionScale surface reconstruction
Scale surface reconstruction
 
Landset 8 的雲層去除技巧實作
Landset 8 的雲層去除技巧實作Landset 8 的雲層去除技巧實作
Landset 8 的雲層去除技巧實作
 
DSM Extraction from Pleiades Images using MICMAC
DSM Extraction from Pleiades Images using MICMAC DSM Extraction from Pleiades Images using MICMAC
DSM Extraction from Pleiades Images using MICMAC
 
What we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competitionWhat we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competition
 

Ähnlich wie PERFORMANCE PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS

Inverted_Decoupling_2DoF_Internal_Model_Control_fo.pdf
Inverted_Decoupling_2DoF_Internal_Model_Control_fo.pdfInverted_Decoupling_2DoF_Internal_Model_Control_fo.pdf
Inverted_Decoupling_2DoF_Internal_Model_Control_fo.pdf
ssusere3c688
 
09.50 Ernst Vrolijks
09.50 Ernst Vrolijks09.50 Ernst Vrolijks
09.50 Ernst Vrolijks
Themadagen
 
COIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKS
COIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKSCOIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKS
COIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKS
csandit
 

Ähnlich wie PERFORMANCE PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS (20)

PERFORMANCEATMOSPHERE - PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS
PERFORMANCEATMOSPHERE - PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONSPERFORMANCEATMOSPHERE - PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS
PERFORMANCEATMOSPHERE - PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS
 
Inverted_Decoupling_2DoF_Internal_Model_Control_fo.pdf
Inverted_Decoupling_2DoF_Internal_Model_Control_fo.pdfInverted_Decoupling_2DoF_Internal_Model_Control_fo.pdf
Inverted_Decoupling_2DoF_Internal_Model_Control_fo.pdf
 
Adaptive Laser Cladding System with Variable Spot Sizes
Adaptive Laser Cladding System with Variable Spot SizesAdaptive Laser Cladding System with Variable Spot Sizes
Adaptive Laser Cladding System with Variable Spot Sizes
 
Software Architecture Challenges in Process Automation - From Code Generation...
Software Architecture Challenges in Process Automation - From Code Generation...Software Architecture Challenges in Process Automation - From Code Generation...
Software Architecture Challenges in Process Automation - From Code Generation...
 
Anticipating Implementation-Level Timing Analysis for Driving Design-Level De...
Anticipating Implementation-Level Timing Analysis for Driving Design-Level De...Anticipating Implementation-Level Timing Analysis for Driving Design-Level De...
Anticipating Implementation-Level Timing Analysis for Driving Design-Level De...
 
09.50 Ernst Vrolijks
09.50 Ernst Vrolijks09.50 Ernst Vrolijks
09.50 Ernst Vrolijks
 
DO-178C OOT supplement: A user's perspective
DO-178C OOT supplement: A user's perspectiveDO-178C OOT supplement: A user's perspective
DO-178C OOT supplement: A user's perspective
 
Woop - Workflow Optimizer
Woop - Workflow OptimizerWoop - Workflow Optimizer
Woop - Workflow Optimizer
 
Statistical Framework for Technology-Model-Product Co-Design and Convergence
Statistical Framework for Technology-Model-Product Co-Design and ConvergenceStatistical Framework for Technology-Model-Product Co-Design and Convergence
Statistical Framework for Technology-Model-Product Co-Design and Convergence
 
On using BS to improve the
On using BS to improve theOn using BS to improve the
On using BS to improve the
 
lecture 2 parametric yield.pdf
lecture 2 parametric yield.pdflecture 2 parametric yield.pdf
lecture 2 parametric yield.pdf
 
Recent Advances in CPLEX 12.6.1
Recent Advances in CPLEX 12.6.1Recent Advances in CPLEX 12.6.1
Recent Advances in CPLEX 12.6.1
 
Workshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptx
 
Silicon Photonics and datacenter
Silicon Photonics and datacenterSilicon Photonics and datacenter
Silicon Photonics and datacenter
 
Clock Gating of Streaming Applications for Power Minimization on FPGA’s
 	  Clock Gating of Streaming Applications for Power Minimization on FPGA’s 	  Clock Gating of Streaming Applications for Power Minimization on FPGA’s
Clock Gating of Streaming Applications for Power Minimization on FPGA’s
 
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P..."Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
 
3D-DRESD Alberto Gallini
3D-DRESD Alberto Gallini3D-DRESD Alberto Gallini
3D-DRESD Alberto Gallini
 
CP3_SDM_2010_Souma
CP3_SDM_2010_SoumaCP3_SDM_2010_Souma
CP3_SDM_2010_Souma
 
Fogify: A Fog Computing Emulation Framework
Fogify: A Fog Computing Emulation FrameworkFogify: A Fog Computing Emulation Framework
Fogify: A Fog Computing Emulation Framework
 
COIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKS
COIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKSCOIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKS
COIFLET-BASED FUZZY-CLASSIFIER FOR DEFECT DETECTION IN INDUSTRIAL LNG/LPG TANKS
 

Mehr von ATMOSPHERE .

Mehr von ATMOSPHERE . (20)

Software Defined Networking in the ATMOSPHERE project
Software Defined Networking in the ATMOSPHERE projectSoftware Defined Networking in the ATMOSPHERE project
Software Defined Networking in the ATMOSPHERE project
 
Managing Trustworthy Big-data Applications in the Cloud with the ATMOSPHERE P...
Managing Trustworthy Big-data Applications in the Cloud with the ATMOSPHERE P...Managing Trustworthy Big-data Applications in the Cloud with the ATMOSPHERE P...
Managing Trustworthy Big-data Applications in the Cloud with the ATMOSPHERE P...
 
On the development of a Visual-Temporal-awareness Rheumatic Heart Disease cla...
On the development of a Visual-Temporal-awareness Rheumatic Heart Disease cla...On the development of a Visual-Temporal-awareness Rheumatic Heart Disease cla...
On the development of a Visual-Temporal-awareness Rheumatic Heart Disease cla...
 
Control Plane Data Characterisation for an 5G NFV Environment
Control Plane Data Characterisation for an 5G NFV EnvironmentControl Plane Data Characterisation for an 5G NFV Environment
Control Plane Data Characterisation for an 5G NFV Environment
 
Designing an Open IoT Ecosystem
Designing an Open IoT EcosystemDesigning an Open IoT Ecosystem
Designing an Open IoT Ecosystem
 
Cloud Robotics: Cognitive Augmentation for Robots via the Cloud
Cloud Robotics: Cognitive Augmentation for Robots via the CloudCloud Robotics: Cognitive Augmentation for Robots via the Cloud
Cloud Robotics: Cognitive Augmentation for Robots via the Cloud
 
Artificial Neural Networks for Resource Allocation in 5G Remote Areas
Artificial Neural Networks for Resource Allocation in 5G Remote AreasArtificial Neural Networks for Resource Allocation in 5G Remote Areas
Artificial Neural Networks for Resource Allocation in 5G Remote Areas
 
Compliance of the privacy regulations in an international Europe-Brazil context
Compliance of the privacy regulations in an international Europe-Brazil contextCompliance of the privacy regulations in an international Europe-Brazil context
Compliance of the privacy regulations in an international Europe-Brazil context
 
Using Computational Back-ends for Artificial Intelligence in Childhood Cancer...
Using Computational Back-ends for Artificial Intelligence in Childhood Cancer...Using Computational Back-ends for Artificial Intelligence in Childhood Cancer...
Using Computational Back-ends for Artificial Intelligence in Childhood Cancer...
 
Optimization Models for on-demand GPUs in the Cloud
Optimization Models for on-demand GPUs in the CloudOptimization Models for on-demand GPUs in the Cloud
Optimization Models for on-demand GPUs in the Cloud
 
SBC Thematic Groups Organisation
SBC Thematic Groups OrganisationSBC Thematic Groups Organisation
SBC Thematic Groups Organisation
 
Cloud Computing Interest Group
Cloud Computing Interest GroupCloud Computing Interest Group
Cloud Computing Interest Group
 
5G-Range - 5G networks for remote areas
5G-Range - 5G networks for remote areas5G-Range - 5G networks for remote areas
5G-Range - 5G networks for remote areas
 
NECOS Project: Lightweight Slicing of CloudFederated Infrastructures
NECOS Project: Lightweight Slicing of CloudFederated InfrastructuresNECOS Project: Lightweight Slicing of CloudFederated Infrastructures
NECOS Project: Lightweight Slicing of CloudFederated Infrastructures
 
SWAMP: Smart Water Management Platform
SWAMP: Smart Water Management PlatformSWAMP: Smart Water Management Platform
SWAMP: Smart Water Management Platform
 
OCARIoT - Smart Childhood Obesity Caring Solution using IoT Potential
OCARIoT - Smart Childhood Obesity Caring Solution using IoT PotentialOCARIoT - Smart Childhood Obesity Caring Solution using IoT Potential
OCARIoT - Smart Childhood Obesity Caring Solution using IoT Potential
 
ATMOSPHERE - Adaptive, Trustworthy, Manageable, Orchestrated, Secure Privacy-...
ATMOSPHERE - Adaptive, Trustworthy, Manageable, Orchestrated, Secure Privacy-...ATMOSPHERE - Adaptive, Trustworthy, Manageable, Orchestrated, Secure Privacy-...
ATMOSPHERE - Adaptive, Trustworthy, Manageable, Orchestrated, Secure Privacy-...
 
Secure containers for trustworthy cloud services: business opportunities
 Secure containers for trustworthy cloud services: business opportunities Secure containers for trustworthy cloud services: business opportunities
Secure containers for trustworthy cloud services: business opportunities
 
Integration of the Trustworthiness Assessment with Industry Systems
Integration of the Trustworthiness Assessment with Industry SystemsIntegration of the Trustworthiness Assessment with Industry Systems
Integration of the Trustworthiness Assessment with Industry Systems
 
Trustworthy cloud services for Medical Imaging Biomarkers
Trustworthy cloud services for Medical Imaging BiomarkersTrustworthy cloud services for Medical Imaging Biomarkers
Trustworthy cloud services for Medical Imaging Biomarkers
 

Kürzlich hochgeladen

saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...
saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...
saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...
mountabuangels4u
 
Sample sample sample sample sample sample
Sample sample sample sample sample sampleSample sample sample sample sample sample
Sample sample sample sample sample sample
Casey Keith
 
Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...
Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...
Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...
mountabuangels4u
 
Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...
Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...
Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...
mountabuangels4u
 

Kürzlich hochgeladen (20)

Ramnagar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Ramnagar Call Girls 🥰 8617370543 Service Offer VIP Hot ModelRamnagar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Ramnagar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Purba Bardhaman Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Purba Bardhaman Call Girls 🥰 8617370543 Service Offer VIP Hot ModelPurba Bardhaman Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Purba Bardhaman Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...
saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...
saputara Escort💋 Call Girl (Ramya) Service #saputara Call Girl @Independent G...
 
Champawat Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Champawat Call Girls 🥰 8617370543 Service Offer VIP Hot ModelChampawat Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Champawat Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Krishnanagar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Krishnanagar Call Girls 🥰 8617370543 Service Offer VIP Hot ModelKrishnanagar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Krishnanagar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
A high-altitude adventure in the mountain kingdom
A high-altitude adventure in the mountain kingdomA high-altitude adventure in the mountain kingdom
A high-altitude adventure in the mountain kingdom
 
Sample sample sample sample sample sample
Sample sample sample sample sample sampleSample sample sample sample sample sample
Sample sample sample sample sample sample
 
TOURISM ATTRACTION IN LESOTHO 2024.Pptx.
TOURISM ATTRACTION IN LESOTHO 2024.Pptx.TOURISM ATTRACTION IN LESOTHO 2024.Pptx.
TOURISM ATTRACTION IN LESOTHO 2024.Pptx.
 
Alipurduar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Alipurduar Call Girls 🥰 8617370543 Service Offer VIP Hot ModelAlipurduar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Alipurduar Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...
Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...
Bhavnagar Escort💋 Call Girl (Komal) Service #Bhavnagar Call Girl @Independent...
 
Pithoragarh Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Pithoragarh Call Girls 🥰 8617370543 Service Offer VIP Hot ModelPithoragarh Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Pithoragarh Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Raiganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Raiganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelRaiganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Raiganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Bhuj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Bhuj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelBhuj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Bhuj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Awesome places one can visit in Lesotho.
Awesome places one can visit in Lesotho.Awesome places one can visit in Lesotho.
Awesome places one can visit in Lesotho.
 
abortion pills in Riyadh+966572737505 Cytotec Riyadh
abortion pills in  Riyadh+966572737505    Cytotec Riyadhabortion pills in  Riyadh+966572737505    Cytotec Riyadh
abortion pills in Riyadh+966572737505 Cytotec Riyadh
 
Ooty Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Ooty Call Girls 🥰 8617370543 Service Offer VIP Hot ModelOoty Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Ooty Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Rajkot Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Rajkot Call Girls 🥰 8617370543 Service Offer VIP Hot ModelRajkot Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Rajkot Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Mumbai Call Girls In 09167354423, Mira Road Call Girls.docx
Mumbai Call Girls In 09167354423, Mira Road Call Girls.docxMumbai Call Girls In 09167354423, Mira Road Call Girls.docx
Mumbai Call Girls In 09167354423, Mira Road Call Girls.docx
 
Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...
Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...
Vadodara Escort💋 Call Girl (Bindu) Service #Vadodara Call Girl @Independent G...
 
Rudrapur Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Rudrapur Call Girls 🥰 8617370543 Service Offer VIP Hot ModelRudrapur Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Rudrapur Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

PERFORMANCE PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS

  • 1. POLITECNICO DI MILANOSlide 1 PERFORMANCE PREDICTION OF GPU-BASED DEEP LEARNING APPLICATIONS Eugenio Gianniti1, Li Zhang2, and Danilo Ardagna1 1Politecnico di Milano, 2IBM Research
  • 2. Introduction & Motivation POLITECNICO DI MILANOSlide 2 Deep learning is widely applied across industries GPU as a service: $200-million market in 2016 with a 30% CAGR Model learning greatly benefits from GPUs Goal: performance models with good generality
  • 3. Convolutional Neural Networks POLITECNICO DI MILANOSlide 3 CNNs are characterized by their architecture and hyper-parameters
  • 4. Contributions POLITECNICO DI MILANOSlide 4 Per Layer Models End to End Models
  • 5. Layer Computational Complexity POLITECNICO DI MILANOSlide 5 Derived from architecture and hyper-parameters Layer Forward Backward Conv HfWfCinCout (2HfWfCin + 1) Cout FC HinWinCinCout 2HinWinCinCout Loss 4Cout 1 Cout + 1 Norm 5Cout + Cn 2 8Cout + Cn 1 Pool HfWfCout (HfWf + 1) Cout ReLU 3Cout 4Cout
  • 6. Per Layer Models POLITECNICO DI MILANOSlide 6 Learn linear regression models per layer Sum terms according to CNN architecture tl = 0l + 1lcl + "l ˆtcnn = I X l2L ˆtl
  • 7. End to End Models POLITECNICO DI MILANOSlide 7 Extract from historical data the execution time of the network in its entirety Apply linear regression Vary batch sizes and number of iterations Affine dependency
  • 8. Experimental Settings POLITECNICO DI MILANOSlide 8 Several well-known convolutional neural networks ImageNet dataset (ILSVRC12) AlexNet GoogLeNet VGG-16 Intel Xeon 10-core processor and NVIDIA Quadro M6000 3,072 CUDA cores 12 GB dedicated RAM, 317 GB/s bandwidth 7.0 TFLOPS in single precision
  • 9. GoogLeNet Pooling Layers POLITECNICO DI MILANOSlide 9 Large errors with only one model for pooling layers 50 100 150 200 250 300 0 0 5 10 15 20 Millions of operations Time[ms]
  • 10. GoogLeNet Pooling Layers POLITECNICO DI MILANOSlide 10 S = 1 50 100 150 200 250 300 0 0 5 10 15 20 Millions of operations Time[ms] 50 100 150 200 250 300 0 0 5 10 15 20 Millions of operations Time[ms] S > 1
  • 11. Linear Regression Coefficients POLITECNICO DI MILANOSlide 11 Models obtained from GoogLeNet timing data Category ˆfw 0 [ms] ˆfw 1 h ms op i ˆbw 0 [ms] ˆbw 1 h ms op i Conv/FC 1.83e 1 3.43e 10 3.15e 1 3.65e 10 Norm 1.64e 2 7.11e 9 1.01e 1 6.87e 9 Pool S = 1 2.23e 2 1.27e 8 1.52e 1 1.93e 8 Pool S > 1 1.44e 2 1.45e 8 6.11e 3 5.67e 8 ReLU/Drop 8.91e 3 1.17e 8 1.18e 2 1.33e 8
  • 12. Actual vs Fitted Plot – Per Layer POLITECNICO DI MILANOSlide 12 AlexNet model, AlexNet prediction 0 5 10 15 20 0 5 10 15 20 Predicted [s] Actual[s]
  • 13. Actual vs Fitted Plot – Per Layer POLITECNICO DI MILANOSlide 13 Eugenio Gianniti GoogLeNet model, AlexNet prediction 0 5 10 15 20 25 0 5 10 15 20 25 Predicted [s] Actual[s]
  • 14. Actual vs Fitted Plot – End to End POLITECNICO DI MILANOSlide 14 Eugenio Gianniti VGG model, VGG prediction 0 0 500 500 1000 1000 1500 1500 2000 2000 2500 2500 Predicted [s] Actual[s] VGG-16
  • 15. End to End Models Extrapolation Capabilities POLITECNICO DI MILANOSlide 15 Eugenio Gianniti Iterations Batch size Test Train Pred. 0 20 40 60 Executiontime[s] 80 100 120 140 2000 2500 70 500 1000 1500 0 00 0 10 20 20 30 40 40 50 60 60 AlexNet — ntrain = 6
  • 16. Conclusions & Future Work POLITECNICO DI MILANOSlide 16 Capacity planning based on performance prediction Accurate per layer model with good generalization capability Optimal job scheduling relying on performance models End to end model linked to the specific CNN, resulting in different accuracy/generality trade-offs Multi-GPU, multi-framework modeling
  • 17. The End POLITECNICO DI MILANOSlide 17 Thanks for your attention!
  • 18. 18 Batch Size Extr. GPUs number Extr. Comput. Power Extr. Extr. Inner Modules number Multi-GPU results Slide 18 • Performance prediction goals: • Extrapolation on batch size • Exploitation of new hardware • Training of new versions of a CNN