The benefits of upgrading to Haswell Architecture
and Windows 8.1:
Benchmarking of Hybrid (CPU/GPU) Parallel
Processing (CUDA)-enabled MATLAB Image
Processing Algorithms on GTX TITAN and GTX 780M
DIMITRIS VAYENAS, POSTGRADUATE STUDENT
DEPARTMENT OF COMPUTER SCIENCE @ THE UNIVERSITY OF OXFORD &
SOFTWARE INCUBATOR AT ISIS INNOVATION LTD.
Contents
 Introduction
 A “Real-Life” Hybrid (CPU-GPU) Algorithm
 Hardware and Software of Testing
 Performance
 Comparison
 Conclusion
 Acknowledgements
Introduction
 In this laboratory we attempt to answer the following question:
Is it worth upgrading from Ivy Bridge to the Haswell architecture in order to
improve performance?
 Intel claims that the new HD 4600 integrated graphics core in its 4th
generation i7 processors can increase performance over the previous
architecture by up to 7 times.
 What kind of performance improvements can we look forward to in “real-life”
examples, and under what conditions?
A “Real-Life” Hybrid Algorithm (1/2)
 Hybrid: executes on both the CPU and the GPU
Consider an algorithm implemented in MATLAB that contains the following steps:
A “Real-Life” Hybrid Algorithm (2/2)
 In the hybrid algorithm the tasks in black are performed on the GPU while
the tasks in red are performed on the CPU.
 Thus we have the usual overhead of transferring data to and from the
GPU, and the performance of the CPU also plays a significant role. Most
graphics performance benchmarks ignore this consideration, because they
test either the GPU or the CPU but not both.
 Ideally we would have liked to run all tasks on the GPU; however, the
current version of MATLAB does not yet support these functions in the
Parallel Processing Unit.
 As we will see, the NVIDIA drivers have a substantial impact on performance.
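The point about transfer overhead and CPU-bound stages can be made concrete with a per-stage timing harness. The following is only a minimal sketch in Python, not the MATLAB code used for these measurements: the four stage functions are hypothetical stand-ins that run entirely on the CPU, and `time_stages` is a name invented for this illustration.

```python
import time
from collections import defaultdict


def time_stages(stages, data, runs=3):
    """Run the whole pipeline `runs` times; return total seconds per stage."""
    totals = defaultdict(float)
    for _ in range(runs):
        value = data
        for name, func in stages:
            start = time.perf_counter()
            value = func(value)
            totals[name] += time.perf_counter() - start
    return dict(totals)


# Hypothetical stand-ins for the four kinds of work in a hybrid pipeline.
def upload(image):        # placeholder for a host-to-device copy
    return list(image)


def gpu_filter(image):    # placeholder for a GPU-side filter
    return [2 * p for p in image]


def download(image):      # placeholder for a device-to-host copy
    return list(image)


def cpu_edge(image):      # placeholder for a CPU-only step
    return [abs(b - a) for a, b in zip(image, image[1:])]


stages = [
    ("upload", upload),
    ("gpu_filter", gpu_filter),
    ("download", download),
    ("cpu_edge", cpu_edge),
]
totals = time_stages(stages, list(range(10_000)))
for name, seconds in totals.items():
    print(f"{name:<12s}{seconds:.6f} s")
```

Reporting the transfer and CPU stages separately shows how much of the total a faster GPU could actually reduce. Real GPU calls are often asynchronous, so a real harness would need to synchronise with the device before reading the clock.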
Hardware and Software of Testing
 System I:
 SCAN Workstation with NVIDIA GTX TITAN, Intel i7 3770K @ 4.5 GHz, 32GB RAM @ 2133
MHz, SSD with over 500 MB/s at Read and Write
OS: Windows Server 2012 Datacentre Edition
NVIDIA Driver: 320.49
 System II:
 Schenker W503 with NVIDIA GTX 780M, Intel i7 4800 @ 3.5 GHz, 16 GB RAM @1600
MHz, SSD with over 500 MB/s at Read and Write
 A) OS: Windows Server 2012 Datacentre Edition
NVIDIA Driver: 320.49
 B) OS: Windows 8.1
NVIDIA Driver: 326.01
(Important Notice: Figures for System I on Windows 8.1 will be added here by
Wednesday 3/7/2013)
Performance (total runtimes)
Results in seconds (lower is better). Each task label gives the number of runs
per test and where the task runs (CPU or GPU).

Task (runs/where)        System I                 System II (a)            System II (b)
                         (TITAN on WinSrv 2012)   (780M on WinSrv 2012)    (780M on Win 8.1)
Edge (800/CPU)           1720.265                 1661.289                 1261.870
Regionprops (400/CPU)     956.622                  899.934                  646.883
Imfilter (1600/GPU)       339.045                  339.477                  263.572
Imresize (1200/CPU)       338.574                  295.782                  199.593
Padarray (2000/CPU)       204.734                  196.303                  149.067
Imfilter (1600/GPU)       126.362                  131.112                  101.717
Performance (total run times)
[Figure: bar chart "Task time totals (less is better)", in seconds, for Edge (800), Regionprops (400), Imfilter (1600), Imresize (1200) and Padarray (2000); series: System I, System II (a), System II (b).]
Performance (Indicative times to process an image)
Parameters: Magnification, Fudge Factor, Sigma and HSize
Results in seconds.

Image Processing            System I    System II (a)   System II (b)
Mag_1_FF_0.2_S_0.2_HS_1      0.39699     0.49159         0.18465
Mag_1_FF_0.6_S_0.6_HS_61     0.46689     0.62617         0.38815
Mag_1_FF_1_S_0.8_HS_1       11.4042      8.1427          0.49579
Mag_3_FF_0.4_S_0.8_HS_41     3.1976      2.8881          1.4568
Mag_5_FF_0.4_S_0.8_HS_41     5.7096      4.4588          3.9456
Mag_7_FF_0.4_S_0.8_HS_41     9.1622     10.6905          8.4348
Mag_9_FF_0.4_S_0.8_HS_41    14.5562     17.9971         14.8889
Mag_9_FF_1_S_0.8_HS_41      28.8458     17.0872         15.5799
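Each row label appears to encode the four parameters named in the slide title. As a minimal sketch, the label can be split back into named values. It assumes that `Mag`, `FF`, `S` and `HS` stand for Magnification, Fudge Factor, Sigma and HSize, which the slides imply but do not spell out. `parse_label` is a hypothetical helper and not part of the original MATLAB code.

```python
# Assumed mapping from label abbreviations to the parameter names on the slide.
PARAMETER_NAMES = {
    "Mag": "magnification",
    "FF": "fudge_factor",
    "S": "sigma",
    "HS": "hsize",
}


def parse_label(label):
    """Split a label such as 'Mag_3_FF_0.4_S_0.8_HS_41' into named parameters."""
    tokens = label.split("_")
    if len(tokens) != 2 * len(PARAMETER_NAMES):
        raise ValueError(f"unexpected label format: {label!r}")
    params = {}
    # Tokens alternate between an abbreviation and its numeric value.
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        if key not in PARAMETER_NAMES:
            raise ValueError(f"unknown parameter {key!r} in {label!r}")
        params[PARAMETER_NAMES[key]] = float(raw)
    return params


print(parse_label("Mag_3_FF_0.4_S_0.8_HS_41"))
```

Parsing the labels makes it possible to group the timings by one parameter, for example to see how run time grows with magnification while the other three values stay fixed.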
Performance (Indicative times to process an image)
Parameters: Magnification, Fudge Factor, Sigma and HSize
[Figure: chart "Execution time in seconds to process specific images" for the eight parameter sets; series: System I, System II (a), System II (b).]
Performance Comparison (total run times)
Percentage change. Each task label gives the number of runs per test and where
the task runs (CPU or GPU).

Task (runs/where)        System II (a)    System II (b)       System II (b)
                         vs. System I     vs. System II (a)   vs. System I
Edge (800/CPU)            3.4             24.0                26.6
Regionprops (400/CPU)     5.9             28.1                32.4
Imfilter (1600/GPU)      -0.1             22.4                22.3
Imresize (1200/CPU)      12.6             32.5                41.0
Padarray (2000/CPU)       4.1             24.1                27.2
Imfilter (1600/GPU)      -3.8             22.4                19.5
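The slides do not state how the percentage change is computed. The sketch below assumes it is the reduction in run time relative to the slower baseline, (baseline - candidate) / baseline × 100. This assumption reproduces the tabulated Edge and Imfilter figures from the total runtimes. The function name `percent_improvement` is chosen for this illustration only.

```python
def percent_improvement(baseline_s, candidate_s):
    """Reduction in run time relative to the baseline, as a percentage.

    Positive values mean the candidate is faster; negative values mean slower.
    """
    return (baseline_s - candidate_s) / baseline_s * 100.0


# Total runtimes in seconds for Edge (800/CPU), taken from the performance table.
edge = {"I": 1720.265, "II (a)": 1661.289, "II (b)": 1261.870}

print(round(percent_improvement(edge["I"], edge["II (a)"]), 1))       # 3.4
print(round(percent_improvement(edge["II (a)"], edge["II (b)"]), 1))  # 24.0
print(round(percent_improvement(edge["I"], edge["II (b)"]), 1))       # 26.6

# A negative result marks a regression, as for the first Imfilter row.
print(round(percent_improvement(339.045, 339.477), 1))                # -0.1
```

With this definition a positive value always means the second system in the column heading is faster than the first.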
Performance Comparison (total run times)
[Figure: bar chart "Percentage Change" per task; series: System II (a) vs. System I, System II (b) vs. System II (a), System II (b) vs. System I.]
Performance Comparison based on the time to process image
Parameters: Magnification, Fudge Factor, Sigma and HSize
Percentage change.

Image Processing            System II (a)    System II (b)       System II (b)
                            vs. System I     vs. System II (a)   vs. System I
Mag_1_FF_0.2_S_0.2_HS_1     -23.8            62.4                53.5
Mag_1_FF_0.6_S_0.6_HS_61    -34.1            38.0                16.9
Mag_1_FF_1_S_0.8_HS_1        28.6            93.9                95.7
Mag_3_FF_0.4_S_0.8_HS_41      9.7            49.6                54.4
Mag_5_FF_0.4_S_0.8_HS_41     21.9            11.5                30.9
Mag_7_FF_0.4_S_0.8_HS_41    -16.7            21.1                 7.9
Mag_9_FF_0.4_S_0.8_HS_41    -23.6            17.3                -2.3
Mag_9_FF_1_S_0.8_HS_41       40.8             8.8                46.0
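A percentage reduction in run time is easy to misread as a speedup factor. The short sketch below converts between the two. It assumes the percentages are run-time reductions relative to the baseline, and both function names are hypothetical.

```python
def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def reduction_to_speedup(percent):
    """Convert a percentage reduction in run time into a speedup factor."""
    if percent >= 100.0:
        raise ValueError("a reduction of 100% or more has no finite speedup")
    return 1.0 / (1.0 - percent / 100.0)


# Mag_1_FF_1_S_0.8_HS_1: System II (a) took 8.1427 s, System II (b) 0.49579 s.
print(round(speedup(8.1427, 0.49579), 1))     # 16.4
print(round(reduction_to_speedup(50.0), 1))   # 2.0
```

The 93.9% entry therefore corresponds to a run roughly sixteen times faster, whereas a 50% reduction is only a twofold speedup.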
Performance Comparison based on the time to process image
Parameters: Magnification, Fudge Factor, Sigma and HSize
[Figure: bar chart "Percentage change in image processing" for the eight parameter sets; series: System II (a) vs. System I, System II (b) vs. System II (a), System II (b) vs. System I.]
Conclusion
 The performance improvements due to the new architecture in Intel’s fourth
generation i7 family are substantial: in total run time on the CPU-bound tasks,
the i7 4800 mobile CPU outperforms the overclocked i7 3770K by 3.4% to 12.6%.
 NVIDIA also seems to offer improved support for its GTX 700 series on Windows
8.1. On identical hardware, Windows 8.1 with the 326.01 driver improved on
Windows Server 2012 with the 320.49 driver by up to 93.9% for one set of
parameters and by over 20% in every total run time.
 Obviously, measuring the performance of hybrid algorithms is similar to asking
“how long is a piece of string?”. But given that manufacturers fine-tune
their products to perform better in standard benchmarking tools, it is
always wise to create your own benchmarks that fit your applications.
Acknowledgements
I would like to thank the following individuals for their help in measuring and
optimising the performance of my MATLAB code, through their extensive
knowledge of MATLAB and/or CUDA:
 Dr. Mike Giles, Professor of Scientific Computing at the University of Oxford; resident
expert for NVIDIA and MATLAB
 Dr. James Lebak, Parallel Computing Software Engineer at MathWorks,
Boston HQ.
 Captain (USMC) John Roberts, Senior Principal GPGPU Software Engineer at BAE
Systems, Inc. (formerly of NVIDIA); John also heads the CUDA Vision Workbench
project.
I would also like to thank XMG-Schenker for supporting my research effort
through their generous sponsorship of my Schenker W503.