SlideShare a Scribd company logo
1 of 19
GPU ACCELERATED
HIGH PERFORMANCE
COMPUTING PRIMER
G RAJA SUMANT
ASHOK ASHWIN
//early draft copy
CONTENTS
1. INTRODUCTION
2.WHY USE GPU?
3.CPU ARCHITECTURE
4.GPU ARCHITECTURE
5.WHY PYTHON FOR GPU
6.HOW GPU ACCELERATION WORKS
7.TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING
8.CUDA+PYTHON
9.PyCuda sample code
INTRODUCTION
GPU-accelerated computing is the use of a graphics
processing unit (GPU) together with a CPU to
accelerate scientific, engineering, and enterprise
applications. Pioneered in 2007 by NVIDIA, GPUs now
power energy-efficient datacenters in government labs,
universities, enterprises, and small-and-medium
businesses around the world.
WHY USE GPU?
GPU-accelerated computing offers unprecedented
application performance by offloading compute-
intensive portions of the application to the GPU, while
the remainder of the code still runs on the CPU. From a
user's perspective, applications simply run significantly
faster.
CPU ARCHITECTURE
Design target for CPUs:
1)Focus on Task parallelism
2)Make a single thread very fast
3)Hide latency through large caches
4)Predict, speculate
GPU ARCHITECTURE
The GPU architecture of AMD
GPU ARCHITECTURE
NVIDIA ARCHITECTURE
GPU ARCHITECTURE
WHY PYTHON FOR GPU
Go to a terminal , type python
>> import this
read the output
WHY PYTHON FOR GPU
GPUs are everything that scripting languages are not.
>Highly parallel
>Very architecture-sensitive
>Built for maximum FP/memory throughput
>complement each other
CPU: largely restricted to control
>tasks (1000/sec)
>Scripting fast enough
>Python + CUDA = PyCUDA
>Python + OpenCL = PyOpenCL
http://www.nvidia.com/object/what-is-gpu-computing.html
TECHNOLOGIES AVAILABLE
TODAY FOR GPU COMPUTING
Open computing language (OpenCL)
> Many vendors: AMD, Nvidia, Apple, Intel, IBM...
> Standard CPUs may report themselves as OpenCL capable
>Works on most devices, but
>Implemented feature set and extensions may vary
Compute unified device architecture (CUDA)
>One vendor: Nvidia (more mature tools)
>Better coherence across a limited set of devices
CUDA + PYTHON
PyCUDA
>You still have to write your kernel in CUDA C
>. . . but integrates easily with numpy
>Higher level than CUDA C, but not much higher
>Full CUDA support and performance
gnumpy/CUDAMat/cuBLAS
>gnumpy: numpy-like wrapper for CUDAMat
>CUDAMat: Pre-written kernels and partial cuBLAS wrapper
>cuBLAS: (incomplete) CUDA implementation of BLAS
PyCUDA sample code
>> open helloCUDA.py in editor
look and analyse the code
CUDAMAT
The aim of the cudamat project is to make it easy to perform basic matrix
calculations on CUDA-enabled GPUs from Python. cudamat provides a
Python matrix class that performs calculations on a GPU. At present, some
of the operations the GPU matrix class supports include: Easy conversion to
and from instances of numpy.ndarray. Limited slicing support. Matrix
multiplication and transpose, Elementwise addition, subtraction,
multiplication, and division.
open cudamat examples.
GNumpy
Module gnumpy contains class garray, which behaves
much like numpy.ndarray
Module gnumpy also contains methods like tile() and
rand(), which behave like their numpy counterparts
except that they deal with gnumpy.garray instances,
instead of numpy.ndarray instances.
gnumpy builds on cudamat
References
● http://documen.tician.de/pycuda/tutorial.html
● http://on-demand.gputechconf.com/gtc/2010/presentations/S12041-PyCUDA-Simpler-GPU-Programming-Python.pdf
● http://www.tsc.uc3m.es/~miguel/MLG/adjuntos/slidesCUDA.pdf
● http://femhub.com/docs/cuda_en.pdf
● http://www.ieap.uni-kiel.de/et/people/kruse/tutorials/cuda/tutorial01p/web01p/tutorial01p.pdf
● http://conference.scipy.org/static/wiki/scipy09-pycuda-tut.pdf

More Related Content

What's hot

CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : Notes
Subhajit Sahu
 

What's hot (20)

PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, ...
 
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian BallantyneWT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansGS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
 
PT-4056, Harnessing Heterogeneous Systems Using C++ AMP – How the Story is Ev...
PT-4056, Harnessing Heterogeneous Systems Using C++ AMP – How the Story is Ev...PT-4056, Harnessing Heterogeneous Systems Using C++ AMP – How the Story is Ev...
PT-4056, Harnessing Heterogeneous Systems Using C++ AMP – How the Story is Ev...
 
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
MM-4104, Smart Sharpen using OpenCL in Adobe Photoshop CC – Challenges and Ac...
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
CC-4006, Deliver Hardware Accelerated Applications Using RemoteFX vGPU with W...
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
 
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
PL-4051, An Introduction to SPIR for OpenCL Application Developers and Compil...
 
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
 
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
CE-4028, Miracast with AMD Wireless Display technology – Kickass gaming and o...
 
CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : Notes
 
Gpu perf-presentation
Gpu perf-presentationGpu perf-presentation
Gpu perf-presentation
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
 

Similar to Pycon2014 GPU computing

IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
Ankita Dewan
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
Ankita Dewan
 
“A New, Open-standards-based, Open-source Programming Model for All Accelerat...
“A New, Open-standards-based, Open-source Programming Model for All Accelerat...“A New, Open-standards-based, Open-source Programming Model for All Accelerat...
“A New, Open-standards-based, Open-source Programming Model for All Accelerat...
Edge AI and Vision Alliance
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
Arka Ghosh
 

Similar to Pycon2014 GPU computing (20)

GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
Cuda lab manual
Cuda lab manualCuda lab manual
Cuda lab manual
 
Cuda
CudaCuda
Cuda
 
CUDA DLI Training Courses at GTC 2019
CUDA DLI Training Courses at GTC 2019CUDA DLI Training Courses at GTC 2019
CUDA DLI Training Courses at GTC 2019
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An Introduction
 
CUDA
CUDACUDA
CUDA
 
NVIDIA GTC 2019: Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
NVIDIA GTC 2019:  Red Hat and the NVIDIA DGX: Tried, Tested, TrustedNVIDIA GTC 2019:  Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
NVIDIA GTC 2019: Red Hat and the NVIDIA DGX: Tried, Tested, Trusted
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
 
IIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita DewanIIT ropar_CUDA_Report_Ankita Dewan
IIT ropar_CUDA_Report_Ankita Dewan
 
Python и программирование GPU (Ивашкевич Глеб)
Python и программирование GPU (Ивашкевич Глеб)Python и программирование GPU (Ивашкевич Глеб)
Python и программирование GPU (Ивашкевич Глеб)
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
“A New, Open-standards-based, Open-source Programming Model for All Accelerat...
“A New, Open-standards-based, Open-source Programming Model for All Accelerat...“A New, Open-standards-based, Open-source Programming Model for All Accelerat...
“A New, Open-standards-based, Open-source Programming Model for All Accelerat...
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 
Cuda
CudaCuda
Cuda
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
An exposition of performance comparison of graphic processing unit virtualiza...
An exposition of performance comparison of graphic processing unit virtualiza...An exposition of performance comparison of graphic processing unit virtualiza...
An exposition of performance comparison of graphic processing unit virtualiza...
 
Apu fc & s project
Apu fc & s projectApu fc & s project
Apu fc & s project
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 

Recently uploaded

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 

Recently uploaded (20)

UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 

Pycon2014 GPU computing

  • 1. GPU ACCELERATED HIGH PERFORMANCE COMPUTING PRIMER G RAJA SUMANT ASHOK ASHWIN //early draft copy
  • 2. CONTENTS 1. INTRODUCTION 2.WHY USE GPU? 3.CPU ARCHITECTURE 4.GPU ARCHITECTURE 5.WHY PYTHON FOR GPU 6.HOW GPU ACCELERATION WORKS 7.TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING 8.CUDA+PYTHON 9.PyCuda sample code
  • 3. INTRODUCTION GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate scientific, engineering, and enterprise applications. Pioneered in 2007 by NVIDIA, GPUs now power energy-efficient datacenters in government labs, universities, enterprises, and small-and-medium businesses around the world.
  • 4. WHY USE GPU? GPU-accelerated computing offers unprecedented application performance by offloading compute- intensive portions of the application to the GPU, while the remainder of the code still runs on the CPU. From a user's perspective, applications simply run significantly faster.
  • 5. CPU ARCHITECTURE Design target for CPUs: 1)Focus on Task parallelism 2)Make a single thread very fast 3)Hide latency through large caches 4)Predict, speculate
  • 6. GPU ARCHITECTURE The GPU architecture of AMD
  • 9. WHY PYTHON FOR GPU Go to a terminal , type python >> import this read the output
  • 10. WHY PYTHON FOR GPU GPUs are everything that scripting languages are not. >Highly parallel >Very architecture-sensitive >Built for maximum FP/memory throughput >complement each other CPU: largely restricted to control >tasks (1000/sec) >Scripting fast enough >Python + CUDA = PyCUDA >Python + OpenCL = PyOpenCL
  • 11.
  • 12.
  • 14. TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING Open computing language (OpenCL) > Many vendors: AMD, Nvidia, Apple, Intel, IBM... > Standard CPUs may report themselves as OpenCL capable >Works on most devices, but >Implemented feature set and extensions may vary Compute unified device architecture (CUDA) >One vendor: Nvidia (more mature tools) >Better coherence across a limited set of devices
  • 15. CUDA + PYTHON PyCUDA >You still have to write your kernel in CUDA C >. . . but integrates easily with numpy >Higher level than CUDA C, but not much higher >Full CUDA support and performance gnumpy/CUDAMat/cuBLAS >gnumpy: numpy-like wrapper for CUDAMat >CUDAMat: Pre-written kernels and partial cuBLAS wrapper >cuBLAS: (incomplete) CUDA implementation of BLAS
  • 16. PyCUDA sample code >> open helloCUDA.py in editor look and analyse the code
  • 17. CUDAMAT The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations the GPU matrix class supports include: Easy conversion to and from instances of numpy.ndarray. Limited slicing support. Matrix multiplication and transpose, Elementwise addition, subtraction, multiplication, and division. open cudamat examples.
  • 18. GNumpy Module gnumpy contains class garray, which behaves much like numpy.ndarray Module gnumpy also contains methods like tile() and rand(), which behave like their numpy counterparts except that they deal with gnumpy.garray instances, instead of numpy.ndarray instances. gnumpy builds on cudamat
  • 19. References ● http://documen.tician.de/pycuda/tutorial.html ● http://on-demand.gputechconf.com/gtc/2010/presentations/S12041-PyCUDA-Simpler-GPU-Programming-Python.pdf ● http://www.tsc.uc3m.es/~miguel/MLG/adjuntos/slidesCUDA.pdf ● http://femhub.com/docs/cuda_en.pdf ● http://www.ieap.uni-kiel.de/et/people/kruse/tutorials/cuda/tutorial01p/web01p/tutorial01p.pdf ● http://conference.scipy.org/static/wiki/scipy09-pycuda-tut.pdf