2. CONTENTS
1. INTRODUCTION
2. WHY USE GPU?
3. CPU ARCHITECTURE
4. GPU ARCHITECTURE
5. WHY PYTHON FOR GPU
6. HOW GPU ACCELERATION WORKS
7. TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING
8. CUDA + PYTHON
9. PyCUDA SAMPLE CODE
3. INTRODUCTION
GPU-accelerated computing is the use of a graphics
processing unit (GPU) together with a CPU to
accelerate scientific, engineering, and enterprise
applications. Pioneered in 2007 by NVIDIA, GPUs now
power energy-efficient datacenters in government labs,
universities, enterprises, and small and medium
businesses around the world.
4. WHY USE GPU?
GPU-accelerated computing offers unprecedented
application performance by offloading compute-
intensive portions of the application to the GPU, while
the remainder of the code still runs on the CPU. From a
user's perspective, applications simply run significantly
faster.
5. CPU ARCHITECTURE
Design targets for CPUs:
1) Focus on task parallelism
2) Make a single thread very fast
3) Hide latency through large caches
4) Predict and speculate
9. WHY PYTHON FOR GPU
Go to a terminal and type python. At the >>> prompt, enter:
>>> import this
and read the output.
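Concretely, at the interpreter prompt:

```python
# Typing this at the Python prompt prints "The Zen of Python",
# Tim Peters' summary of the design philosophy behind the language.
import this
```

The aphorisms it prints ("Simple is better than complex", "Readability counts") are the slide's point: Python is a comfortable front end for driving a GPU.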
10. WHY PYTHON FOR GPU
GPUs are everything that scripting languages are not:
> Highly parallel
> Very architecture-sensitive
> Built for maximum FP/memory throughput
The two therefore complement each other:
> CPU: largely restricted to control tasks (~1000/sec), where scripting is fast enough
> Python + CUDA = PyCUDA
> Python + OpenCL = PyOpenCL
14. TECHNOLOGIES AVAILABLE TODAY FOR GPU COMPUTING
Open Computing Language (OpenCL)
> Many vendors: AMD, Nvidia, Apple, Intel, IBM...
> Standard CPUs may report themselves as OpenCL-capable
> Works on most devices, but
> Implemented feature set and extensions may vary
Compute Unified Device Architecture (CUDA)
> One vendor: Nvidia (more mature tools)
> Better coherence across a limited set of devices
15. CUDA + PYTHON
PyCUDA
> You still have to write your kernel in CUDA C...
> ...but it integrates easily with numpy
> Higher level than CUDA C, but not much higher
> Full CUDA support and performance
gnumpy/CUDAMat/cuBLAS
> gnumpy: numpy-like wrapper for CUDAMat
> CUDAMat: pre-written kernels and a partial cuBLAS wrapper
> cuBLAS: (incomplete) CUDA implementation of BLAS
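PyCUDA's standard introductory example illustrates the split described above: the kernel is still CUDA C (embedded as a string), while all setup and data handling is plain numpy. The sketch below adds a numpy fallback of our own (not part of the original example) so it also runs on a machine without PyCUDA or a GPU:

```python
import numpy as np

a = np.random.randn(400).astype(np.float32)
b = np.random.randn(400).astype(np.float32)
dest = np.zeros_like(a)

try:
    import pycuda.autoinit            # creates a CUDA context on import
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # The kernel itself is still written in CUDA C.
    mod = SourceModule("""
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
      const int i = threadIdx.x;
      dest[i] = a[i] * b[i];
    }
    """)
    multiply_them = mod.get_function("multiply_them")

    # drv.In/drv.Out do the host<->device copies for us.
    multiply_them(drv.Out(dest), drv.In(a), drv.In(b),
                  block=(400, 1, 1), grid=(1, 1))
except Exception:
    # CPU fallback (our addition) for machines without PyCUDA or a GPU:
    dest = a * b
```

Note how `drv.In` and `drv.Out` absorb the explicit memory transfers that make up much of the boilerplate in plain CUDA C.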
17. CUDAMAT
The aim of the cudamat project is to make it easy to perform basic matrix
calculations on CUDA-enabled GPUs from Python. cudamat provides a Python
matrix class that performs its calculations on a GPU. At present, the
operations the GPU matrix class supports include:
> Easy conversion to and from instances of numpy.ndarray
> Limited slicing support
> Matrix multiplication and transpose
> Elementwise addition, subtraction, multiplication, and division
Open the cudamat examples for a demonstration.
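A minimal sketch of that workflow, assuming cudamat's documented `CUDAMatrix`/`cm.dot` API; the `except` branch is a CPU fallback of ours so the code also runs without cudamat or a GPU:

```python
import numpy as np

a = np.random.rand(32, 64)
b = np.random.rand(64, 16)

try:
    import cudamat as cm
    cm.cublas_init()                      # set up the GPU / cuBLAS context
    ga = cm.CUDAMatrix(a)                 # copy host arrays to the GPU
    gb = cm.CUDAMatrix(b)
    product = cm.dot(ga, gb).asarray()    # multiply on the GPU, copy back
    cm.cublas_shutdown()
except Exception:
    product = a.dot(b)                    # CPU fallback (our addition)
```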
18. GNumpy
Module gnumpy contains the class garray, which behaves
much like numpy.ndarray.
Module gnumpy also contains functions like tile() and
rand(), which behave like their numpy counterparts
except that they deal with gnumpy.garray instances
instead of numpy.ndarray instances.
gnumpy builds on cudamat.
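A short sketch of that drop-in behaviour, assuming gnumpy's `garray` and `as_numpy_array()` API; the `except` branch is a CPU fallback of ours for machines without gnumpy or a GPU:

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)

try:
    import gnumpy as gnp
    g = gnp.garray(x)                 # garray mirrors numpy.ndarray
    y = (g * 2 + 1).as_numpy_array()  # elementwise work happens on the GPU
except Exception:
    y = x * 2 + 1                     # same computation on the CPU
```

The point of the garray interface is that this last line and the GPU branch express the identical computation, so existing numpy code needs little change.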