This is a presentation I gave at the last GPGPU workshop we held, in April 2013.
The use of GPGPU is expanding and is creating a continuum from mobile to HPC. At the same time, the question is whether the GPGPU languages are the right ones (well, no), and whether we are wasting resources re-developing the same SW stack instead of converging.
6. GPGPU EVOLUTION
Nov 2009 - First Hybrid SC in the Top10: Chinese Tianhe-1
1,024 Intel Xeon E5450 CPUs
5,120 Radeon 4870 X2 GPUs
Nov 2010 - First Hybrid SC reaches #1 on Top500 list: Tianhe-1A
14,336 Xeon X5670 CPUs
7,168 Nvidia Tesla M2050 GPUs
Source: http://www.top500.org/lists/
Tianhe-1 : 563 TFLOPS
Tianhe-1A : 2577 TFLOPS
7. GPGPU EVOLUTION
2013 - OpenCL on: Nexus 4 (Qualcomm Adreno 320), Nexus 10 (ARM Mali T604)
Android 4.2 adds GPU support for RenderScript
2014 - NVIDIA Tegra 5 will support CUDA
2013 - GPGPU Continuum becomes a reality
8. THE GPGPU CONTINUUM
Apple A6 GPU: 25 GFLOPS, < 2 W
AMD G-T16R: 46 GFLOPS*, 4.5 W
Intel i7-3770: 511 GFLOPS*, 77 W
NVIDIA GTX Titan: 4500 GFLOPS, 250 W
ORNL TITAN SC: 27 PFLOPS, 8200 kW
* GFLOPS of CPU+GPU
Take Intel's vision of the Compute Continuum, and aspire to the same for the GPGPU continuum:
A common ecosystem
built on a common (SW) architecture
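One way to read the continuum figures on this slide is as performance per watt. The sketch below simply divides the quoted GFLOPS by the quoted power. The numbers are the slide's own peak figures, not measurements. The Apple A6 entry uses 2 W as an upper bound for the slide's "< 2W", so its result is a lower bound. The helper name `gflops_per_watt` is made up for this illustration.

```python
# (name, peak GFLOPS, power in watts), as quoted on the slide.
DEVICES = [
    ("Apple A6 GPU", 25, 2),           # slide says "< 2W"; 2 W is an assumed upper bound
    ("AMD G-T16R", 46, 4.5),           # CPU+GPU figure
    ("Intel i7-3770", 511, 77),        # CPU+GPU figure
    ("NVIDIA GTX Titan", 4500, 250),
    ("ORNL TITAN SC", 27e6, 8200e3),   # 27 PFLOPS = 27e6 GFLOPS, 8200 kW = 8.2e6 W
]


def gflops_per_watt(gflops, watts):
    """Return the efficiency in GFLOPS per watt."""
    return gflops / watts


if __name__ == "__main__":
    for name, gflops, watts in DEVICES:
        print(f"{name:18s} {gflops_per_watt(gflops, watts):6.1f} GFLOPS/W")
```

On these figures, raw GFLOPS differ by about six orders of magnitude from one end of the continuum to the other, while GFLOPS per watt stays within roughly one order of magnitude.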
9. INTRO TO LEADING MOBILE GPU VENDORS
Imagination PowerVR 543
• Apple, Samsung, Motorola, Intel
• Unified Shaders
• Supports OpenCL 1.1 (E)
• 38 GFLOPS (Apple's MP4 version)
Vivante CG4000
• Unified Shaders
• 4 Cores, SIMD4 each
• Supports OpenCL 1.2
• 48 GFLOPS
Qualcomm Adreno 320
• Part of Snapdragon S4
• Unified Shader
• SIMD4 ?
• Supports OpenCL 1.1 (E)
• 50 GFLOPS
ARM Mali T604
• 4 Cores
• Multiple "pipes" per core
• Supports OpenCL 1.1
• 68 GFLOPS
NVIDIA Tegra 4
• 6 x 4-wide Vertex Shaders
• 4 x 4-wide Pixel Shaders
• No GPGPU support
• 74 GFLOPS
http://kyokojap.myweb.hinet.net/gpu_gflops/
10. MOBILE GPGPU CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → Slow & expensive development
• Distribution Model
  • Driver updates are part of OS distribution (no more per-month updates…)
  • End users are less likely to update version → higher standards on stability & performance of driver release
• Security - the unspoken issue (hole) …
11. GPGPU CONTINUUM CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → Slow & expensive development
• Distribution Model
  • End users are less likely to update version → higher standards on stability & performance of driver release
• Security - the unspoken issue (hole) …
These challenges are a barrier to GPGPU adoption across the continuum
13. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …
[Diagram labels: OpenCL, RenderScript, GPU, DirectCompute, CUDA]
15. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …
[Diagram labels: PyOpenCL, WebCL, Aparapi (Java), OpenCL, OpenACC, RenderScript, GPU, DirectCompute, C++ AMP, CUDA, Fortran, NumbaPro (Python)]
A jungle of languages… but are these the right ones?
16. TOWARDS THE CONTINUUM (1) - LANGUAGES
• Current GPGPU languages are C/C++ based
  • There are "bindings" to Python, Java, JavaScript - but kernels are still C/C++
• Current developer trends:
  • Managed languages (Java, C#)
  • Scripting languages (Python, PHP)
• Higher abstraction & manageability:
  • More room for tools to excel on optimization
  • Mitigate differences between GPU architectures
GPGPU languages need to evolve
https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language
Data from CodeEval.com, based on 100K+ code samples
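The point that bindings leave the kernel in C/C++ can be seen in a minimal PyOpenCL sketch. The host side is Python, but the code that runs on the GPU is still an OpenCL C string. The names `KERNEL_SRC` and `vec_add_on_gpu` are made up for this illustration. The function assumes `numpy`, `pyopencl` and a working OpenCL driver are installed, which is why both packages are imported lazily inside it.

```python
# The GPU kernel itself: plain OpenCL C, embedded in Python as a string.
KERNEL_SRC = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vec_add_on_gpu(a, b):
    """Add two float32 NumPy arrays of equal length on an OpenCL device."""
    # Imported here so the module still loads without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # Copy the inputs to device buffers and allocate the output buffer.
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out = np.empty_like(a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    # The C source is compiled at run time by the vendor's OpenCL driver.
    program = cl.Program(ctx, KERNEL_SRC).build()
    program.vec_add(queue, a.shape, None, a_buf, b_buf, out_buf)

    cl.enqueue_copy(queue, out, out_buf)
    return out
```

Called as `vec_add_on_gpu(x, y)` with two `float32` arrays, the Python code only moves data and launches work. Any per-architecture tuning would still have to happen inside the C string.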
18. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
[Diagram: OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
19. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
• Most GPGPU languages already use the LLVM compilation framework
  • Slight "flavors" of LLVM IR
• Most languages also possess a similar set of "API capabilities"
[Diagram: OpenACC, RenderScript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
20. TOWARDS THE CONTINUUM (2) - SOFTWARE STACK
• Most GPGPU languages already use the LLVM compilation framework
  • Slight "flavors" of LLVM IR
• Most languages also possess a similar set of "API capabilities"
• Defining a common stack based on LLVM & a common API will:
  • Improve the compiler
  • Increase driver quality & stability
  • Enable unified debugger / profiler
  • …
[Diagram: OpenACC, RenderScript, OpenCL, CUDA → LLVM IR → Vendor X IL → Vendor X GPU]
Define GPGPU Virtual Machine based on LLVM
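A rough way to see why a common stack saves effort is to count compiler components. The sketch below takes the four languages from the stack diagram and the five mobile GPU vendors listed earlier. It compares "every language compiled separately for every vendor" with "one front end per language plus one back end per vendor, meeting at a shared LLVM IR". This is a deliberately simplified model, and the function names are made up for the illustration. Real stacks already share parts of this path.

```python
from itertools import product

FRONT_ENDS = ["OpenACC", "RenderScript", "OpenCL", "CUDA"]
VENDORS = ["Imagination", "Vivante", "Qualcomm", "ARM", "NVIDIA"]


def direct_paths(front_ends, vendors):
    """One separate compiler path for every (language, vendor) pair."""
    return list(product(front_ends, vendors))


def common_ir_components(front_ends, vendors, ir="LLVM IR"):
    """One front end per language into the IR, one back end per vendor out of it."""
    to_ir = [(fe, ir) for fe in front_ends]
    from_ir = [(ir, vendor) for vendor in vendors]
    return to_ir + from_ir


if __name__ == "__main__":
    print("separate paths:   ", len(direct_paths(FRONT_ENDS, VENDORS)))
    print("common-IR pieces: ", len(common_ir_components(FRONT_ENDS, VENDORS)))
```

With F languages and V vendors, the count drops from F x V to F + V under this model. A debugger or profiler written once at the IR level would serve every pair.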
21. TAKEAWAYS
• GPGPU Continuum is here - from mobile devices to HPC
• Vision: a common ecosystem built on a common (SW) architecture
• Challenges: many architectures, immature tools, ecosystem
22. QUESTIONS
• Q: What about "Heterogeneous Computing"?
• A: Go back, replace each "GPGPU" with "Heterogeneous Computing" - and it all fits…
• More?