The GPGPU Continuum

Ofer Rosenberg

The GPU continuum workshop, April 25 2013

CONTENT
• Intel’s Compute Continuum
• GPGPU Evolution
• The GPGPU Continuum
• Mobile GPGPU challenges
• GPGPU Continuum challenges
• Towards the Continuum

INTEL’S “COMPUTE CONTINUUM” FROM IDF 2010

GPGPU EVOLUTION

2004 – Stanford University: Brook for GPUs
2006 – AMD releases CTM; NVIDIA releases CUDA
2008 – OpenCL 1.0 released

GPUs of the period: R580 – 375 GFLOPS; G80 – 346 GFLOPS

GPGPU EVOLUTION

Nov 2009 – First hybrid supercomputer in the Top 10: China's Tianhe-1 (563 TFLOPS)
• 1,024 Intel Xeon E5450 CPUs
• 5,120 Radeon 4870 X2 GPUs
Nov 2010 – First hybrid supercomputer to reach #1 on the Top500 list: Tianhe-1A (2,577 TFLOPS)
• 14,336 Xeon X5670 CPUs
• 7,168 NVIDIA Tesla M2050 GPUs
Source: http://www.top500.org/lists/

GPGPU EVOLUTION

2013 – OpenCL on:
• Nexus 4 (Qualcomm Adreno 320)
• Nexus 10 (ARM Mali T604)
Android 4.2 adds GPU support for Renderscript
2014 – NVIDIA Tegra 5 will support CUDA

2013 – GPGPU Continuum becomes a reality

THE GPGPU CONTINUUM

Device              Peak performance   Power
Apple A6 GPU        25 GFLOPS          < 2 W
AMD G-T16R          46 GFLOPS*         4.5 W
Intel i7-3770       511 GFLOPS*        77 W
NVIDIA GTX Titan    4,500 GFLOPS       250 W
ORNL Titan SC       27 PFLOPS          8,200 kW

* GFLOPS of CPU+GPU
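
To put the five data points above on a common scale, the short sketch below divides each peak figure by its power figure. It is only an illustration: the numbers are the slide's peak values, the helper name `gflops_per_watt` is made up for this example, and the Apple A6 entry assumes 2 W because the slide only gives an upper bound.

```python
# Peak performance (GFLOPS) and power (W) copied from the slide above.
# Assumption: the A6 uses 2 W, since the slide only states "< 2 W",
# so its efficiency here is a lower bound.
devices = [
    ("Apple A6 GPU", 25.0, 2.0),
    ("AMD G-T16R", 46.0, 4.5),
    ("Intel i7-3770", 511.0, 77.0),
    ("NVIDIA GTX Titan", 4500.0, 250.0),
    ("ORNL Titan SC", 27e6, 8200e3),  # 27 PFLOPS, 8,200 kW
]


def gflops_per_watt(gflops, watts):
    """Peak GFLOPS delivered per watt (hypothetical helper for this sketch)."""
    return gflops / watts


efficiency = {name: gflops_per_watt(g, w) for name, g, w in devices}

for name, value in efficiency.items():
    print(f"{name:18s} {value:6.1f} GFLOPS/W")
```

Under these assumptions, peak performance spans roughly six orders of magnitude across the continuum, while peak GFLOPS per watt stays within about one order of magnitude.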

Take Intel’s vision of the Compute Continuum and aspire to the same for the GPGPU continuum:

A common ecosystem built on a common (SW) architecture

INTRO TO LEADING MOBILE GPU VENDORS
Imagination PowerVR 543
• Apple, Samsung, Motorola, Intel
• Unified Shaders
• Supports OpenCL 1.1 (E)
• 38 GFLOPS (Apple’s MP4 version)

Vivante GC4000
• Unified Shaders
• 4 Cores, SIMD4 each
• Supports OpenCL 1.2
• 48 GFLOPS

Qualcomm Adreno 320
• Part of Snapdragon S4
• Unified Shaders
• SIMD4 ?
• Supports OpenCL 1.1 (E)
• 50 GFLOPS

ARM Mali T604
• 4 Cores
• Multiple “pipes” per core
• Supports OpenCL 1.1
• 68 GFLOPS

NVIDIA Tegra 4
• 6 x 4-wide Vertex Shaders
• 4 x 4-wide Pixel Shaders
• No GPGPU support
• 74 GFLOPS

Source: http://kyokojap.myweb.hinet.net/gpu_gflops/

MOBILE GPGPU CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets a high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → slow & expensive development
• Distribution Model
  • Driver updates are part of OS distribution (no more per-month updates…)
  • End users are less likely to update version → higher standards on stability & performance of driver release
• Security – the unspoken issue (hole) …

GPGPU CONTINUUM CHALLENGES
• Many Different GPU Architectures
  • Optimizing for each sets a high bar on development costs
• Development Tools
  • Immature (stability, performance)
  • No common SDK / Debugger / Profiler (different per vendor)
• Ecosystem
  • Lack of libraries, wizards, middleware → slow & expensive development
• Distribution Model
  • End users are less likely to update version → higher standards on stability & performance of driver release
• Security – the unspoken issue (hole) …

These challenges are a barrier to GPGPU adoption across the continuum

TOWARDS THE CONTINUUM (1) - LANGUAGES
• Welcome to the GPGPU (SW) jungle …

[Diagram: the GPU surrounded by a growing set of languages and bindings: OpenCL, CUDA, DirectCompute, Renderscript, C++ AMP, OpenACC, CUDA Fortran, PyOpenCL, WebCL, Aparapi (Java), NumbaPro (Python)]

A jungle of languages… but are these the right ones?

• Current GPGPU languages are C/C++ based
  • There are “bindings” to Python, Java, JavaScript – but kernels are still C/C++
• Current developer trends:
  • Managed languages (Java, C#)
  • Scripting languages (Python, PHP)
  • Sources: https://sites.google.com/site/pydatalog/pypl/PyPL-PopularitY-ofProgramming-Language and data from CodeEval.com, based on 100K+ code samples
• Higher abstraction & manageability:
  • More room for tools to excel at optimization
  • Mitigates differences between GPU architectures

GPGPU languages need to evolve
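
The point that bindings leave the kernel in C/C++ can be seen in a minimal PyOpenCL sketch. The host side is Python, but the device code is still an OpenCL C string. The kernel name `vadd` and the helper names are made up for this illustration; the OpenCL path assumes `pyopencl`, `numpy` and a working OpenCL driver are installed, and is skipped otherwise.

```python
# Device code: OpenCL C, embedded as a string in the Python host program.
KERNEL_SRC = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vadd_reference(a, b):
    """Pure-Python equivalent of the kernel: one addition per work-item."""
    return [x + y for x, y in zip(a, b)]


def vadd_opencl(a, b):
    """Run the C kernel through PyOpenCL (needs pyopencl, numpy, a driver)."""
    import numpy as np
    import pyopencl as cl

    a_np = np.asarray(a, dtype=np.float32)
    b_np = np.asarray(b, dtype=np.float32)
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
    program = cl.Program(ctx, KERNEL_SRC).build()  # compiles the C source
    program.vadd(queue, a_np.shape, None, a_buf, b_buf, out_buf)
    out_np = np.empty_like(a_np)
    cl.enqueue_copy(queue, out_np, out_buf)
    return out_np.tolist()


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
    print(vadd_reference(a, b))
    try:
        print(vadd_opencl(a, b))
    except Exception as exc:  # pyopencl missing or no OpenCL device available
        print("OpenCL path skipped:", exc)
```

Everything a Python developer writes here is orchestration; the part that actually runs on the GPU is the C string, which is the gap the slide argues higher-level GPGPU languages should close.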

TOWARDS THE CONTINUUM (2) - SOFTWARE STACK

[Diagram: CUDA, OpenCL, OpenACC, Renderscript → LLVM IR → Vendor X IL → Vendor X GPU]

• Most GPGPU languages already use the LLVM compilation framework
  • Slight “flavors” of LLVM IR
• Most languages also possess a similar “API capabilities” set
• Defining a common stack based on LLVM & a common API will:
  • Improve the compiler
  • Increase driver quality & stability
  • Enable unified debugger / profiler
  • …

Define a GPGPU Virtual Machine based on LLVM
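
One way to see why a common LLVM-based layer helps is to count the components that have to be built and maintained. The sketch below is a toy model, not a description of any real toolchain: the front-end names come from the slides, while "Vendor Y GPU" and "Vendor Z GPU" and both function names are hypothetical, added only to make the counting visible.

```python
# Front ends named on the slides; the extra vendor names are placeholders.
FRONT_ENDS = ["CUDA", "OpenCL", "OpenACC", "Renderscript"]
BACK_ENDS = ["Vendor X GPU", "Vendor Y GPU", "Vendor Z GPU"]


def direct_compilers(front_ends, back_ends):
    """Without a shared layer: one compiler per (language, GPU) pair."""
    return [(fe, be) for fe in front_ends for be in back_ends]


def common_ir_components(front_ends, back_ends, ir="LLVM IR"):
    """With a shared IR: one lowering per language, one code generator per GPU."""
    lowerings = [(fe, ir) for fe in front_ends]
    code_generators = [(ir, be) for be in back_ends]
    return lowerings + code_generators


n_direct = len(direct_compilers(FRONT_ENDS, BACK_ENDS))
n_shared = len(common_ir_components(FRONT_ENDS, BACK_ENDS))
print(f"direct: {n_direct} compilers, common IR: {n_shared} components")
```

With N languages and M GPU targets, the direct approach grows as N × M while the shared-IR approach grows as N + M, and tools such as a debugger or profiler can attach to the single shared layer instead of each pairing.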

TAKEAWAYS
• GPGPU Continuum is here - from mobile devices to HPC
• Vision: a common ecosystem built on a common (SW) architecture
• Challenges: many architectures, immature tools, ecosystem

QUESTIONS
• Q: What about “Heterogeneous Computing”?
• A: Go back, replace each “GPGPU” with “Heterogeneous Computing” – and it all fits…
• More?

SOME SOURCES:
• http://www.nordichardware.com/CPU-Chipset/intel-core-i7-3770k-ivy-bridge-and-the-3d-transistor-is-here/Newgraphics-the-biggest-news-in-Ivy-Bridge.html
• http://elrond.informatik.tu-freiberg.de/papers/WorldComp2012/PDP2833.pdf
• http://www.anandtech.com/show/6787/nvidia-tegra-4-architecture-deep-dive-plus-tegra-4i-phoenix-hands-on/5
• http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400
• http://www.chipdesignmag.com/pallab/2011/06/30/arm-mali-gpu-unifying-graphics-across-platforms/
• http://en.wikipedia.org/wiki/Adreno#Renaming_to_Adreno
• http://en.wikipedia.org/wiki/PowerVR#Series_5_.28SGX.29
• http://en.wikipedia.org/wiki/Mali_(GPU)
• http://johndayautomotivelectronics.com/?p=12412
• http://www.cnx-software.com/2013/01/19/gpus-comparison-arm-mali-vs-vivante-gcxxx-vs-powervr-sgx-vs-nvidiageforce-ulp/
• http://www.brightsideofnews.com/print/2013/1/30/rise-of-vivante-fastest-tablet-gpu-on-the-market.aspx
• https://www.uplinq.com/2012/schedule/accelerating-your-android-application-renderscript-and-llvm-0
• http://www.androidauthority.com/adreno-320-features-performance-benchmarks-103269/
