This presentation describes the components of the GPU compute ecosystem, gives an overview of existing ecosystems, and presents a case study of NVIDIA Nsight.
2. Page 2
Contents
GPU Ecosystem
Ecosystem on Mobile/Embedded Platforms
Nsight - Tools Case Study
Libraries
3. Page 3
GPU Ecosystem
Software product development cycle: Design → Write Code → Debug → Profile → Product
The role of the GPU ecosystem is to support, speed up, and improve this cycle for GPU compute
4. Page 4
GPU Ecosystem
Support writing code by:
IDE integration – Compiler, Parser, Wizards
Libraries: Math (BLAS, IPP-like, Matrix, etc.), STL-like (Thrust, Bolt)
Support debugging by:
IDE integration of the debugger (preferred)
Providing usable execution control (breakpoints, pause/resume, etc.)
Providing a reliable memory view of the various address spaces
Support profiling by:
Providing two levels of profiling: system tracing and kernel profiling
System tracing - quick highlighting of hotspots and of device access efficiency
Kernel profiling - statistical and timeline-based (using performance counters)
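System tracing records a timeline of events (kernel launches, memory copies, API calls) with start and end timestamps, and the first thing a tool does with it is rank the hotspots. A minimal sketch of that aggregation, assuming a hypothetical `TraceEvent` record with nanosecond timestamps; DS-5, VTune and Nsight each use their own trace formats, and none of the names below come from those tools.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trace record: one timeline interval (e.g. a kernel launch or a copy).
struct TraceEvent {
    std::string name;      // kernel or API name
    std::uint64_t startNs; // start timestamp in nanoseconds
    std::uint64_t endNs;   // end timestamp in nanoseconds
};

// Sum the time spent in each named event and return the names sorted by
// total duration, largest first: the "hotspots" of the timeline.
std::vector<std::pair<std::string, std::uint64_t>>
rankHotspots(const std::vector<TraceEvent>& events) {
    std::map<std::string, std::uint64_t> total;
    for (const TraceEvent& e : events) {
        assert(e.endNs >= e.startNs); // intervals must be well-formed
        total[e.name] += e.endNs - e.startNs;
    }
    std::vector<std::pair<std::string, std::uint64_t>> ranked(total.begin(), total.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return ranked;
}
```

Summing per name rather than per launch is a deliberate choice: a short kernel launched thousands of times can outweigh one long kernel, and a per-name total surfaces that.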
6. Page 6
ARM Mali
Part of ARM SoC
OpenCL 1.1 Full Profile (Linux, Android)
RenderScript (Android only)
OpenCL SDK – Samples, Tutorials, etc.
No GPU debugging capability
ARM DS-5 (Developer Suite 5)
Eclipse IDE integration
Compiler, Debugger (CPU only)
System Trace – CPU & GPU
Deep Profiling - CPU & GPU
7. Page 7
Intel Haswell GPU
Part of Haswell (CPU & GPU)
OpenCL 1.2 Full Profile
Windows only for now (Linux support at alpha stage)
OpenCL SDK
Samples
Tools: Kernel Builder, VS/Eclipse integration, offline compiler, GDB support (CPU only)
No GPU debugging capability
VTune Amplifier XE supports OpenCL (CPU & GPU)
System level tracing (Application, Memory, Kernel launch)
Kernel Profiling
8. Page 8
Intel BayTrail platform (Atom)
BayTrail < 13W, BayTrail-M < 6.5W
Valleyview SoC (Z37xx)
GPU is based on Gen7 (same architecture as Ivy Bridge)
Same as previous slide:
OpenCL 1.2 (Windows only for now)
OpenCL SDK
VTune support
System level tracing
Kernel Profiling
9. Page 9
NVIDIA Tegra 5? (codename: Logan)
Disclaimer: Logan is due in early 2014; part of this information is speculation
Development boards and samples are available to selected customers
Logan SoC – 2W
ARM Cortex-A15 CPU, 4+1 cores: speculated
Kepler-based GPU: verified
CUDA support: verified
CUDA SDK – dozens of samples
CUDA libraries: Thrust, cuBLAS, NPP, etc.
Nsight: speculated
System Trace
Profiling, Debugging
11. Page 11
Nsight Highlights
“NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing”
(taken from the Nsight web page)
IDE integration
Windows – integration with Visual Studio
Linux – specialized Eclipse version
Debugging, system trace, profiling
Graphics (DX, OpenGL)
Computing (OpenCL, CUDA, C++ AMP)
Kernel profiling only for CUDA kernels
Debug, trace and profiling information is very well presented
Highly efficient information fields, windows, diagrams
Feedback from professional users has clearly been taken into account
12. Page 12
Debugging
Much more than “just integrated” with the IDE
Purpose-built windows showing valuable information
Assembly (GPU!)
Variables across all warps
Visible layout of the stopped thread
13. Page 13
Debugging – Eclipse edition
The Eclipse integration seems deeper than the Visual Studio one
Unified CPU / GPU Debugging
Simultaneous visibility into both CPU and GPU state
Multi-GPU support
Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12
Full GPU debugging
Set kernel breakpoints
Single-step, run until, etc.
View values across multiple GPU threads at the same time
Examine thread, warp, block state
Source and assembly level debugging
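The thread, warp and block views map every GPU thread to a position in the launch grid. For a 1-D launch, CUDA computes a global index as `blockIdx.x * blockDim.x + threadIdx.x`, and the threads of a block are grouped into warps of 32 (`warpSize` in device code). A small sketch of the reverse mapping a debugger performs when you select a thread, assuming a 1-D grid of 1-D blocks; `ThreadCoord` and `locateThread` are hypothetical names.

```cpp
#include <cassert>

// Location of one GPU thread as a debugger would present it.
struct ThreadCoord {
    unsigned block; // blockIdx.x
    unsigned warp;  // warp index within the block
    unsigned lane;  // lane (position) within the warp
};

constexpr unsigned kWarpSize = 32; // warp size on NVIDIA GPUs (warpSize in CUDA)

// Map a global 1-D thread index back to (block, warp, lane), assuming a 1-D
// grid whose blocks each have blockDim threads: the inverse of
// blockIdx.x * blockDim.x + threadIdx.x.
ThreadCoord locateThread(unsigned globalId, unsigned blockDim) {
    assert(blockDim > 0);
    unsigned threadInBlock = globalId % blockDim; // threadIdx.x
    return ThreadCoord{globalId / blockDim, threadInBlock / kWarpSize,
                       threadInBlock % kWarpSize};
}
```

For example, thread 300 in a launch with 128 threads per block sits in block 2, warp 1, lane 12; if the block size is not a multiple of 32, the last warp of each block is only partially filled.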
15. Page 15
Kernel Profiling
Choose a kernel to profile
Skip N kernel launches, then profile the next M
Choose “experiments”
Experiment - a type of profiling/analysis
Nsight replays each profiled kernel launch dozens of times with the same data
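Each experiment needs its own set of hardware counters, so one launch has to be replayed several times; and since a kernel may overwrite its own inputs, the tool must restore device memory before every pass so that each pass sees the same data. A host-only simulation of “skip N, profile M” with replay, assuming a kernel can be modeled as a C++ function that updates a buffer in place; `profileLaunches` and its parameters are hypothetical and do not describe Nsight's internal implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A "kernel" is modeled as a function that updates a device buffer in place.
using Buffer = std::vector<int>;
using Kernel = std::function<void(Buffer&)>;

// Launches before index `skip` run once; the next `profileCount` launches are
// replayed `passes` times (one pass per experiment), restoring the buffer
// before each pass. Returns the total number of kernel executions.
std::size_t profileLaunches(const std::vector<Kernel>& launches, Buffer& mem,
                            std::size_t skip, std::size_t profileCount,
                            std::size_t passes) {
    assert(passes > 0);
    std::size_t executions = 0;
    for (std::size_t i = 0; i < launches.size(); ++i) {
        bool profiled = i >= skip && i < skip + profileCount;
        if (!profiled) {
            launches[i](mem); // normal, unprofiled launch
            ++executions;
            continue;
        }
        const Buffer saved = mem; // snapshot before the first pass
        for (std::size_t p = 0; p < passes; ++p) {
            mem = saved;      // restore so every pass starts from identical data
            launches[i](mem); // a real tool would collect one counter set here
            ++executions;
        }
        // mem now holds the result of exactly one execution of this kernel
    }
    return executions;
}
```

Without the `mem = saved` restore, a kernel that increments its data would be applied once per pass, corrupting both the measurements and the program's own results.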
16. Page 16
Profiling Results
Experiment list
Each experiment is shown in its own tab
Profiling information is presented as graphs, pie charts, diagrams, etc.
HW counters are turned into easy-to-understand graphics
Information targets known HW bottlenecks, code inefficiencies, etc.
Remarkably well presented
17. Page 17
Profiling Results
The information provides a quick, easy and methodical way to identify performance bottlenecks
18. Page 18
Eclipse Edition - Source Code Editor
Project templates
CUDA code highlighting
CUDA-aware refactoring
CUDA-aware code completion and inline help
20. Page 20
CUDA Libraries – Part of the SDK
cuFFT
cuBLAS
cuRAND
cuSPARSE
NPP (similar to Intel IPP)
Math Library
Thrust (next slide)
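cuBLAS exposes the standard BLAS routines on the GPU; for example, `cublasSaxpy` computes y = αx + y. When adopting such a library, a small CPU reference of the same routine is useful for validating the GPU results. A sketch of SAXPY with BLAS-style strides (`incx`, `incy`), assuming positive increments only (BLAS also defines negative ones); `saxpyRef` is a hypothetical helper, not part of cuBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for BLAS level-1 SAXPY:
//   y[i * incy] = alpha * x[i * incx] + y[i * incy]   for i in [0, n)
// Only positive increments are handled in this sketch.
void saxpyRef(std::size_t n, float alpha, const std::vector<float>& x, std::size_t incx,
              std::vector<float>& y, std::size_t incy) {
    assert(incx > 0 && incy > 0);
    // every strided element touched must lie inside the vectors
    assert(n == 0 || ((n - 1) * incx < x.size() && (n - 1) * incy < y.size()));
    for (std::size_t i = 0; i < n; ++i) {
        y[i * incy] += alpha * x[i * incx];
    }
}
```

When comparing against GPU output, use a tolerance rather than exact equality, since the GPU may evaluate the multiply-add differently (for example with fused multiply-add).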
21. Page 21
Thrust Library
https://developer.nvidia.com/thrust
Works on top of CUDA
An open-source version is available on GitHub:
http://thrust.github.io/
Presentations:
http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
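Thrust mirrors the STL: containers such as `thrust::host_vector` and `thrust::device_vector`, and algorithms such as `thrust::transform`, `thrust::reduce` and `thrust::sort`. The sketch below is written with the standard library so it builds without the CUDA toolkit; it shows the transform-then-reduce shape that carries over to Thrust almost line for line. The function name `sumOfSquares` is illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sum of squares in the STL style that Thrust mirrors. In a Thrust version,
// std::vector would become thrust::device_vector, std::transform would become
// thrust::transform and std::accumulate would become thrust::reduce.
float sumOfSquares(const std::vector<float>& data) {
    std::vector<float> squares(data.size());
    std::transform(data.begin(), data.end(), squares.begin(),
                   [](float v) { return v * v; });             // element-wise map
    return std::accumulate(squares.begin(), squares.end(), 0.0f); // reduction
}
```

In the Thrust version the squaring operation must also be callable on the device, for example a functor whose `operator()` is marked `__host__ __device__`.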