SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Page 1
GPU Ecosystem
Introduction & Case Study
Ofer Rosenberg
October 2013
Page 2
Content
 GPU Ecosystem
 Ecosystem on Mobile/Embedded Platforms
 NSIGHT - Tools case study
 Libraries
Page 3
Product
GPU Ecosystem
Software Product Development cycle:
The GPU Ecosystem role is to support, speedup, and
improve this cycle for GPU Compute
Design
Write
Code
Debug
Profile
Page 4
GPU Ecosystem
 Support writing code by:
 IDE integration – Compiler, Parser, Wizards
 Libraries: Math (BLAS, IPP-like, Matrix, etc.),
STL-like (Thrust, BOLT)
 Support Debugging by:
 IDE integration of the debugger (preferred)
 Provide usable execution control (breakpoints, pause/resume, etc.)
 Providing reliable memory view of various address spaces
 Support Profiling by:
 Provide two levels of profiling: System Tracing and Kernel Profiling
 System Tracing - quick highlighting of hotspots and device optimal access
 Statistical and TimeLine-based Kernel Profiling (using perf. counters)
Design
Write
Code
Debug
Profile
Page 5
Ecosystem on
Mobile/Embedded Platforms
Page 6
ARM MALI
 Part of ARM SoC
 OpenCL 1.1Full Profile (Linux, Android)
 Renderscript (Android only)
 OpenCL SDK – Samples, Tutorials, etc.
 No GPU debugging capability
 ARM DS-5 (Developer Suite 5)
 Eclipse IDE integration
 Compiler, Debugger (CPU only)
 System Trace – CPU & GPU
 Deep Profiling - CPU & GPU
Page 7
Intel Haswell GPU
 Part of Haswell (CPU & GPU)
 OpenCL 1.2 Full Profile
 Windows only for now (Linux @ alpha stage)
 OpenCL SDK
 Samples
 Tools: Kernel Builder, VS/Eclipse Integration, Offline Compiler, GDB support (CPU Only)
 No GPU debugging capability
 VTune Amplifier XE supports OpenCL (CPU & GPU)
 System level tracing (Application, Memory, Kernel launch)
 Kernel Profiling
Page 8
Intel BayTrail platform (Atom)
 BayTrail < 13W, BayTrail-M < 6.5W
 Vallyview SoC (Z37xx)
 GPU is based on Gen7 (same arch as IvyBridge)
 Same as previous slide:
 OpenCL 1.2 (windows only for now)
 OpenCL SDK
 VTune support
 System level tracing
 Kernel Profiling
Page 9
NVIDIA Tegra 5 ? (Codename: Logan)
 Disclaimer: Logan is due early 2014. Part of the information is speculations
 Development Boards and Samples available to selected customers
 Logan SoC – 2W
 ARM CPU A15 4+1 :speculated
 Kepler based GPU : verified
 CUDA Support : verified
 CUDA SDK – Dozens of samples
 CUDA Libraries: Thrust, cuBLAS, cuNVPP, etc.
 NSIGHT : speculated
 System Trace
 Profiling, Debugging
Page 10
NSIGHT
TOOLS CASE STUDY
Design
Write
Code
Debug
Profile
Page 11
Nsight Highlights
 “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous
computing”
( Taken from Nsight page )
 IDE integration
 Windows – integration with Visual Studio
 Linux – specialized Eclipse version
 Debugging , System Trace , Profiling
 Graphics (DX, OpenGL)
 Computing (OpenCL, CUDA, C++ AMP)
 Profiling only on CUDA kernels
 Debug/Trace/Profile Information is highly shaped
 Highly efficient information fields, windows, diagrams
 Feedback from professional users is noticed
Page 12
Debugging
 Much more than “just integrated” with the IDE
 Shaped windows showing valuable info
Assembly (GPU!)
Variables across
all warpsVisible layout of the stopped thread
Page 13
Debugging – Eclipse edition
 Seems that Eclipse integration is deeper than Visual Studio
 Unified CPU / GPU Debugging
 Simultaneous visibility into both CPU and GPU state
 Multi-GPU support
Slides from: “CUDA Development Using NVIDIA Nsight,
Eclipse Edition” by David Goodwin, SC12
 Full GPU debugging
 Set kernel breakpoints
 Single-step, run until, etc.
 View values across multiple GPU
threads at the same time
 Examine thread, warp, block state
 Source and assembly level debugging
Page 14
System Trace
Page 15
Kernel Profiling
 Choose a kernel to profile
 Skip N kernels, Profile M kernels
 Choose “experiments”
 Experiment - Types of profiling/analysis
 NVIDIA runs each kernel launch dozens of times with the same data
Page 16
Profiling Results
 Experiment list
 Each experiment is a tabbed window
 Profiling information is shaped in graphs,
pie charts, diagrams, etc.
 Taking HW counters and shaping them to easy-
to-understand graphics
 Information targets known HW bottlenecks, Code
inefficiencies, etc.
 Amazingly shaped…
Page 17
Profiling Results
 The information provides a quick & easy methodic way to identify the performance
bottlenecks
1 2
3 4
Page 18
Eclipse Edition - Source Code Editor
 Project Templates
 CUDA code highlighting
 CUDA aware refactoring
 CUDA aware code completion and inline help
Page 19
LIBRARIES EXAMPLES
Page 20
CUDA Libraries – Part of the SDK
 cuFFT
 cuBLAS
 cuRAND
 cuSPARSE
 NPP (like IPP)
 Math Library
 Thrust (next slide)
Page 21
Thrust Library
 https://developer.nvidia.com/thrust
 Works on top of CUDA
 Open-source version is available at github
 http://thrust.github.io/
 Presentations:
 http://on-demand.gputechconf.com/gtc-
express/2011/presentations/introductiontothrust.pdf
Page 22
OPENCL LIBRARIES
Page 23
CLPP
 OpenCL Data Parallel Primitives Library (similar to thrust)
 Source : https://code.google.com/p/clpp/
 7 committers, last commit 1.5Y ago
Page 24
OpenCL BLAS
 OpenCL BLAS
 http://openclblas.sourceforge.net/
 Code is available here (GPLv2):
 http://sourceforge.net/projects/openclblas/
Page 25
ViennaCL
 BLAS implementation
 http://viennacl.sourceforge.net/
 Looks very promising
Page 26
REFERENCES
Page 27
Platform links:
 ARM
 Developer site : http://malideveloper.arm.com
 OpenCL tracing : http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/
 DS-5 suite : http://www.arm.com/products/tools/software-tools/ds-5/index.php
 OpenCL SDK : http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/
 OpenCL developer guide:
 Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html
 PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf
 NVIDIA
 http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
 http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/
 http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
Page 28
Links:
 Intel
 OpenCL sdk http://software.intel.com/en-us/vcsource/tools/opencl-sdk
 GPA http://software.intel.com/en-us/vcsource/tools/intel-gpa
 vTune support in OpenCL http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-
performance-analysis-on-intel-hd-graphics
 http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux
 Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc
 OpenCL “Beignet” – open source linux compiler :
 http://software.intel.com/en-us/forums/topic/402118
 http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux
 ATOM BayTrail:
 http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/
 http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested
 http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html
 http://software.intel.com/en-us/forums/topic/476221
 http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
Page 29
NSIGHT Links
 http://www.nvidia.com/object/nsight.html
 https://developer.nvidia.com/nsight-visual-studio-edition-videos
 https://developer.nvidia.com/developer-webinars
 http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-
CUDA-Development-Nsight.pdf
 http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-
With-Nsight-VSE.pdf

Weitere ähnliche Inhalte

Was ist angesagt?

MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLinaro
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...AMD Developer Central
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...AMD Developer Central
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...AMD Developer Central
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...AMD Developer Central
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansGS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansAMD Developer Central
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...AMD Developer Central
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsIntel® Software
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wallugur candan
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bspNeil Armstrong
 
GPU power consumption and performance trends
GPU power consumption and performance trendsGPU power consumption and performance trends
GPU power consumption and performance trendsAlessio Villardita
 

Was ist angesagt? (20)

MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
 
Cuda
CudaCuda
Cuda
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansGS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® Graphics
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bsp
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
GPU power consumption and performance trends
GPU power consumption and performance trendsGPU power consumption and performance trends
GPU power consumption and performance trends
 

Andere mochten auch

GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014StampedeCon
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsIBM
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingMark Kilgard
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in Rherbps10
 
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path RenderingGTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path Rendering Mark Kilgard
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrKohei KaiGai
 
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...odsc
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overviewinside-BigData.com
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai
 
PyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScalePyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScaleGoDataDriven
 
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...Spark Summit
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillotsparktc
 
How to Solve Real-Time Data Problems
How to Solve Real-Time Data ProblemsHow to Solve Real-Time Data Problems
How to Solve Real-Time Data ProblemsIBM Power Systems
 
Containerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudContainerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudSubbu Rama
 
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Chris Fregly
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkSpark Summit
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit
 

Andere mochten auch (20)

GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUs
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in R
 
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path RenderingGTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path Rendering
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated Asyncr
 
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
PyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScalePyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at Scale
 
Deep Learning on Hadoop
Deep Learning on HadoopDeep Learning on Hadoop
Deep Learning on Hadoop
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
How to Solve Real-Time Data Problems
How to Solve Real-Time Data ProblemsHow to Solve Real-Time Data Problems
How to Solve Real-Time Data Problems
 
Containerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudContainerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the Cloud
 
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 

Ähnlich wie GPU Ecosystem

PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyAMD Developer Central
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...Edge AI and Vision Alliance
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsnARUNACHALAM468781
 
Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Intel® Software
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Codemotion
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Stefano Di Carlo
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesIntel® Software
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsightlaparuma
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & FutureOfer Rosenberg
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 

Ähnlich wie GPU Ecosystem (20)

PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
 
DRIVE PX 2
DRIVE PX 2DRIVE PX 2
DRIVE PX 2
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
Choosing the right processor
Choosing the right processorChoosing the right processor
Choosing the right processor
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android Devices
 
Resume
ResumeResume
Resume
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Cuda materials
Cuda materialsCuda materials
Cuda materials
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & Future
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

GPU Ecosystem

  • 1. Page 1 GPU Ecosystem Introduction & Case Study Ofer Rosenberg October 2013
  • 2. Page 2 Content  GPU Ecosystem  Ecosystem on Mobile/Embedded Platforms  NSIGHT - Tools case study  Libraries
  • 3. Page 3 Product GPU Ecosystem Software Product Development cycle: The GPU Ecosystem role is to support, speedup, and improve this cycle for GPU Compute Design Write Code Debug Profile
  • 4. Page 4 GPU Ecosystem  Support writing code by:  IDE integration – Compiler, Parser, Wizards  Libraries: Math (BLAS, IPP-like, Matrix, etc.), STL-like (Thrust, BOLT)  Support Debugging by:  IDE integration of the debugger (preferred)  Provide usable execution control (breakpoints, pause/resume, etc.)  Providing reliable memory view of various address spaces  Support Profiling by:  Provide two levels of profiling: System Tracing and Kernel Profiling  System Tracing - quick highlighting of hotspots and device optimal access  Statistical and TimeLine-based Kernel Profiling (using perf. counters) Design Write Code Debug Profile
  • 6. Page 6 ARM MALI  Part of ARM SoC  OpenCL 1.1Full Profile (Linux, Android)  Renderscript (Android only)  OpenCL SDK – Samples, Tutorials, etc.  No GPU debugging capability  ARM DS-5 (Developer Suite 5)  Eclipse IDE integration  Compiler, Debugger (CPU only)  System Trace – CPU & GPU  Deep Profiling - CPU & GPU
  • 7. Page 7 Intel Haswell GPU  Part of Haswell (CPU & GPU)  OpenCL 1.2 Full Profile  Windows only for now (Linux @ alpha stage)  OpenCL SDK  Samples  Tools: Kernel Builder, VS/Eclipse Integration, Offline Compiler, GDB support (CPU Only)  No GPU debugging capability  VTune Amplifier XE supports OpenCL (CPU & GPU)  System level tracing (Application, Memory, Kernel launch)  Kernel Profiling
  • 8. Page 8 Intel BayTrail platform (Atom)  BayTrail < 13W, BayTrail-M < 6.5W  Vallyview SoC (Z37xx)  GPU is based on Gen7 (same arch as IvyBridge)  Same as previous slide:  OpenCL 1.2 (windows only for now)  OpenCL SDK  VTune support  System level tracing  Kernel Profiling
  • 9. Page 9 NVIDIA Tegra 5 ? (Codename: Logan)  Disclaimer: Logan is due early 2014. Part of the information is speculations  Development Boards and Samples available to selected customers  Logan SoC – 2W  ARM CPU A15 4+1 :speculated  Kepler based GPU : verified  CUDA Support : verified  CUDA SDK – Dozens of samples  CUDA Libraries: Thrust, cuBLAS, cuNVPP, etc.  NSIGHT : speculated  System Trace  Profiling, Debugging
  • 10. Page 10 NSIGHT TOOLS CASE STUDY Design Write Code Debug Profile
  • 11. Page 11 Nsight Highlights  “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” ( Taken from Nsight page )  IDE integration  Windows – integration with Visual Studio  Linux – specialized Eclipse version  Debugging , System Trace , Profiling  Graphics (DX, OpenGL)  Computing (OpenCL, CUDA, C++ AMP)  Profiling only on CUDA kernels  Debug/Trace/Profile Information is highly shaped  Highly efficient information fields, windows, diagrams  Feedback from professional users is noticed
  • 12. Page 12 Debugging  Much more than “just integrated” with the IDE  Shaped windows showing valuable info Assembly (GPU!) Variables across all warpsVisible layout of the stopped thread
  • 13. Page 13 Debugging – Eclipse edition  Seems that Eclipse integration is deeper than Visual Studio  Unified CPU / GPU Debugging  Simultaneous visibility into both CPU and GPU state  Multi-GPU support Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12  Full GPU debugging  Set kernel breakpoints  Single-step, run until, etc.  View values across multiple GPU threads at the same time  Examine thread, warp, block state  Source and assembly level debugging
  • 15. Page 15 Kernel Profiling  Choose a kernel to profile  Skip N kernels, Profile M kernels  Choose “experiments”  Experiment - Types of profiling/analysis  NVIDIA runs each kernel launch dozens of times with the same data
  • 16. Page 16 Profiling Results  Experiment list  Each experiment is a tabbed window  Profiling information is shaped in graphs, pie charts, diagrams, etc.  Taking HW counters and shaping them to easy- to-understand graphics  Information targets known HW bottlenecks, Code inefficiencies, etc.  Amazingly shaped…
  • 17. Page 17 Profiling Results  The information provides a quick & easy methodic way to identify the performance bottlenecks 1 2 3 4
  • 18. Page 18 Eclipse Edition - Source Code Editor  Project Templates  CUDA code highlighting  CUDA aware refactoring  CUDA aware code completion and inline help
  • 20. Page 20 CUDA Libraries – Part of the SDK  cuFFT  cuBLAS  cuRAND  cuSPARSE  NPP (like IPP)  Math Library  Thrust (next slide)
  • 21. Page 21 Thrust Library  https://developer.nvidia.com/thrust  Works on top of CUDA  Open-source version is available at github  http://thrust.github.io/  Presentations:  http://on-demand.gputechconf.com/gtc- express/2011/presentations/introductiontothrust.pdf
  • 23. Page 23 CLPP  OpenCL Data Parallel Primitives Library (similar to thrust)  Source : https://code.google.com/p/clpp/  7 committers, last commit 1.5Y ago
  • 24. Page 24 OpenCL BLAS  OpenCL BLAS  http://openclblas.sourceforge.net/  Code is available here (GPLv2):  http://sourceforge.net/projects/openclblas/
  • 25. Page 25 ViennaCL  BLAS implementation  http://viennacl.sourceforge.net/  Looks very promising
  • 27. Page 27 Platform links:  ARM  Developer site : http://malideveloper.arm.com  OpenCL tracing : http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/  DS-5 suite : http://www.arm.com/products/tools/software-tools/ds-5/index.php  OpenCL SDK : http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/  OpenCL developer guide:  Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html  PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf  NVIDIA  http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler  http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/  http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
  • 28. Page 28 Links:  Intel  OpenCL sdk http://software.intel.com/en-us/vcsource/tools/opencl-sdk  GPA http://software.intel.com/en-us/vcsource/tools/intel-gpa  vTune support in OpenCL http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl- performance-analysis-on-intel-hd-graphics  http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux  Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc  OpenCL “Beignet” – open source linux compiler :  http://software.intel.com/en-us/forums/topic/402118  http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux  ATOM BayTrail:  http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/  http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested  http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html  http://software.intel.com/en-us/forums/topic/476221  http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
  • 29. Page 29 NSIGHT Links  http://www.nvidia.com/object/nsight.html  https://developer.nvidia.com/nsight-visual-studio-edition-videos  https://developer.nvidia.com/developer-webinars  http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin- CUDA-Development-Nsight.pdf  http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization- With-Nsight-VSE.pdf