2. INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)
HSA is a purpose-designed architecture that enables the software ecosystem to combine and exploit the complementary capabilities of sequential processing elements (CPUs) and parallel processing elements (such as GPUs), delivering new capabilities to users that go beyond traditional usage scenarios.
AMD is making HSA an open standard to jump-start the ecosystem.
2 | Heterogeneous System Architecture | June 2012
3. EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA
[Diagram: APP-accelerated software (applications) running on an Accelerated Processing Unit (APU); graphics workloads and data-parallel workloads map to the GPU, while serial and task-parallel workloads map to the CPU.]
4. AMD HSA FEATURE ROADMAP
Physical Integration: Integrate CPU & GPU in silicon; Unified Memory Controller; Common Manufacturing Technology
Optimized Platforms: GPU Compute C++ support; HSA Memory Management Unit; Bi-Directional Power Mgmt between CPU and GPU
Architectural Integration: Unified Address Space for CPU and GPU; GPU uses pageable system memory via CPU pointers; Fully coherent memory between CPU & GPU
System Integration: GPU compute context switch; GPU graphics pre-emption; Quality of service
5. HSA COMPLIANT FEATURES
Optimized Platforms
GPU Compute C++ support: Supports OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of the CPU and GPU working together to process parallel workloads, such as computer vision and video encoding/transcoding.
HSA Memory Management Unit: CPU and GPU can share system memory, meaning all system memory is accessible by either CPU or GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.
Bi-Directional Power Mgmt between CPU and GPU: Enables "power sloshing", where CPU and GPU dynamically lower or raise their power and performance depending on activity and on which one is better suited to the task at hand.
6. HSA COMPLIANT FEATURES
Architectural Integration
Unified Address Space for CPU and GPU: The unified address space makes it easier for developers to create applications. On HSA platforms, a pointer is really a pointer and does not require separate memory pointers for CPU and GPU.
GPU uses pageable system memory via CPU pointers: The GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked prior to use.
Fully coherent memory between CPU & GPU: Allows data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command-buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high-speed coherent bus.
7. FULL HSA FEATURES
System Integration
GPU compute context switch: GPU tasks can be context-switched, making the GPU a multi-tasker. Context switching means faster interoperation between application, graphics, and compute work, giving users a snappier, more interactive experience.
GPU graphics pre-emption: As more applications enjoy the performance and features of the GPU, it is important that system interactivity stays good. This means low-latency access to the GPU from any process.
Quality of service: With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Access to the hardware by multiple users or multiple applications is either prioritized or equalized.
8. UNLEASHING DEVELOPER INNOVATION
PROBLEM: a wide range of GPU/HW blocks is hard to program; not all workloads accelerate; developers historically program CPUs.
HSA + SDKs = SOLUTION: productivity and performance with low power.
[Chart: developer return (differentiation in performance, power, features, time-to-market) versus developer investment (effort, time, new skills). CPU programming: ~4M+ coders, ~30+M apps, good user experiences. Niche GPU programming: ~100K coders, ~200+ significant apps. HSA target: a few M coders, a few K differentiated apps delivering differentiated experiences.]
9. HSA SOLUTION STACK
How we deliver the HSA value proposition. Overall vision:
Make the GPU easily accessible: support mainstream languages, expandable to domain-specific languages
Make compute offload efficient: direct path to the GPU (avoid overhead), eliminate memory copies, low-latency dispatch
Make it ubiquitous: drive HSA as a standard through the HSA Foundation, open-source key components
[Stack diagram: applications (from SW developers) use domain-specific libs and standard SW (Bolt, OpenCV, …), which sit on OpenCL, DirectX, and other runtimes; these target the HSA Runtime, alongside legacy user-mode graphics drivers. The HSA Runtime emits HSAIL, which a Finalizer and custom drivers (from HW vendors) translate to the GPU ISA for execution on CPU(s), GPU(s), and other differentiated HW accelerators.]
10. HSA INTERMEDIATE LAYER - HSAIL
HSAIL is a virtual ISA for parallel programs
Finalized to the native ISA by a JIT compiler, the "Finalizer"
Allows rapid innovation in native GPU architectures
HSAIL remains constant across implementations
Explicitly parallel
Designed for data-parallel programming
Support for exceptions, virtual functions, and other high-level language features
Syscall methods
GPU code can call directly into system services, I/O, printf, etc.
Debugging support
11. C++ AMP
C++ AMP: a data-parallel programming model for accelerators, initiated by Microsoft
First announced at AFDS 2011
A C++-based, higher-level programming model with advanced C++11 features
Single-source model that integrates host and device programming well
Implicit programming model that is "future-proofed" to enable HSA features, e.g. avoiding host-to-device copies
A C++ AMP implementation is available as a beta release in the Microsoft Visual Studio 11 suite
12. C++ AMP AND HSA
A compute-focused, efficient HSA implementation replaces a graphics-centric implementation of C++ AMP
E.g. low-latency dispatch, HSAIL enabled
The shared virtual memory in HSA eliminates the data copies between host and device in existing C++ AMP programs, without any source changes
Additional advanced C++ features on the GPU, e.g.:
More data types
Function calls
Virtual functions
Arbitrary control flow
Exception handling
Device and platform atomics
13. OPENCL™ AND HSA
HSA is an optimized platform architecture for OpenCL™
Not an alternative to OpenCL™
OpenCL™ on HSA will benefit from
Avoidance of wasteful copies
Low latency dispatch
Improved memory model
Pointers shared between CPU and GPU
HSA also exposes a lower-level programming interface for those who want the ultimate in control and performance
Optimized libraries may choose the lower-level interface
14. HSA: TAKING THE PLATFORM TO PROGRAMMERS
Balance between CPU and GPU for performance and power efficiency
Make GPUs accessible to a wider audience of programmers
Programming models close to today's CPU programming models
Enabling more advanced language features on the GPU
Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.) and hence more applications on the GPU
Kernels can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU), enabling task-graph-style algorithms, ray tracing, etc.
A clearly defined HSA memory model enables effective reasoning about parallel programs
HSA provides a compatible architecture across a wide range of programming models and HW implementations.
15. THE HSA FOUNDATION - BRINGING ABOUT THE NEXT-GENERATION PLATFORM
An open standardization body to bring about broad industry support for heterogeneous computing across the full value chain, from silicon IP to ISVs
Make GPU computing a first-class co-processor to the CPU through architecture definition
Architectural support for special-purpose hardware accelerators (rasterizers, security processors, DSPs, etc.)
Own and evolve the specifications and conformance suite
Bring to market strong development solutions to drive innovative, advanced content and applications
Cultivate programming talent via HSA developer training and academic programs
16. THANK YOU