HETEROGENEOUS SYSTEM ARCHITECTURE AND THE HSA FOUNDATION
INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)



HSA is a purpose-designed architecture that enables the
software ecosystem to combine and exploit the
complementary capabilities of sequential programming
elements (CPUs) and parallel processing elements (such as
GPUs) to deliver new capabilities to users that go beyond
the traditional usage scenarios.

AMD is making HSA an open standard to jumpstart the
ecosystem.



2 | Heterogeneous System Architecture   | June 2012
EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA


[Figure: APP Accelerated Software Applications mapped onto the Accelerated Processing Unit
 through three workload classes: Graphics Workloads; Data Parallel Workloads; Serial and
 Task Parallel Workloads]




AMD HSA FEATURE ROADMAP


Physical Integration             Optimized Platforms                            Architectural Integration                         System Integration

Integrate CPU & GPU in silicon   GPU Compute C++ support                        Unified Address Space for CPU and GPU             GPU compute context switch
Unified Memory Controller        HSA Memory Management Unit                     GPU uses pageable system memory via CPU pointers  GPU graphics pre-emption
Common Manufacturing Technology  Bi-Directional Power Mgmt between CPU and GPU  Fully coherent memory between CPU & GPU           Quality of service




HSA COMPLIANT FEATURES



Optimized Platforms

GPU Compute C++ support
    Supports OpenCL C++ directions and Microsoft’s upcoming C++ AMP language. This eases
    programming the CPU and GPU to work together on parallel workloads such as computer
    vision and video encoding/transcoding.

HSA Memory Management Unit
    CPU and GPU can share system memory. This means all system memory is accessible by
    either the CPU or the GPU, depending on need. In today’s world, only a subset of system
    memory can be used by the GPU.

Bi-Directional Power Mgmt between CPU and GPU
    Enables “power sloshing”, where the CPU and GPU dynamically lower or raise their power
    and performance depending on the activity and on which one is more suited to the task
    at hand.
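The idea of power sloshing can be sketched as a shared budget that moves between the two processors as their activity changes. The following is only an illustration under stated assumptions: the `slosh` function, the `PowerSplit` type, the milliwatt figures, the per-processor floor, and the proportional policy are all hypothetical and do not describe AMD's actual power management algorithm.

```cpp
#include <cassert>

// Hypothetical result type: how a package power budget is divided.
struct PowerSplit {
    int cpu_mw;
    int gpu_mw;
};

// Splits a fixed package budget between CPU and GPU in proportion to their
// current load (any non-negative activity measure). Each side always keeps
// `floor_mw` so it can respond to new work. The proportional policy and all
// numbers are illustrative assumptions, not a vendor algorithm.
PowerSplit slosh(int budget_mw, int floor_mw, int cpu_load, int gpu_load) {
    assert(budget_mw >= 2 * floor_mw && cpu_load >= 0 && gpu_load >= 0);
    const int flexible = budget_mw - 2 * floor_mw;
    const int total_load = cpu_load + gpu_load;
    // With no activity on either side, share the flexible part evenly.
    const int cpu_extra =
        total_load == 0
            ? flexible / 2
            : static_cast<int>(static_cast<long long>(flexible) * cpu_load / total_load);
    const int cpu_mw = floor_mw + cpu_extra;
    return {cpu_mw, budget_mw - cpu_mw};
}
```

Calling `slosh(35000, 5000, 80, 20)` gives most of the budget to the CPU, while `slosh(35000, 5000, 10, 90)` shifts it toward the GPU; the total stays fixed in both cases.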



HSA COMPLIANT FEATURES



Architectural Integration

Unified Address Space for CPU and GPU
    The unified address space makes it easier for developers to create applications. On HSA
    platforms, a pointer is really a pointer: the CPU and GPU do not need separate memory
    pointers.

GPU uses pageable system memory via CPU pointers
    The GPU can take advantage of the CPU virtual address space. With pageable system
    memory, the GPU can reference the data directly in the CPU domain. In prior
    architectures, data had to be copied between the two spaces or page-locked prior to use.

Fully coherent memory between CPU & GPU
    Allows data to be cached by both the CPU and the GPU, and referenced by either. In all
    previous generations, GPU caches had to be flushed at command buffer boundaries prior to
    CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high speed
    coherent bus.


FULL HSA FEATURES


System Integration

GPU compute context switch
    GPU tasks can be context switched, making the GPU a multi-tasker. Context switching
    means faster application, graphics and compute interoperation. Users get a snappier,
    more interactive experience.

GPU graphics pre-emption
    As more applications enjoy the performance and features of the GPU, it is important
    that the system stays interactive. This means low latency access to the GPU from any
    process.

Quality of service
    With context switching and pre-emption, time criticality is added to the tasks assigned
    to the processors. Direct access to the hardware by multiple users or multiple
    applications is either prioritized or equalized.
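The difference between "prioritized" and "equalized" access can be sketched with a toy time-slice scheduler. This is a CPU-only model under assumptions: `Task`, `Policy`, and `schedule` are hypothetical names, one loop iteration stands for one time slice, and re-selecting the task at every slice boundary stands in for context switching and pre-emption. It does not describe how HSA hardware or drivers actually arbitrate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical unit of GPU work owned by some application.
struct Task {
    std::string name;
    int priority;   // larger value = more time critical
    int remaining;  // time slices still needed
};

enum class Policy { Prioritized, Equalized };

// Runs all tasks one time slice at a time and returns which task ran in each
// slice. The running task can be switched out at every slice boundary.
std::vector<std::string> schedule(std::vector<Task> tasks, Policy policy) {
    std::vector<std::string> timeline;
    std::size_t cursor = 0;  // round-robin position, used by Equalized only
    for (;;) {
        Task* next = nullptr;
        if (policy == Policy::Prioritized) {
            // Pick the runnable task with the highest priority.
            for (Task& t : tasks) {
                if (t.remaining > 0 && (!next || t.priority > next->priority)) next = &t;
            }
        } else {
            // Pick the next runnable task in round-robin order.
            for (std::size_t k = 0; k < tasks.size() && !next; ++k) {
                Task& t = tasks[(cursor + k) % tasks.size()];
                if (t.remaining > 0) {
                    next = &t;
                    cursor = (cursor + k + 1) % tasks.size();
                }
            }
        }
        if (!next) break;  // nothing left to run
        --next->remaining;
        timeline.push_back(next->name);
    }
    return timeline;
}
```

With a two-slice `render` task at priority 2 and a three-slice `compute` task at priority 1, the prioritized policy runs `render` to completion first, while the equalized policy alternates between the two until `render` finishes.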




UNLEASHING DEVELOPER INNOVATION
PROBLEM: GPU/HW blocks hard to program; not all workloads accelerate
SOLUTION: HSA + SDKs = Productivity & Performance with low Power

[Figure: Developer Return (differentiation in Performance, Power, Features, Time2Market)
 plotted against Developer Investment (Effort, Time, New skills)
   – CPU coders: ~30+M coders, ~4M+ apps, good user experiences
     (developers historically program CPUs)
   – GPU coders: ~100K coders, ~200+ apps, significant niche value
   – HSA coders: few M coders, few K apps, wide range of differentiated experiences]

HSA SOLUTION STACK

How we deliver the HSA value proposition

Overall Vision:
   – Make GPU easily accessible
        • Support mainstream languages
        • Expandable to domain specific languages
   – Make compute offload efficient
        • Direct path to GPU (avoid Graphics overhead)
        • Eliminate memory copy
        • Low-latency dispatch
   – Make it ubiquitous
        • Drive HSA as a standard through HSA Foundation
        • Open Source key components

[Figure: HSA solution stack
   Side labels: SW Developers (Standard SW); HW Vendors (Custom Drivers, Differentiated HW)
   Layers, top to bottom:
     Application
     Domain Specific Libs (Bolt, OpenCV,…)
     OpenCL Runtime, DirectX Runtime, Other Runtime
     HSA Runtime, alongside Legacy User Mode Drivers
     HSAIL
     Finalizer
     GPU ISA
     CPU(s), GPU(s), Other Accelerators]

HSA INTERMEDIATE LAYER - HSAIL

 – HSAIL is a virtual ISA for parallel programs
      • Finalized to native ISA by a JIT compiler or “Finalizer”

 – Allows rapid innovation in native GPU architectures
      • HSAIL will be constant across implementations

 – Explicitly parallel
      • Designed for data parallel programming

 – Support for exceptions, virtual functions, and other high level language features

 – Syscall methods
      • GPU code can call directly to system services, IO, printf, etc.

 – Debugging support


C++ AMP

 – C++ AMP: a data parallel programming model for accelerators, initiated by Microsoft
      • First announced at the 2011 AFDS

 – C++ based higher level programming model with advanced C++11 features

 – Single-source model that integrates host and device programming well

 – Implicit programming model that is “future proofed” to enable HSA features, e.g. avoiding
   host-to-device copies

 – A C++ AMP implementation is available as a beta release in the Microsoft Visual Studio 11
   suite
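Why an implicit model is "future proofed" can be sketched in portable C++. The code below is not the C++ AMP API: `View`, `for_each_index`, and `square_all` are hypothetical stand-ins, and the "kernel" runs serially on the CPU. The point is only that the application states which data the kernel touches, so the same source can run on a backend that copies data and on one that shares memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator view over host data. It only
// models implicit data movement and counts the copies it performs.
template <typename T>
class View {
public:
    View(std::vector<T>& host, bool shared_memory)
        : host_(host), shared_(shared_memory) {
        if (!shared_) {
            staging_ = host_;  // host -> device copy
            ++copies_;
        }
    }
    T& operator[](std::size_t i) { return shared_ ? host_[i] : staging_[i]; }
    std::size_t size() const { return host_.size(); }
    // Makes results visible to the host; a no-op when memory is shared.
    void synchronize() {
        if (!shared_) {
            host_ = staging_;  // device -> host copy
            ++copies_;
        }
    }
    int copies() const { return copies_; }

private:
    std::vector<T>& host_;
    std::vector<T> staging_;  // models separate device memory
    bool shared_;
    int copies_ = 0;
};

// Hypothetical data-parallel loop; runs serially here to stay portable.
template <typename T, typename Kernel>
void for_each_index(View<T>& view, Kernel kernel) {
    for (std::size_t i = 0; i < view.size(); ++i) kernel(view, i);
}

// "Application" code: identical for both backends. Returns the copy count.
int square_all(std::vector<int>& data, bool shared_memory) {
    View<int> view(data, shared_memory);
    for_each_index(view, [](View<int>& v, std::size_t i) { v[i] *= v[i]; });
    view.synchronize();
    return view.copies();
}
```

In this model `square_all` performs two copies on the copying backend and none on the shared-memory backend, with the same results and no change to the application function.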




C++ AMP AND HSA

 – A compute-focused, efficient HSA implementation replaces the graphics-centric
   implementation of C++ AMP
      • E.g. low latency dispatch, HSAIL enabled

 – The shared virtual memory in HSA eliminates the data copies between host and device in
   existing C++ AMP programs without any source changes.

 – Additional advanced C++ features on GPU, e.g.
      • More data types
      • Function calls
      • Virtual functions
      • Arbitrary control flow
      • Exception handling
      • Device and platform atomics


OPENCL™ AND HSA

 – HSA is an optimized platform architecture for OpenCL™
      • Not an alternative to OpenCL™
 – OpenCL™ on HSA will benefit from
      • Avoidance of wasteful copies
      • Low latency dispatch
      • Improved memory model
      • Pointers shared between CPU and GPU
 – HSA also exposes a lower level programming interface, for those who want the
   ultimate in control and performance
      • Optimized libraries may choose the lower level interface




HSA TAKING PLATFORM TO PROGRAMMERS

 – Balance between CPU and GPU for performance and power efficiency

 – Make GPUs accessible to a wider audience of programmers
      • Programming models close to today’s CPU programming models
      • Enabling more advanced language features on GPU
      • Shared virtual memory enables complex pointer-containing data structures (lists,
        trees, etc.) and hence more applications on GPU
      • Kernel can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU),
        enabling task-graph style algorithms, ray tracing, etc.

 – Clearly defined HSA memory model enables effective reasoning for parallel programming

 – HSA provides a compatible architecture across a wide range of programming models and
   HW implementations.
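The benefit of shared virtual memory for pointer-containing data structures can be sketched with a linked list. This is a CPU-only illustration under assumptions: `Node`, `FlatNode`, `flatten`, `sum_flat`, and `sum_shared` are hypothetical names, and the "kernels" are ordinary functions. Without a shared address space, host pointers mean nothing on the device, so the list has to be repacked with indices before offload; with shared virtual memory the head pointer can be handed over as it is.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordinary host-side linked list node.
struct Node {
    int value;
    Node* next;
};

// Pointer-free form needed when the device cannot follow host pointers.
struct FlatNode {
    int value;
    std::ptrdiff_t next;  // index of the next node, or -1 at the end
};

// Separate address spaces: walk the list on the host and repack it into a
// contiguous, index-linked buffer that could be copied to a device.
std::vector<FlatNode> flatten(const Node* head) {
    std::vector<FlatNode> out;
    for (const Node* n = head; n != nullptr; n = n->next) {
        const std::ptrdiff_t next =
            n->next ? static_cast<std::ptrdiff_t>(out.size()) + 1 : -1;
        out.push_back({n->value, next});
    }
    return out;
}

// Stand-in kernel for the copied, index-linked form.
long sum_flat(const std::vector<FlatNode>& nodes) {
    long total = 0;
    std::ptrdiff_t i = nodes.empty() ? -1 : 0;
    while (i != -1) {
        const FlatNode& node = nodes[static_cast<std::size_t>(i)];
        total += node.value;
        i = node.next;
    }
    return total;
}

// Stand-in kernel for shared virtual memory: follows host pointers directly,
// so no repacking or copying step is needed.
long sum_shared(const Node* head) {
    long total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) total += n->value;
    return total;
}
```

Both paths compute the same sum; the difference is that the first needs an extra traversal and a second representation of the data, which grows more awkward for trees and graphs.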


THE HSA FOUNDATION - BRINGING ABOUT THE NEXT GENERATION PLATFORM

 – An open standardization body to bring about broad industry support for heterogeneous
   computing across the full value chain, from silicon IP to ISV
 – GPU computing as a first class co-processor to the CPU through architecture definition
 – Architectural support for special purpose hardware accelerators (rasterizer, security
   processors, DSP, etc.)
 – Own and evolve the specifications and conformance suite
 – Bring to market strong development solutions to drive innovative advanced content and
   applications
 – Cultivate programming talent via HSA developer training and academic programs




THANK YOU




Disclaimer & Attribution
            The information presented in this document is for informational purposes only and may contain technical inaccuracies,
            omissions and typographical errors.

            The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
            limited to product and roadmap changes, component and motherboard version changes, new model and/or product
            releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
            like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise
            this information and to make changes from time to time to the content hereof without obligation to notify any person of such
            revisions or changes.

            NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
            RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
            INFORMATION.

            ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE
            EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT,
            INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
            CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

            AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names
            used in this presentation are for informational purposes only and may be trademarks of their respective owners.

            OpenCL is a trademark of Apple Inc. used by permission by Khronos.

            © 2012 Advanced Micro Devices, Inc.


ISCA Final Presentation - Intro
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshare
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSA
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is Invaluable
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

HSA Overview

  • 2. INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE (HSA)
      HSA is a purpose-designed architecture that enables the software ecosystem to combine and exploit the complementary capabilities of sequential programming elements (CPUs) and parallel processing elements (such as GPUs), delivering new capabilities to users that go beyond traditional usage scenarios.
      AMD is making HSA an open standard to jumpstart the ecosystem.
      2 | Heterogeneous System Architecture | June 2012
  • 3. EFFECTIVE COMPUTE OFFLOAD IS MADE EASY BY HSA
      [Diagram: APP Accelerated Software Applications (left) and the Accelerated Processing Unit (right), with three workload classes listed between them]
      - Graphics Workloads
      - Data Parallel Workloads
      - Serial and Task Parallel Workloads
  • 4. AMD HSA FEATURE ROADMAP
      Physical Integration:
        - Integrate CPU & GPU in silicon
        - Unified Memory Controller
        - Common Manufacturing Technology
      Optimized Platforms:
        - GPU Compute C++ support
        - HSA Memory Management Unit
        - Bi-Directional Power Mgmt between CPU and GPU
      Architectural Integration:
        - Unified Address Space for CPU and GPU
        - GPU uses pageable system memory via CPU pointers
        - Fully coherent memory between CPU & GPU
      System Integration:
        - GPU compute context switch
        - GPU graphics pre-emption
        - Quality of service
  • 5. HSA COMPLIANT FEATURES: Optimized Platforms
      - GPU Compute C++ support: Supports OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of the CPU and GPU working together to process parallel workloads, such as Computer Vision, Video Encoding/Transcoding, etc.
      - HSA Memory Management Unit: CPU and GPU can share system memory. All system memory is accessible by either the CPU or the GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.
      - Bi-Directional Power Mgmt between CPU and GPU: Enables "power sloshing", where the CPU and GPU dynamically lower or raise their power and performance depending on the activity and on which one is better suited to the task at hand.
  • 6. HSA COMPLIANT FEATURES: Architectural Integration
      - Unified Address Space for CPU and GPU: The unified address space provides ease of programming for developers to create applications. For HSA platforms, a pointer is really a pointer and does not require separate memory pointers for CPU and GPU.
      - GPU uses pageable system memory via CPU pointers: The GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference the data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked prior to use.
      - Fully coherent memory between CPU & GPU: Allows data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high speed coherent bus.
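The difference between copying data into a separate device space and handing over a CPU pointer can be sketched in plain C++. This is a minimal illustration, not HSA code: `scale_kernel`, `offload_with_copies` and `offload_shared` are hypothetical names, and the "kernel" is an ordinary function standing in for GPU work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU kernel: doubles every element in place.
// On real hardware this would run on the GPU; here it is an ordinary function.
void scale_kernel(float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) data[i] *= 2.0f;
}

// Copy-based offload: the "device" owns a separate buffer, so the input is
// staged in and the result staged out. Returns the number of bytes copied.
std::size_t offload_with_copies(std::vector<float>& host) {
    std::vector<float> device(host.begin(), host.end());  // host -> device
    scale_kernel(device.data(), device.size());
    host.assign(device.begin(), device.end());             // device -> host
    return 2 * host.size() * sizeof(float);
}

// Shared-address-space offload: the same pointer is handed to the kernel,
// so nothing is staged. Returns the number of bytes copied (always zero).
std::size_t offload_shared(std::vector<float>& host) {
    scale_kernel(host.data(), host.size());
    return 0;
}
```

Both paths produce the same result; only the staging traffic differs, which is the cost the unified address space is meant to remove.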
  • 7. FULL HSA FEATURES: System Integration
      - GPU compute context switch: GPU tasks can be context switched, making the GPU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience.
      - GPU graphics pre-emption: As more applications enjoy the performance and features of the GPU, it is important that the system remains interactive. This means low latency access to the GPU from any process.
      - Quality of service: With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct hardware access for multiple users or multiple applications is either prioritized or equalized.
  • 8. UNLEASHING DEVELOPER INNOVATION
      PROBLEM:
        - GPU/HW blocks hard to program
        - Not all workloads accelerate
        - Developers historically program CPUs
      SOLUTION: HSA + SDKs = Productivity & Performance with low Power
      [Chart: Developer Return (Differentiation in Performance, Power, Features, Time2Market) plotted against Developer Investment (Effort, Time, New skills). Three tiers, each labelled with an app count and a coder count: CPU ("~30+M", "~4M+"; Good User Experiences), GPU ("~100K", "~200+"; Significant niche Value) and HSA ("Few M", "Few K"; Wide range of Differentiated Experiences).]
  • 9. HSA SOLUTION STACK
      How we deliver the HSA value proposition. Overall Vision:
      - Make GPU easily accessible
          - Support mainstream languages
          - Expandable to domain specific languages
      - Make compute offload efficient
          - Direct path to GPU (avoid Graphics overhead)
          - Eliminate memory copy
          - Low-latency dispatch
      - Make it ubiquitous
          - Drive HSA as a standard through HSA Foundation
          - Open Source key components
      [Stack diagram, top to bottom: Application SW; Domain Specific Libs (Bolt, OpenCV,…); OpenCL Runtime, DirectX Runtime, Other Runtime; HSA Runtime alongside Legacy User Mode Drivers; HSAIL; Finalizer and Custom Drivers; GPU ISA; CPU(s), GPU(s), Other Accelerators. Side labels: Developers, Standard SW, HW Vendors, Differentiated HW.]
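One way to picture a "direct path" with low-latency dispatch is a queue in shared memory that the application writes to without calling into a driver. The sketch below (C++17) is an assumption-laden illustration, not the HSA queue format: `Packet` and `DispatchQueue` are hypothetical, and the consumer side stands in for whatever hardware would drain the queue.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical dispatch packet, invented for this sketch.
struct Packet {
    std::uint32_t kernel_id;
    std::uint32_t grid_size;
};

// Single-producer, single-consumer ring buffer living in shared memory.
// The application writes packets directly; no driver call sits on the path.
template <std::size_t Capacity>
class DispatchQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side (application): returns false when the queue is full.
    bool enqueue(const Packet& p) {
        const std::uint64_t w = write_.load(std::memory_order_relaxed);
        const std::uint64_t r = read_.load(std::memory_order_acquire);
        if (w - r == Capacity) return false;
        slots_[w & (Capacity - 1)] = p;
        write_.store(w + 1, std::memory_order_release);  // publish the packet
        return true;
    }
    // Consumer side (stand-in for the device's packet processor).
    std::optional<Packet> dequeue() {
        const std::uint64_t r = read_.load(std::memory_order_relaxed);
        const std::uint64_t w = write_.load(std::memory_order_acquire);
        if (r == w) return std::nullopt;
        const Packet p = slots_[r & (Capacity - 1)];
        read_.store(r + 1, std::memory_order_release);   // free the slot
        return p;
    }
private:
    std::array<Packet, Capacity> slots_{};
    std::atomic<std::uint64_t> write_{0};
    std::atomic<std::uint64_t> read_{0};
};
```

Enqueueing costs a slot write and one atomic store, which is the kind of overhead reduction the "direct path to GPU" bullet is describing.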
  • 10. HSA INTERMEDIATE LAYER - HSAIL
      - HSAIL is a virtual ISA for parallel programs
          - Finalized to native ISA by a JIT compiler or "Finalizer"
          - Allows rapid innovation in native GPU architectures
          - HSAIL will be constant across implementations
      - Explicitly parallel
          - Designed for data parallel programming
      - Support for exceptions, virtual functions, and other high level language features
      - Syscall methods
          - GPU code can call directly to system services, IO, printf, etc.
      - Debugging support
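The idea of a portable virtual ISA that is "finalized" to a target at load time can be shown with a toy. The opcodes, `VInstr`, `finalize` and `dispatch` below are all invented for illustration and bear no relation to real HSAIL encoding; a sequential loop stands in for parallel hardware.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Toy virtual instruction set. These opcodes are invented for illustration
// and are not HSAIL.
enum class Op { AddConst, MulConst };
struct VInstr { Op op; int operand; };

// "Native" code for a made-up target: one callable applied per work-item.
using NativeKernel = std::function<int(int)>;

// Finalizer: translates the portable instruction stream once, at load time,
// into a callable specialised for the current target.
NativeKernel finalize(const std::vector<VInstr>& program) {
    std::vector<std::function<int(int)>> steps;
    for (const VInstr& ins : program) {
        switch (ins.op) {
            case Op::AddConst:
                steps.push_back([c = ins.operand](int x) { return x + c; });
                break;
            case Op::MulConst:
                steps.push_back([c = ins.operand](int x) { return x * c; });
                break;
            default:
                throw std::invalid_argument("unknown opcode");
        }
    }
    return [steps](int x) {
        for (const auto& step : steps) x = step(x);
        return x;
    };
}

// Explicitly parallel launch: the kernel is applied to every work-item.
// A plain loop stands in for the hardware's parallel execution.
std::vector<int> dispatch(const NativeKernel& k, std::vector<int> items) {
    for (int& v : items) v = k(v);
    return items;
}
```

The program stays the same across targets; only `finalize` would change when the native architecture changes, which is the decoupling the slide attributes to HSAIL.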
  • 11. C++ AMP
      - C++ AMP: a data parallel programming model initiated by Microsoft for accelerators
          - First announced at the 2011 AFDS
      - C++ based higher level programming model with advanced C++11 features
      - Single-source model that integrates host and device programming well
      - Implicit programming model that is "future proofed" to enable HSA features, e.g. avoiding host-to-device copies
      - A C++ AMP implementation is available in the Microsoft Visual Studio 11 suite as a beta release
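The single-source, data-parallel style described above can be approximated in standard C++. The sketch does not use the C++ AMP API, which needs Microsoft's toolchain; `parallel_for_each_index` is a hypothetical helper that only mimics the shape of a data-parallel launch, with CPU threads standing in for accelerator lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper with the shape of a data-parallel launch: the body is
// called once per index in [0, n). Threads stand in for accelerator lanes.
template <typename Body>
void parallel_for_each_index(std::size_t n, Body body) {
    const std::size_t workers = std::max<std::size_t>(
        1, std::min<std::size_t>(n, std::thread::hardware_concurrency()));
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([=] {
            // Each worker takes a strided slice of the index space.
            for (std::size_t i = w; i < n; i += workers) body(i);
        });
    }
    for (std::thread& t : pool) t.join();
}

// Single source: host setup and the "kernel" lambda live in one function,
// and the lambda uses host pointers directly instead of staged copies.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    const float* px = x.data();
    const float* py = y.data();
    float* po = out.data();
    parallel_for_each_index(out.size(),
                            [=](std::size_t i) { po[i] = a * px[i] + py[i]; });
    return out;
}
```

Each index writes a distinct output element, so the workers need no synchronisation beyond the final join.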
  • 12. C++ AMP AND HSA
      - Compute-focused, efficient HSA implementation to replace a graphics-centric implementation for C++ AMP
          - E.g. low latency dispatch, HSAIL enabled
      - The shared virtual memory in HSA eliminates the data copies between host and device in existing C++ AMP programs without any source changes.
      - Additional advanced C++ features on GPU, e.g.
          - More data types
          - Function calls
          - Virtual functions
          - Arbitrary control flow
          - Exception handling
          - Device and platform atomics
  • 13. OPENCL™ AND HSA
      - HSA is an optimized platform architecture for OpenCL™
          - Not an alternative to OpenCL™
      - OpenCL™ on HSA will benefit from
          - Avoidance of wasteful copies
          - Low latency dispatch
          - Improved memory model
          - Pointers shared between CPU and GPU
      - HSA also exposes a lower level programming interface, for those that want the ultimate in control and performance
          - Optimized libraries may choose the lower level interface
  • 14. HSA TAKING PLATFORM TO PROGRAMMERS
      - Balance between CPU and GPU for performance and power efficiency
      - Make GPUs accessible to wider audience of programmers
          - Programming models close to today's CPU programming models
          - Enabling more advanced language features on GPU
          - Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.) and hence more applications on GPU
          - Kernel can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU)
              - Enabling task-graph style algorithms, Ray-Tracing, etc.
          - Clearly defined HSA memory model enables effective reasoning for parallel programming
      - HSA provides a compatible architecture across a wide range of programming models and HW implementations.
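Two of the points above, pointer-containing data structures and a clearly defined memory model, can be illustrated together. In this sketch a second CPU thread plays the device; `Node`, `sum_list` and `offload_sum` are hypothetical names, and the C++ release/acquire ordering is used as an analogy rather than as a statement of the HSA memory model's rules.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A pointer-containing structure built by the host in ordinary memory.
struct Node {
    int value;
    Node* next;
};

// Stand-in for a GPU kernel: follows the host's pointers as-is. With shared
// virtual memory no flattening or pointer translation is needed.
int sum_list(const Node* head) {
    int total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) total += n->value;
    return total;
}

// The "device" thread publishes its result with a release store; the host
// observes it with an acquire load, so the result write is visible once the
// flag reads true.
int offload_sum(const Node* head) {
    int result = 0;
    std::atomic<bool> done{false};
    std::thread device([&] {
        result = sum_list(head);
        done.store(true, std::memory_order_release);
    });
    while (!done.load(std::memory_order_acquire)) std::this_thread::yield();
    const int observed = result;  // safe: ordered after the acquire load
    device.join();
    return observed;
}
```

Without the release/acquire pair the host's read of `result` would be a data race; a defined memory model is what lets a programmer argue that it is not.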
  • 15. THE HSA FOUNDATION - BRINGING ABOUT THE NEXT GENERATION PLATFORM
      - An open standardization body to bring about broad industry support for Heterogeneous Computing across the full value chain, from silicon IP to ISV.
      - GPU computing as a first class co-processor to the CPU through architecture definition
      - Architectural support for special purpose hardware accelerators (Rasterizer, Security Processors, DSP, etc.)
      - Own and evolve the specifications and conformance suite
      - Bring to market strong development solutions to drive innovative advanced content and applications
      - Cultivate programming talent via HSA developer training and academic programs
  • 16. THANK YOU
  • 17. Disclaimer & Attribution
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
      NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used by permission by Khronos. © 2012 Advanced Micro Devices, Inc.