GPGPU ALGORITHMS IN GAMES
How Heterogeneous Systems Architecture can be
leveraged to optimize algorithms in video games
Matthijs De Smedt
Nixxes Software B.V.
Lead Graphics Programmer
| HSA Algorithms in Games | June 13th, 2012
CONTENTS
• A short introduction
• Current usage of GPGPU in games
• Heterogeneous Systems Architecture
• Examples made possible by HSA
INTRODUCTION
VIDEOGAMES
• Games are near-real-time simulations
• Response time is key
• Most systems run in sync with the output frequency
– Rendering at 60 frames per second
– Allows roughly 16.7 ms of processing time per frame
• Frame rate is limited by one of:
– GPU
– CPU
– Display (VSync)
[Diagram: CPU and GPU frame timelines with the stages Input, Simulate, Render, Render]
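The frame budget arithmetic above can be written down as a minimal C++ sketch. The function and type names are hypothetical, and the limiter check assumes that CPU and GPU work overlap, so the slowest of CPU, GPU, and display refresh sets the frame time.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given target frame rate.
double frame_budget_ms(double frames_per_second) {
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}

// Which component limits the frame rate.
enum class Limiter { Cpu, Gpu, Display };

// cpu_ms and gpu_ms are hypothetical per-frame costs measured by the
// application; display_hz is the refresh rate enforced by VSync.
Limiter frame_limiter(double cpu_ms, double gpu_ms, double display_hz) {
    const double display_ms = frame_budget_ms(display_hz);
    if (cpu_ms >= gpu_ms && cpu_ms > display_ms) return Limiter::Cpu;
    if (gpu_ms > cpu_ms && gpu_ms > display_ms) return Limiter::Gpu;
    return Limiter::Display;  // both finish within one refresh interval
}
```

For example, `frame_budget_ms(60.0)` is about 16.7 ms, and a frame that costs 10 ms on the CPU and 20 ms on the GPU is GPU-limited on a 60 Hz display.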
HARDWARE
• Typical hardware target for PC games:
– One multicore CPU
– One GPU
• Multiple GPUs: CrossFire
– Transparent to the application
– Driver alternates frames between GPUs
• GPUs are becoming more general purpose:
– General-purpose GPU algorithms (GPGPU)
[Image: CrossFire]
GPGPU IN GAMES
INTRODUCTION TO GPGPU
• Rendering is a sequence of parallel algorithms
• GPUs are great at parallel computation
• Hardware and software have evolved toward general-purpose use
• First-generation GPGPU was done through programmable rendering:
– DirectX
– OpenGL
• Second generation uses dedicated GPGPU APIs:
– CUDA
– OpenCL
– DirectCompute
• Third generation of GPGPU is on the way:
– Heterogeneous Systems Architecture
GPGPU IN GAMES
• Some GPGPU algorithms are already used in games today, for example:
– Physics
  • Particles
  • Fluid simulation
  • Destruction
– Specialized graphics algorithms
  • Post-processing
• All of these algorithms drive visual effects
[Image: GPU particle system by Fairlight]
CURRENT PHYSICS EXAMPLE
• GPGPU particle simulation using DirectCompute
• Great for simulating thousands of visible particles
• Results of the simulation are never copied back to the CPU
– Cannot influence gameplay
– Not synced in networked games
• Example of what this rules out: smoke particles that affect game AI
[Diagram: the CPU calls the GPU; the GPU simulates the particles and then renders them]
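As an illustration of why particle simulation maps so well to the GPU, here is a minimal CPU-side C++ sketch of one simulation step. It is not DirectCompute code. The `Particle` layout, the function names, and the choice of semi-implicit Euler integration are assumptions made for the example. The point is that each particle is updated independently, so the loop body could run as one GPU thread per particle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle {
    float px, py, pz;  // position
    float vx, vy, vz;  // velocity
};

// Body of the per-particle "kernel": semi-implicit Euler integration.
// On a GPU, each invocation would run in its own thread.
inline void update_particle(Particle& p, float gravity_y, float dt) {
    p.vy += gravity_y * dt;  // update velocity first
    p.px += p.vx * dt;       // then move with the new velocity
    p.py += p.vy * dt;
    p.pz += p.vz * dt;
}

// CPU stand-in for the dispatch: no iteration depends on another.
void simulate_particles(std::vector<Particle>& particles, float gravity_y, float dt) {
    assert(dt >= 0.0f);
    for (std::size_t i = 0; i < particles.size(); ++i) {
        update_particle(particles[i], gravity_y, dt);
    }
}
```

In the setup described on this slide, the updated buffer would stay in GPU memory and feed the renderer directly, which is why the results never reach gameplay code.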
GPGPU LIMITATIONS
• Why isn’t GPGPU used more for non-graphics work?
• Latency
– DirectX has many layers and buffers
– DirectX commands can be buffered for multiple frames
– Actual execution on the GPU is delayed
• Copy overhead
– The GPU cannot directly access application memory
– All data must be copied from and to the application
• Functionality
– Constrained programming models
HETEROGENEOUS SYSTEMS ARCHITECTURE
HETEROGENEOUS SYSTEMS ARCHITECTURE
• New hardware and software
Hardware
• New features on discrete GPUs
• Accelerated Processing Unit
– Next-generation processor
– Multiple CPU and GPU cores on the same die
– Shared memory access
– Soon to be as widespread as multicore CPUs
Software
• "Drivers"
– HSA provides a new, thin Compute API
– Very low latency
– Unified Address Space
– Exposes more hardware capabilities
• HSA Intermediate Language
– Virtual ISA
– Introduces CPU programming features to the GPU
USING THE APU
• Two hardware configurations to distinguish
• APU without discrete GPU:
– Found in many laptops, soon in many desktops
– Use the on-die GPU for rendering
• APU with discrete GPU:
– Hard-core gamers will still use discrete GPUs
– Asymmetrical CrossFire
– Or: dedicate the on-die GPU to Compute algorithms
• Could result in a massive speedup of algorithms
• Using SIMD co-processors to offload the CPU is familiar to PS3 developers
COPY OVERHEAD
• Current Compute APIs require the application to explicitly copy all input and output memory
– Copying can easily take longer than processing on the CPU!
– Only small datasets or very expensive computations benefit from GPGPU
• HSA introduces a Unified Address Space for CPU and GPU memory
– CPU pointers on the GPU
– Virtual memory on the GPU
  • Paging over PCI Express (discrete) or shared memory controller (APU)
– Fully coherent
– Will make GPGPU an option for many more algorithms
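The trade-off on this slide can be expressed as a small cost model. The C++ sketch below is hypothetical: the struct, its fields, and the idealized assumption that a unified address space removes the copy term entirely are illustrations, not part of any real API. All inputs are estimates supplied by the caller.

```cpp
#include <cassert>

// Hypothetical cost model for deciding whether to offload a job to the GPU.
struct OffloadEstimate {
    double bytes_in;           // input size copied to the GPU
    double bytes_out;          // output size copied back
    double copy_bytes_per_ms;  // effective transfer bandwidth
    double gpu_compute_ms;     // kernel time on the GPU
    double cpu_compute_ms;     // time for the same work on the CPU
};

// Time spent copying input and output across the bus.
double copy_ms(const OffloadEstimate& e) {
    assert(e.copy_bytes_per_ms > 0.0);
    return (e.bytes_in + e.bytes_out) / e.copy_bytes_per_ms;
}

// With explicit copies, the GPU path pays for the transfer as well.
bool worth_offloading_with_copies(const OffloadEstimate& e) {
    return copy_ms(e) + e.gpu_compute_ms < e.cpu_compute_ms;
}

// Idealized unified address space: the copy term is assumed to drop out.
bool worth_offloading_unified(const OffloadEstimate& e) {
    return e.gpu_compute_ms < e.cpu_compute_ms;
}
```

With 8 MB in, 8 MB out, a bandwidth of 4 MB per ms, 0.5 ms of GPU time, and 2 ms of CPU time, the copy alone costs 4 ms, so offloading only pays off in the unified case.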
LATENCY
• DirectX commands are buffered
• When the GPU is fully loaded, this buffer is saturated
• The delay between scheduling and executing a GPGPU program on a busy GPU can span multiple frames
– Results will be several frames behind
– Game simulation needs all objects to be in sync
• As a result, GPGPU is currently impractical for anything but visual effects
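To make the latency argument concrete, here is a rough C++ sketch of the arithmetic. The model is an assumption for illustration: it treats the command buffer as a simple queue holding a whole number of frames of GPU work, and both function names are hypothetical.

```cpp
#include <cassert>

// Estimated delay before a newly submitted compute job starts, assuming the
// command buffer already holds `queued_frames` frames of GPU work and the
// GPU needs `gpu_frame_ms` to process each of them.
double dispatch_delay_ms(int queued_frames, double gpu_frame_ms) {
    assert(queued_frames >= 0 && gpu_frame_ms >= 0.0);
    return queued_frames * gpu_frame_ms;
}

// Number of simulation frames that elapse before the result is usable,
// rounded up to whole frames.
int frames_until_ready(double delay_ms, double compute_ms, double sim_frame_ms) {
    assert(sim_frame_ms > 0.0);
    const double total_ms = delay_ms + compute_ms;
    int frames = static_cast<int>(total_ms / sim_frame_ms);
    if (frames * sim_frame_ms < total_ms) ++frames;  // round up
    return frames;
}
```

Under these assumptions, three buffered frames at 16 ms each delay a 2 ms job by 48 ms, so its result arrives four simulation frames after it was scheduled. With an empty queue, the same job finishes within the current frame.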
LATENCY
• HSA’s new Compute API will reduce latency
• How to deal with a saturated GPU?
• A second GPU
– Dedicate the APU to Compute
– Virtually no latency
• HSA feature: graphics pre-emption
– Context switching on the GPU
  • Interrupt a graphics task (typically a large command list)
  • Execute the Compute algorithm
  • Switch back to graphics
– Can be used on both discrete GPUs and the APU
• Choose the solution best suited to your needs
APU USAGE EXAMPLE
[Diagram: timeline of one frame with rows GPU, CPU, and HSA; labels: Frame, Schedule, DirectCompute, Execute, Execute]
PROGRAMMING MODEL
• HSA Intermediate Language: HSAIL
• Designed for parallel algorithms
• JIT-compiles your algorithm to CPU or GPU hardware
– Also makes multi-core SIMD programming easy!
• High-level language features
– Object-oriented programming
– Virtual functions
– Exceptions
• Debugging
• SysCall support
– I/O
EXAMPLE ALGORITHMS
PHYSICS
• Current GPGPU physics solutions only output to the renderer
• With HSA you can simulate physics on the GPU and get the results back in the same frame
• Use hardware acceleration to compute physics for gameplay objects
• Reduced CPU load
• More objects, higher fidelity
FRUSTUM CULLING
• Videogames tend to be GPU-bound
• Avoid rendering what cannot be seen
• Cull objects outside the camera's view frustum
– Test the bounding box of every object against the camera frustum
– Currently done on the CPU
– Lots of vector math
– Can be computed completely in parallel!
• The CPU needs the results immediately
– HSA will allow low-latency execution
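The bounding-box test described above can be sketched in C++ as follows. This is a CPU reference version with hypothetical type and function names. It assumes axis-aligned bounding boxes and six frustum planes whose normals point into the frustum. The test is the common conservative one: a box is rejected only if it lies entirely behind at least one plane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with n pointing into the frustum.
struct Plane { Vec3 n; float d; };

// Axis-aligned bounding box given by its lowest and highest corners.
struct Aabb { Vec3 lo, hi; };

// True if the box is at least partly on the inner side of the plane.
// Only the box corner farthest along the plane normal needs testing.
inline bool box_touches_inside(const Aabb& b, const Plane& p) {
    const Vec3 farthest{
        p.n.x >= 0.0f ? b.hi.x : b.lo.x,
        p.n.y >= 0.0f ? b.hi.y : b.lo.y,
        p.n.z >= 0.0f ? b.hi.z : b.lo.z};
    return p.n.x * farthest.x + p.n.y * farthest.y + p.n.z * farthest.z + p.d >= 0.0f;
}

// Conservative: may keep a box that sits just outside a frustum corner,
// but never rejects a visible one.
inline bool is_visible(const Aabb& b, const std::array<Plane, 6>& frustum) {
    assert(b.lo.x <= b.hi.x && b.lo.y <= b.hi.y && b.lo.z <= b.hi.z);
    for (const Plane& p : frustum) {
        if (!box_touches_inside(b, p)) return false;
    }
    return true;
}

// One independent test per object; on a GPU each index would be one thread.
std::vector<bool> cull(const std::vector<Aabb>& boxes,
                       const std::array<Plane, 6>& frustum) {
    std::vector<bool> visible(boxes.size());
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        visible[i] = is_visible(boxes[i], frustum);
    }
    return visible;
}
```

Because the CPU consumes the visibility flags in the same frame to decide what to draw, this workload only benefits from the GPU if the dispatch latency is low, which is the point the slide makes.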
OCCLUSION CULLING
• Objects may be hidden behind others: occlusion
• Final per-pixel occlusion is only known after rendering the scene
• Approximate occlusion by rendering low-detail geometry
– This kind of occlusion culling is currently done on the CPU or on SPUs
– Rendering is better suited to GPUs
• HSA solution:
– Software rasterization in Compute on the GPU
– HSA does not yet expose the graphics pipeline
– Still much faster than a multicore CPU
[Image: Software occlusion culling in Battlefield 3]
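The idea of approximating occlusion with a software-rasterized depth buffer can be shown with a heavily simplified C++ sketch. It is not the technique used in any particular game. To keep it short, occluders and tested objects are assumed to be screen-space rectangles at a single depth rather than triangles, and the class and method names are hypothetical. Depth runs from 0.0 (near) to 1.0 (far).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space rectangle [x0, x1) x [y0, y1) at a single depth.
struct ScreenRect { int x0, y0, x1, y1; float depth; };

class DepthBuffer {
public:
    DepthBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * static_cast<std::size_t>(height),
                 1.0f) {  // cleared to the far plane
        assert(width > 0 && height > 0);
    }

    // "Rasterize" a low-detail occluder: keep the nearest depth per pixel.
    void draw_occluder(const ScreenRect& r) {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y) {
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x) {
                float& d = at(x, y);
                d = std::min(d, r.depth);
            }
        }
    }

    // An object is potentially visible if any pixel it covers is not
    // hidden by a nearer occluder. Fully off-screen objects return false.
    bool is_visible(const ScreenRect& r) const {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y) {
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x) {
                if (r.depth <= at(x, y)) return true;
            }
        }
        return false;
    }

private:
    float& at(int x, int y) {
        return depth_[static_cast<std::size_t>(y) * width_ + x];
    }
    float at(int x, int y) const {
        return depth_[static_cast<std::size_t>(y) * width_ + x];
    }

    int width_;
    int height_;
    std::vector<float> depth_;
};
```

Both the per-pixel fill and the per-object query are independent across pixels and objects, which is what makes this workload a candidate for a Compute implementation.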
SORTING
• Typically several long lists need sorting every frame
• Sort on the GPU using a parallel sorting algorithm
– Ken Batcher: bitonic sort or odd-even mergesort
• Copy overhead currently negates the performance advantage of a GPU sorting algorithm
• HSA solution:
– Unified Address Space
– GPU can sort in place in system memory
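For reference, Batcher's bitonic sort can be written as a fixed network of compare-and-swap steps. The C++ sketch below runs that network sequentially on the CPU. The function name is hypothetical and the input length is assumed to be a power of two. In each pass of the two outer loops, every iteration of the inner loop touches a disjoint pair of elements, so the inner loop is the part that would map to GPU threads.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place bitonic sort, ascending. Requires a power-of-two length.
void bitonic_sort(std::vector<int>& data) {
    const std::size_t n = data.size();
    assert((n & (n - 1)) == 0);  // power of two (or empty)

    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the runs being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // distance between compared elements
            // Every i below is independent of the others: one GPU thread per i.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    const bool out_of_order = ascending
                        ? data[i] > data[partner]
                        : data[i] < data[partner];
                    if (out_of_order) std::swap(data[i], data[partner]);
                }
            }
        }
    }
}
```

The sort works directly on the caller's buffer. With a unified address space, that buffer could be ordinary system memory, which removes the copy that currently cancels out the GPU's advantage.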
ASSET DECOMPRESSION
• Game assets are stored compressed on disk
• Decompression is expensive
• Some compression algorithms are ruled out because the CPU is too slow to decompress them
• Games are moving away from loading screens
• An APU with Unified Address Space
– Can be used to decompress new assets without taxing the CPU or discrete GPU
– Perhaps even use HSAIL I/O to read from disk
– A better streaming experience for gamers
PATHFINDING
• Some strategy games simulate thousands of units
• Pathfinding over complex terrain with thousands of moving units is very expensive
• Clever approximate solutions are often used
– Supreme Commander 2 “Flow field”
• GPGPU pathfinding with HSA
– Use one GPU thread per unit to do a deep search for an optimal path
– With HSA such an algorithm can page in all the data it needs from system memory and write back the paths it finds
– The APU could be fully saturated with pathfinding without impacting frame rate
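The "one search per unit" structure can be illustrated with a CPU reference sketch in C++. The slide does not name a search algorithm, so breadth-first search on a 4-connected grid is an assumption here, as are all type and function names. A real GPU version would also need a different open-list structure than `std::queue`. What carries over is the outer loop: each unit's search reads shared map data and writes only its own result.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct Cell { int x, y; };

struct Grid {
    int width, height;
    std::vector<char> blocked;  // row-major, 1 = impassable

    bool passable(int x, int y) const {
        return x >= 0 && y >= 0 && x < width && y < height &&
               !blocked[static_cast<std::size_t>(y) * width + x];
    }
};

// Breadth-first search: number of 4-connected steps from start to goal,
// or -1 if the goal cannot be reached.
int shortest_path_length(const Grid& g, Cell start, Cell goal) {
    assert(g.blocked.size() == static_cast<std::size_t>(g.width) * g.height);
    if (!g.passable(start.x, start.y) || !g.passable(goal.x, goal.y)) return -1;

    std::vector<int> dist(g.blocked.size(), -1);
    std::queue<Cell> open;
    dist[start.y * g.width + start.x] = 0;
    open.push(start);

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        const Cell c = open.front();
        open.pop();
        const int d = dist[c.y * g.width + c.x];
        if (c.x == goal.x && c.y == goal.y) return d;
        for (int i = 0; i < 4; ++i) {
            const int nx = c.x + dx[i];
            const int ny = c.y + dy[i];
            if (g.passable(nx, ny) && dist[ny * g.width + nx] < 0) {
                dist[ny * g.width + nx] = d + 1;
                open.push(Cell{nx, ny});
            }
        }
    }
    return -1;
}

// One independent search per unit: this outer loop is what would map to
// GPU threads.
std::vector<int> path_lengths(const Grid& g, const std::vector<Cell>& units, Cell goal) {
    std::vector<int> result(units.size());
    for (std::size_t i = 0; i < units.size(); ++i) {
        result[i] = shortest_path_length(g, units[i], goal);
    }
    return result;
}
```

On a 3x3 grid with a wall in the middle column of the top two rows, a unit in the top-left corner needs six steps to reach the top-right corner, because it has to walk around the wall.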
CONCLUSION
• Many algorithms in games are suitable for offloading to the GPU
• Heterogeneous Systems Architecture solves two major obstacles
– Latency
– Memory access
• HSAIL allows for entirely new kinds of GPGPU programs
• APUs can be used to offload work from the CPU
• HSA will finally make GPUs available to developers as full-featured co-processors
THANK YOU
• Any questions?
Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions
and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited
to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product
differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no
obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to
make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY
DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL
OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF
EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in
this presentation are for informational purposes only and may be trademarks of their respective owners.
The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and
opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is
not responsible for the content herein and no endorsements are implied.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

AMD 2012: HSA in Gaming

  • 1. GPGPU ALGORITHMS IN GAMES
      How Heterogeneous Systems Architecture can be leveraged to optimize algorithms in video games
      Matthijs De Smedt, Lead Graphics Programmer, Nixxes Software B.V.
  • 2. | HSA Algorithms in Games | June 13th, 2012
      CONTENTS
      • A short introduction
      • Current usage of GPGPU in games
      • Heterogeneous Systems Architecture
      • Examples made possible by HSA
  • 4. VIDEOGAMES
      • Games are near real-time simulations
      • Response time is key
      • Most systems run in sync with the output frequency
          – Rendering 60 frames per second
          – Allows for roughly 16 ms of processing time per frame
      • Framerate is limited by one of:
          – GPU
          – CPU
          – Display (VSync)
      [Diagram: CPU and GPU lanes with the stages Input, Simulate, Render, Render]
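The frame budget on this slide is simple arithmetic, sketched below in Python for readability. The function names `frame_budget_ms` and `limiting_stage` are illustrative and do not come from the talk, and the model assumes the CPU and GPU work on a frame in parallel, so the slowest stage sets the framerate.

```python
def frame_budget_ms(frames_per_second):
    """Time available per frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1000.0 / frames_per_second


def limiting_stage(cpu_ms, gpu_ms, vsync_hz):
    """Name the stage that limits the framerate.

    Assumes CPU and GPU run in parallel, so the slowest of the
    three stages (CPU, GPU, display refresh) determines the frame time.
    """
    display_ms = frame_budget_ms(vsync_hz)
    slowest = max(cpu_ms, gpu_ms, display_ms)
    if slowest == display_ms:
        return "display"
    return "gpu" if gpu_ms >= cpu_ms else "cpu"
```

At 60 frames per second the budget is 1000 / 60, about 16.7 ms. A frame that needs 5 ms of CPU time and 20 ms of GPU time would be GPU-bound under this model.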
  • 5. HARDWARE
      • Typical hardware target for PC games:
          – One multicore CPU
          – One GPU
      • Multiple GPUs: CrossFire
          – Transparent to the application
          – Driver alternates frames between GPUs
      • GPUs are becoming more general purpose:
          – General Purpose GPU algorithms (GPGPU)
      [Image: CrossFire]
  • 7. INTRODUCTION TO GPGPU
      • Rendering is a sequence of parallel algorithms
      • GPUs are great at parallel computation
      • Evolution of hardware and software to general purpose
      • First GPGPU was accomplished with programmable rendering
          – DirectX
          – OpenGL
      • Second generation using dedicated GPGPU APIs:
          – CUDA
          – OpenCL
          – DirectCompute
      • Third generation of GPGPU on the way:
          – Heterogeneous Systems Architecture
  • 8. GPGPU IN GAMES
      • Some GPGPU algorithms are being used in games right now. For example:
          – Physics
              • Particles
              • Fluid simulation
              • Destruction
          – Specialized graphics algorithms
              • Post-processing
      • All these algorithms drive visual effects
      [Image: GPU particle system by Fairlight]
  • 9. CURRENT PHYSICS EXAMPLE
      • GPGPU particle simulation using DirectCompute
      • Great for simulating thousands of visible particles
      • Results of simulation are never copied back to CPU
          – Cannot interfere with gameplay
          – Not synced in networked games
      • Example of what this prevents: smoke particles that affect game AI
      [Diagram: CPU lane: Call GPU; GPU lane: Simulate particles, Render particles]
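A minimal sketch of why particle simulation maps well to the GPU: every particle is updated independently of the others. The code below is plain Python rather than DirectCompute. The `Particle` type, the gravity constant and the semi-implicit Euler step are assumptions made for illustration, not details from the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Assumed constant acceleration; a real effect would add wind, drag, etc.
GRAVITY: Vec3 = (0.0, -9.81, 0.0)


@dataclass
class Particle:
    position: Vec3
    velocity: Vec3


def step_particle(p: Particle, dt: float) -> Particle:
    """Semi-implicit Euler: update velocity first, then position."""
    vx, vy, vz = (v + g * dt for v, g in zip(p.velocity, GRAVITY))
    px, py, pz = p.position
    return Particle((px + vx * dt, py + vy * dt, pz + vz * dt), (vx, vy, vz))


def step_all(particles: List[Particle], dt: float) -> List[Particle]:
    # No particle reads another particle's state, so on a GPU this loop
    # would become one thread per particle.
    return [step_particle(p, dt) for p in particles]
```

In the setup the slide describes, the output of `step_all` would stay in GPU memory and feed the renderer directly, which is why gameplay code never sees it.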
  • 10. GPGPU LIMITATIONS
      • Why isn’t GPGPU used more for non-graphics?
      • Latency
          – DirectX has many layers and buffers
          – DirectX commands are buffered up to multiple frames
          – Actual execution on the GPU is delayed
      • Copy overhead
          – GPU cannot directly access application memory
          – Must copy all data from and to the application
      • Functionality
          – Constrained programming models
  • 12. HETEROGENEOUS SYSTEMS ARCHITECTURE
      • New hardware and software
      Hardware
      • New features on discrete GPUs
      • Accelerated Processing Unit
          – Next generation processor
          – Multiple CPU and GPU cores on the same die
          – Shared memory access
          – Soon to be as widespread as multicore CPUs
      Software
      • "Drivers"
          – HSA provides a new, thin Compute API
          – Very low latency
          – Unified Address Space
          – Exposes more hardware capabilities
      • HSA Intermediate Language
          – Virtual ISA
          – Introduces CPU programming features to the GPU
  • 13. USING THE APU
      • Distinction between two hardware configurations
      • APU without discrete GPU:
          – Found in many laptops, soon in many desktops
          – Use the on-die GPU for rendering
      • APU with discrete GPU:
          – Hard-core gamers will still use discrete GPUs
          – Asymmetrical CrossFire
          – Or: Dedicate the on-die GPU to Compute algorithms
              • Could result in massive speedup of algorithms
      • Using SIMD co-processors to offload the CPU is familiar to PS3 developers
  • 14. COPY OVERHEAD
      • Current Compute APIs require the application to explicitly copy all input and output memory
          – Copying can easily take longer than processing on the CPU!
          – Only small datasets or very expensive computations benefit from GPGPU
      • HSA introduces a Unified Address Space for CPU and GPU memory
          – CPU pointers on the GPU
          – Virtual memory on the GPU
              • Paging over PCI-Express (discrete) or shared memory controller (APU)
          – Fully coherent
          – Will make GPGPU an option for many more algorithms
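The trade-off on this slide can be written as a back-of-the-envelope cost model: offloading pays off only when copy time plus GPU compute time is lower than CPU compute time. The sketch below is a simplification under stated assumptions. It ignores launch latency and treats the bus as a fixed bandwidth, and the function and parameter names are hypothetical.

```python
def offload_time_s(input_bytes, output_bytes, bus_bytes_per_s, gpu_compute_s):
    """Total time to run on the GPU when all data must be copied explicitly."""
    copy_s = (input_bytes + output_bytes) / bus_bytes_per_s
    return copy_s + gpu_compute_s


def worth_offloading(cpu_compute_s, input_bytes, output_bytes,
                     bus_bytes_per_s, gpu_compute_s):
    """True if the GPU path, including copies, beats the CPU path."""
    gpu_total = offload_time_s(input_bytes, output_bytes,
                               bus_bytes_per_s, gpu_compute_s)
    return gpu_total < cpu_compute_s
```

With made-up numbers, copying 32 MB in and 32 MB out over an 8 GB/s bus takes 8 ms, so a job that needs 5 ms on the CPU is not worth offloading even if the GPU computes it in 0.5 ms. Setting both byte counts to zero approximates a unified address space in this model, and the same job becomes worth offloading.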
  • 15. LATENCY
      • DirectX commands are buffered
      • When the GPU is fully loaded this buffer is saturated
      • Delay between scheduling and executing a GPGPU program on a busy GPU can take multiple frames
          – Results will be several frames behind
          – Game simulation needs all objects to be in sync
      • GPGPU is currently impractical to use for anything but visual effects
  • 20. LATENCY
      • HSA’s new Compute API will reduce latency
      • How to deal with a saturated GPU?
      • A second GPU
          – Dedicate the APU to Compute
          – Virtually no latency
      • HSA feature: Graphics pre-emption
          – Context switching on the GPU
              • Interrupt a graphics task (typically a large command list)
              • Execute Compute algorithm
              • Switch back to graphics
          – Can be used on discrete GPUs as well as on the APU
      • Choose the solution best suited to your needs
  • 21. APU USAGE EXAMPLE
      [Diagram: frame timeline with GPU, CPU and HSA lanes; labels: Frame, Schedule, DirectCompute, Execute, Execute]
  • 22. PROGRAMMING MODEL
      • HSA Intermediate Language: HSAIL
      • Designed for parallel algorithms
      • JIT compiles your algorithm to CPU or GPU hardware
          – Also makes multi-core SIMD programming easy!
      • High level language features
          – Object-oriented programming
          – Virtual functions
          – Exceptions
      • Debugging
      • SysCall support
          – I/O
  • 24. PHYSICS
      • Current GPGPU physics solutions only output to the renderer
      • With HSA you can simulate physics on the GPU and get the results back in the same frame
      • Use hardware acceleration to compute physics for gameplay objects
      • Reduced CPU load
      • More objects, higher fidelity
  • 25. FRUSTUM CULLING
      • Videogames tend to be GPU-bound
      • Avoid rendering what cannot be seen
      • Cull objects outside the camera viewport
          – Test the bounding box of every object against the camera frustum
          – Currently done on the CPU
          – Lots of vector math
          – Can be computed completely in parallel!
      • CPU needs the results immediately
          – HSA will allow low-latency execution
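The bounding-box test described on this slide can be sketched as follows. This is a common conservative formulation, not necessarily the one used by the author. It assumes axis-aligned boxes given as `(min_corner, max_corner)` and frustum planes given as `(a, b, c, d)` with normals pointing into the frustum, so a point is inside when `a*x + b*y + c*z + d >= 0`. All names are illustrative.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative side of the plane."""
    a, b, c, d = plane
    # Pick the box corner that lies furthest along the plane normal.
    # If even that corner is behind the plane, the whole box is.
    px = box_max[0] if a >= 0 else box_min[0]
    py = box_max[1] if b >= 0 else box_min[1]
    pz = box_max[2] if c >= 0 else box_min[2]
    return a * px + b * py + c * pz + d < 0


def is_visible(box_min, box_max, frustum_planes):
    """Conservative test: may keep a few boxes near frustum corners."""
    return not any(aabb_outside_plane(box_min, box_max, plane)
                   for plane in frustum_planes)


def cull(boxes, frustum_planes):
    # Each box is tested independently of the others, which is what
    # makes the problem "completely parallel": one GPU thread per object.
    return [is_visible(lo, hi, frustum_planes) for lo, hi in boxes]
```

The result of `cull` is a list of visibility flags that the CPU needs before it can submit draw calls, which is why low-latency execution matters for this use case.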
  • 26. OCCLUSION CULLING
      • Objects may be hidden behind others: Occlusion
      • Final per-pixel occlusion is only known after rendering the scene
      • Approximate occlusion by rendering low-detail geometry
          – This kind of occlusion culling is currently being done on CPU or on SPUs
          – Rendering is better suited to GPUs
      • HSA solution:
          – Software rasterization in Compute on the GPU
          – HSA does not yet expose the graphics pipeline
          – Still much faster than a multicore CPU
      [Image: Software occlusion culling in Battlefield 3]
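The idea of approximating occlusion with a low-detail depth buffer can be illustrated with a deliberately simplified sketch. Instead of rasterizing triangles, it assumes occluders and tested objects are already reduced to screen-space rectangles `(x0, y0, x1, y1)` with a single depth value, where smaller depth means closer to the camera. The names and the rectangle simplification are assumptions for illustration only.

```python
def make_depth_buffer(width, height, far=float("inf")):
    """Low-resolution depth buffer, initialised to 'nothing drawn'."""
    return [[far] * width for _ in range(height)]


def _clipped_pixels(depth, rect):
    """Yield (x, y) for every buffer pixel covered by rect."""
    x0, y0, x1, y1 = rect
    for y in range(max(y0, 0), min(y1, len(depth))):
        for x in range(max(x0, 0), min(x1, len(depth[y]))):
            yield x, y


def draw_occluder(depth, rect, z):
    """Write the occluder into the buffer, keeping the nearest depth."""
    for x, y in _clipped_pixels(depth, rect):
        if z < depth[y][x]:
            depth[y][x] = z


def is_occluded(depth, rect, nearest_z):
    """True if every covered pixel holds something closer than the object.

    An object with no on-screen pixels is reported as not occluded;
    rejecting it is left to frustum culling.
    """
    covered = False
    for x, y in _clipped_pixels(depth, rect):
        covered = True
        if depth[y][x] >= nearest_z:
            return False
    return covered
```

A typical use is to draw all large occluders first and then test the screen bounds of every remaining object. Objects for which `is_occluded` returns `True` can be skipped by the renderer.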
  • 27. SORTING
      • Typically several long lists per frame need sorting
      • Sorting on the GPU using a parallel sort algorithm
          – Ken Batcher: Bitonic or Odd-even mergesort
      • Copy overhead currently negates the performance advantage of using a GPU sorting algorithm
      • HSA solution:
          – Unified Address Space
          – GPU can sort in-place in system memory
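Bitonic sort, one of the Batcher networks named on this slide, can be sketched sequentially as below. The sketch assumes the input length is a power of two, which is the textbook form of the algorithm, and the function name is illustrative. It returns a sorted copy rather than sorting in place.

```python
def bitonic_sort(values):
    """Return a sorted copy using Batcher's bitonic sorting network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            # All compare-exchanges in this pass touch disjoint pairs,
            # so on a GPU each index i would be handled by its own thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending:
                        out_of_order = a[i] > a[partner]
                    else:
                        out_of_order = a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The sequence of comparisons depends only on the list length, never on the data, which is what makes sorting networks a good fit for wide SIMD hardware.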
  • 28. ASSET DECOMPRESSION
      • Game assets are stored compressed on disk
      • Decompression is expensive
      • Some compression algorithms cannot be used because the CPU is too slow to decompress them
      • Games are moving away from loading screens
      • An APU with Unified Address Space
          – Can be used to decompress new assets without taxing the CPU or discrete GPU
          – Perhaps even use HSAIL I/O to read from disk
          – A better streaming experience for gamers
  • 29. PATHFINDING
      • Some strategy games simulate thousands of units
      • Pathfinding over complex terrain with thousands of moving units is very expensive
      • Clever approximate solutions are often used
          – Supreme Commander 2 “Flow field”
      • GPGPU pathfinding with HSA
          – Use one GPU thread per unit to do a deep search for an optimal path
          – With HSA such an algorithm can page all requisite data from system memory and write back found paths
          – APU could be fully saturated with pathfinding without impacting framerate
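The "one thread per unit" idea can be sketched as one independent search per unit over shared, read-only terrain. The slide does not name a search algorithm, so the sketch below substitutes a plain breadth-first search on a 4-connected grid, which finds a shortest path when every step costs the same. The grid encoding (`0` free, `1` blocked) and all names are hypothetical.

```python
from collections import deque


def find_path(grid, start, goal):
    """Shortest path from start to goal as a list of (x, y), or None."""
    if start == goal:
        return [start]
    height, width = len(grid), len(grid[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        x, y = frontier.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            if grid[ny][nx] != 0 or nxt in came_from:
                continue
            came_from[nxt] = (x, y)
            if nxt == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            frontier.append(nxt)
    return None


def find_paths(grid, units):
    # Each search only reads the shared grid and writes its own result,
    # so every unit could be handled by a separate GPU thread.
    return [find_path(grid, start, goal) for start, goal in units]
```

With a unified address space, `grid` would stay in system memory and the resulting paths would be written back there directly, as the slide describes.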
  • 30. CONCLUSION
      • Many algorithms in games are suitable for offloading to the GPU
      • Heterogeneous Systems Architecture solves two major obstacles
          – Latency
          – Memory access
      • HSAIL allows for entirely new kinds of GPGPU programs
      • APUs can be used to offload the CPU
      • HSA will finally make GPUs available to developers as full-featured co-processors
  • 31. THANK YOU
      • Any questions?
  • 33. Disclaimer & Attribution
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
      NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.
      The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.