This document discusses heterogeneous systems architecture and its potential to enable technologies for virtual reality environments like holodecks. It provides an overview of holodeck enabling technologies such as computational photography, directional audio, natural user interfaces, and augmented reality. It then discusses how heterogeneous systems architecture can accelerate these technologies by allowing more flexible partitioning of workloads between the CPU and GPU for improved performance and energy efficiency. As an example, it analyzes how HSA could improve the performance of face detection algorithms by offloading certain stages to the GPU. Overall, the document argues that HSA is key to realizing the advanced computing capabilities needed for future immersive virtual environments.
Heterogeneous Systems Architecture: The Next Area of Computing Innovation
1. HETEROGENEOUS SYSTEMS ARCHITECTURE:
THE NEXT AREA OF COMPUTING INNOVATION
CASE STUDY: THE HOLODECK
Dr. Lisa Su
Senior Vice President and GM, Global Business Units,
AMD
ISSCC Conference
February 18, 2013
2. CHALLENGES TO MOORE'S LAW SCALING
[Figure: two charts across technology nodes 45nm, 40nm, 32nm, 28nm, 20nm, and 20 FinFET. Left: area scaling by technology generation (normalized area, 0.0 to 1.0). Right: cost per transistor scaling (normalized cost/transistor, 0.0 to 1.0).]
• Lithography challenges begin severely limiting area scaling at 20nm node
  - Fewer 1X metals due to cost
  - Less aggressive feature scaling due to lithography challenges
• Compounded by rapidly increasing lithography costs
  - 28 → 20nm transition is inflection point with dual exposure
  - No cost / transistor crossover for first time at 28 → 20nm transition
2 | ISSCC Keynote | February 18th, 2013
3. A PARADIGM SHIFT…
[Diagram: microprocessor advancement (CPU) plotted against programmability advancement and throughput performance (GPU). CPU axis: Single-Core Era, Multi-Core Era, Heterogeneous Systems Era. GPU axis: graphics driver-based programs, OpenCL/DX driver-based programs, high-level programmable. Homogeneous computing gives way to heterogeneous computing, with the GPU as accelerator.]
5. ARCHITECTURES - A HISTORICAL PERSPECTIVE
[Timeline: 1981, 1990s, 2000s, 2010s, running from the Legacy Processing Era to the Surround Computing Era. Stages shown: single-core CPUs; traditionally optimized platforms; multi-core CPUs/GPUs; APUs and legacy SOC; heterogeneous architectures.]
6. CHANGING THE THINKING, CHANGING THE GAME
HSA is designed to make the GPU hardware
directly accessible to the software, using the
high-level languages programmers already use
on the CPU
• C, C++, Java, Python… even JavaScript, HTML5
• ISA agnostic - e.g., x86, 64-bit ARM, Radeon, Mali
GPU becomes a peer processor to the CPU in
terms of system integration
• Full programming language features
• Shared virtual memory: pointer is a pointer
• Coherency
• Context switching
HSA Foundation - an industry-wide initiative
8. EFFECTIVE COMPUTE OFFLOAD
APU Accelerated HSA Accelerated Processing Unit
Software Applications
Data Parallel Workloads
Serial and Task
Parallel Workloads
Made easy by HSA
Unleash the best compute elements depending on task
9. BRINGING IT ALL TOGETHER
MOTION DSP 720P
[Figure: CPU vs. CPU+GPU. Left: power (0 W to 35 W), stacked by CPU cores, NB+GPU, and DRAM. Right: performance (0 fps to 25 fps).]
Synergistic use of GPU compute + shared memory = lower power and higher performance
>4.0X better energy efficiency [1]
[1] AMD internal testing: AMD E2-3200 APU (2 cores @ 2400 MHz, GPU: 2 CU @ 444 MHz),
Windows 7 OS, MotionDSP vReveal Applications 720P MP4 input
(http://www.vreveal.com/stabilization)
10. TODAY'S DISCUSSION: FROM SURROUND COMPUTING TO
ENABLING THE HOLODECK
1. A fully featured Holodeck is
still many years away
2. Today our discussion will:
   • Establish a Holodeck framework
   • Identify Holodeck enabling technologies
   • Discuss how Heterogeneous Systems Architecture (HSA) accelerates these technologies
   • Undertake an HSA deep dive on one of these enabling technologies
   • Look at how new dedicated processors will enable Holodeck functionality
11. WHAT IS A HOLODECK?
12. THE HOLODECK FRAMEWORK:
AN EVOLUTION OF SURROUND COMPUTING
• Natural User Interfaces
• Context Computing
• 360 Degree Virtual Environments
13. HOLODECK ENABLING TECHNOLOGIES:
PROFOUND IMPLICATIONS FOR COMPUTER ARCHITECTURE
Computational Photography
• Delivering seamless and immersive video environments
Directional Audio
• Using audio to enhance immersion and realism of our environments
Natural User Interfaces
• Enabling realistic, natural human communication
Context Computing
• Delivering an intuitive understanding of the user's needs in real time
Augmented Reality
• Bringing it all together - combining the real and the virtual
14. COMPUTATIONAL PHOTOGRAPHY
360 DEGREE VISUAL ENVIRONMENTS, PHOTOSTITCHING, PERIPHERAL VISION AND HSA
• Mapping real life scenes through finite images
• Photo stitching of tiled environments and perceptual correction
• Detect interest points & match features
• Projecting geometry with point features using algorithms like RANSAC
• Image processing to account for curved screen surfaces
• Modulate brightness to account for peripheral vision
HSA presents a unified view of the system with shared memory, so CPU and
GPU acceleration can be applied across the entire process
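As a rough illustration of the RANSAC step named on this slide, the sketch below estimates a 2D translation between matched interest points in two overlapping tiles. The translation-only model, the function name `ransac_translation`, and the tolerance and iteration values are assumptions chosen for brevity; they do not come from the talk, and a real stitcher would typically fit a homography instead.

```python
import random


def ransac_translation(matches, iterations=200, tol=2.0, seed=0):
    """Estimate (dx, dy) that maps points in tile A onto tile B.

    matches: list of ((xa, ya), (xb, yb)) candidate feature matches,
    some of which may be wrong (outliers). Hypothetical helper.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        (xa, ya), (xb, yb) = rng.choice(matches)
        dx, dy = xb - xa, yb - ya
        # Consensus set: matches that agree with this hypothesis.
        inliers = [
            (a, b) for a, b in matches
            if abs(a[0] + dx - b[0]) <= tol and abs(a[1] + dy - b[1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set by averaging the inlier offsets.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), n


if __name__ == "__main__":
    good = [((x, y), (x + 100, y - 5))
            for x, y in [(10, 20), (50, 80), (200, 40), (120, 300)]]
    bad = [((30, 30), (400, 400)), ((75, 10), (5, 900))]
    print(ransac_translation(good + bad))
```

Each hypothesis is scored independently of the others, which is one reason this kind of step can be spread across many GPU work items while the CPU handles the final refit.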
15. DIRECTIONAL AUDIO
• Couples computationally demanding 3D audio and spatialization effects with "always on" background processing like Voice Activity Detection (VAD)
• Voice activity detection is best implemented with special audio processors and acceleration techniques
• Spatialization effects such as "Convolution Reverb" are best done with GPU acceleration
HSA enables seamless
integration of CPU and GPU
acceleration with other
independent accelerators
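To make the convolution reverb workload concrete, here is a minimal sketch of direct-form convolution of a dry signal with a room impulse response. The function name `convolve` and the toy sample values are illustrative assumptions; production reverbs usually rely on FFT-based partitioned convolution rather than this quadratic loop.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution of a dry signal with an impulse response.

    Every input sample adds a scaled, delayed copy of the impulse
    response to the output. Hypothetical helper for illustration.
    """
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


if __name__ == "__main__":
    dry = [1.0, 0.0, 0.0, 0.5]   # two clicks, the second one quieter
    room = [1.0, 0.5, 0.25]      # toy decaying impulse response
    print(convolve(dry, room))   # [1.0, 0.5, 0.25, 0.5, 0.25, 0.125]
```

Each output sample is an independent sum of products, so the work maps naturally onto many parallel GPU lanes.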
16. NATURAL USER INTERFACES
• Speech Recognition:
  - Background processing: echo cancellation & noise suppression
  - Audio feature extraction
  - Voice pattern recognition through Markov model or similar algorithm
• Gesture Recognition:
  - Frame preprocessing & filtering
  - Optical flow or object tracking
  - Sophisticated computer vision algorithms to delineate the hand or body parts from the background
NUI algorithms all benefit from
CPU/GPU and audio processors to
efficiently perform these functions at
the lowest power
17. CONTEXT COMPUTING
BIOMETRICS EXAMPLE
• Facial Recognition:
  - Face detection (is there a face): GPU acceleration
  - Face identification (pattern matching through algorithms like Haar face detection): CPU and GPU acceleration
  - Validation through blink detection (make sure it is a real face): GPU acceleration
HSA enables mix and match of the best
acceleration for each phase of the
process
18. AUGMENTED REALITY
• Image Registration:
  - Relies on robust and fast feature detection; benefits from CPU/GPU acceleration
• Object Tracking:
  - Relies on "optical flow" algorithm; benefits from CPU/GPU acceleration
• Image Composition:
  - Once information exists from the above, becomes a classic graphics rendering use case
The building blocks of HSA enable the
augmented reality world.
19. THE WAY FORWARD
• Many technologies required to enable our vision
  - Heterogeneous engines that accelerate key client and server workloads
  - Datacenters optimized for latency, scalability, and efficiency
  - Processors optimized for new and emerging workloads
  - Active research into new algorithms
20. ENABLING TECHNOLOGY DEEP DIVE:
ACCELERATING NATURAL USER INTERFACES (HAAR
FACE DETECTION) WITH HETEROGENEOUS
SYSTEMS ARCHITECTURE
21. LOOKING FOR FACES IN ALL THE RIGHT PLACES
22. LOOKING FOR FACES IN ALL THE RIGHT PLACES
Quick HD Calculations
Search square = 21 x 21
Pixels = 1920 x 1080 = 2,073,600
Search squares = 1900 x 1060 = ~2 Million
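The count above follows from sliding a 21 x 21 window one pixel at a time across the frame. A minimal sketch of that arithmetic, where the function name `search_squares` and the one-pixel stride are assumptions:

```python
def search_squares(width, height, window=21):
    """Count window x window positions that fit inside the frame (stride 1)."""
    if width < window or height < window:
        return 0
    return (width - window + 1) * (height - window + 1)


if __name__ == "__main__":
    print(search_squares(1920, 1080))  # 1900 * 1060 = 2014000
```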
23. LOOKING FOR DIFFERENT SIZE FACES
BY SCALING THE VIDEO FRAME
24. LOOKING FOR DIFFERENT SIZE FACES
BY SCALING THE VIDEO FRAME
More HD Calculations
70% scaling in H and V
Total Pixels = 4.07 Million
Search squares = 3.8 Million
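These totals can be approximated by summing over an image pyramid. The sketch below assumes each level is 70% of the previous one in both dimensions, that scaled sizes are rounded down, and that scaling stops once a 21 x 21 window no longer fits; the slide states only the 70% factor, so the other choices and the name `pyramid_totals` are assumptions.

```python
def pyramid_totals(width=1920, height=1080, window=21, num=7, den=10):
    """Sum pixels and search squares over a pyramid scaled by num/den per level.

    Integer arithmetic keeps the level sizes exact. Returns
    (total_pixels, total_search_squares, level_count).
    """
    pixels = squares = level = 0
    while True:
        w = width * num**level // den**level
        h = height * num**level // den**level
        if w < window or h < window:
            break
        pixels += w * h
        squares += (w - window + 1) * (h - window + 1)
        level += 1
    return pixels, squares, level


if __name__ == "__main__":
    total_pixels, total_squares, levels = pyramid_totals()
    print(total_pixels, total_squares, levels)
```

Under these assumptions both totals land close to the slide's figures. The pixel total is also bounded by the geometric series 2,073,600 / (1 - 0.49), which is about 4.07 million.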
25. HAAR CASCADE STAGES
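[Diagram: each cascade stage evaluates a group of features (stage N: features k, l, m; stage N+1: features p, q, r). After each stage the detector asks "Face still possible?" If yes, the next stage runs; if no, the frame is rejected.]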
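The stage-by-stage early out can be sketched as follows. The feature functions, thresholds, and the name `run_cascade` are placeholders for illustration; real Haar features are differences of rectangle sums taken from an integral image, which this sketch does not implement.

```python
def run_cascade(window, stages):
    """Run a window through cascade stages with an early out after each one.

    stages: list of (features, threshold), where each feature is a
    function window -> score. Returns (is_face, stages_evaluated).
    """
    for index, (features, threshold) in enumerate(stages, start=1):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, index   # a face is no longer possible: reject
    return True, len(stages)      # survived every stage


if __name__ == "__main__":
    # Toy stages over a dict of made-up measurements (assumptions).
    stages = [
        ([lambda w: w["brightness"]], 0.5),
        ([lambda w: w["symmetry"]], 0.8),
    ]
    print(run_cascade({"brightness": 0.2, "symmetry": 0.9}, stages))  # (False, 1)
    print(run_cascade({"brightness": 0.9, "symmetry": 0.9}, stages))  # (True, 2)
```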
26. 22 CASCADE STAGES, EARLY OUT BETWEEN EACH
STAGE 1 → STAGE 2 → … → STAGE 21 → STAGE 22 → FACE CONFIRMED
(an early out after any stage yields NO FACE)
Final HD Calculations
Search squares = 3.8 million
Average features per square = 124
Calculations per feature = 100
Calculations per frame = 47 GCalcs
Calculation Rate
30 frames/sec = 1.4 TCalcs/second
60 frames/sec = 2.8 TCalcs/second
…and this only gets front-facing faces
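The rates on this slide follow directly from multiplying the per-frame figures. A short check, using only the slide's own numbers (the constant and function names are assumptions):

```python
SEARCH_SQUARES = 3.8e6        # from the slide
FEATURES_PER_SQUARE = 124     # average, from the slide
CALCS_PER_FEATURE = 100       # from the slide


def calcs_per_frame():
    """Worst-case calculations per frame, ignoring early outs."""
    return SEARCH_SQUARES * FEATURES_PER_SQUARE * CALCS_PER_FEATURE


def calcs_per_second(fps):
    """Calculation rate at a given frame rate."""
    return calcs_per_frame() * fps


if __name__ == "__main__":
    print(calcs_per_frame() / 1e9)        # about 47 GCalcs per frame
    print(calcs_per_second(30) / 1e12)    # about 1.4 TCalcs/second
    print(calcs_per_second(60) / 1e12)    # about 2.8 TCalcs/second
```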
28. UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES
[Figure legend: live and dead work items]
• When running on the GPU, we run each search rectangle on a separate work item
• Early out algorithms, like HAAR, exhibit divergence between work items
  - Some work items exit early
  - Their neighbors continue
  - SIMD packing suffers as a result
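A simple model shows how early exits hurt SIMD packing. The sketch assumes work items are packed in order into groups of 64 lanes and that a group keeps issuing stages until its longest-lived lane finishes; the lane width, the packing rule, and the name `simd_utilization` are assumptions, not figures from the talk.

```python
def simd_utilization(exit_stages, simd_width=64):
    """Fraction of issued lane-stages that do useful work.

    exit_stages: for each work item, the number of cascade stages it
    runs before exiting (or the full stage count if it survives).
    Empty lanes in a partial final group count as wasted.
    """
    useful = issued = 0
    for start in range(0, len(exit_stages), simd_width):
        group = exit_stages[start:start + simd_width]
        useful += sum(group)
        # Every lane is occupied until the slowest work item is done.
        issued += max(group) * simd_width
    return useful / issued if issued else 0.0


if __name__ == "__main__":
    # 63 work items exit after stage 1; one survives all 22 stages.
    divergent = [1] * 63 + [22]
    print(simd_utilization(divergent))   # 85 / 1408, roughly 0.06
    print(simd_utilization([3] * 64))    # 1.0 when all lanes exit together
```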
30. PERFORMANCE CPU-VS-GPU
AMD A10-4600M APU (6 CU @ 497 MHz, 4 cores @ 2700 MHz)
[Chart: images/sec (0 to 12) for CPU, HSA, and GPU versus the number of cascade stages run on the GPU (0 through 8, and 22)]
AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
6 compute units, 685 MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
31. HAAR SOLUTION
RUN DIFFERENT CASCADES ON GPU AND CPU
By seamlessly sharing data between CPU and GPU,
HSA allows the right processor to handle its appropriate
workload
+2.5x INCREASED PERFORMANCE
-2.5x DECREASED ENERGY PER FRAME
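One way to picture the split is to run the first few stages uniformly over every window (the data-parallel part, suited to the GPU), compact the survivors, and finish the remaining stages on the CPU. The sketch below only simulates that partitioning in plain Python; `detect_split`, `stage_passes`, and the toy stages are hypothetical and do not represent an HSA API.

```python
def stage_passes(window, stage):
    """True if the window's summed feature score meets the stage threshold."""
    features, threshold = stage
    return sum(feature(window) for feature in features) >= threshold


def detect_split(windows, stages, gpu_stages):
    """Simulate running the first gpu_stages stages on the GPU, the rest on the CPU.

    Returns (faces, survivors_handed_to_cpu).
    """
    # "GPU" phase: the same early stages applied to every window.
    survivors = [
        w for w in windows
        if all(stage_passes(w, s) for s in stages[:gpu_stages])
    ]
    # "CPU" phase: the few survivors run the remaining, divergent stages.
    faces = [
        w for w in survivors
        if all(stage_passes(w, s) for s in stages[gpu_stages:])
    ]
    return faces, len(survivors)


if __name__ == "__main__":
    # Toy data: a window is a single score, each stage a rising threshold.
    stages = [([lambda w: w], t) for t in (0.3, 0.5, 0.8)]
    windows = [0.1, 0.4, 0.6, 0.9]
    print(detect_split(windows, stages, gpu_stages=1))  # ([0.9], 3)
```

The detection result does not depend on where the split is placed; only the amount of work each processor performs changes, which is the quantity the preceding chart sweeps.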
32. APPLICATION ACCELERATION USING HSA
Gesture recognition 12x
Photo indexing 10x
Voice recognition 10x
Visual Search 9x
Audio search 5x
Stereo vision 4x
Video stabilization 4x
Face detect 2x
[Bar chart: acceleration vs. CPU, axis from 0 to 14x]
AMD estimates. Source: AMD Whitepaper, Accelerating Consumer/Prosumer Multimedia with HSA, June 2012
33. HSA EVOLUTION
Llano: Physical Integration
• Integrate CPU & GPU in silicon
• Unified Memory Controller
• Common Manufacturing Technology
Trinity: Optimized Platforms
• GPU Compute C++ support
• User mode scheduling
• Bi-Directional Power Mgmt between CPU and GPU
Kaveri: Architectural Integration
• Unified Address Space for CPU and GPU
• GPU uses pageable system memory via CPU pointers
• Fully coherent memory between CPU & GPU
Next Gen: System Integration
• GPU compute context switch
• GPU graphics pre-emption
• Quality of Service
34. HSA PROGRAMMABILITY ADVANTAGE
[Diagram: unified programming models (C, C++, Java …; OpenCL, C++ AMP, Java8 …; DX11, OpenGL …; domain-specific ext / APIs) layered over the HSA Intermediate Language (HSAIL), which feeds compute acceleration and graphics acceleration. Labeled: HSA Foundation.]
• Works with today's programming models and languages
• Architected to enable CPU like programmability
• Promotes development and adoption of extended standards
• Write Once Run Anywhere - with Performance
35. CONCLUSION
• The age of traditional computing is dead.
• A paradigm shift in processing has brought about the Heterogeneous Systems Era
• HSA will enable us to dramatically scale processing power while increasing power efficiency
• The Holodeck is still years away, but HSA and dedicated hardware blocks will accelerate and enable technologies as they emerge
36. ACKNOWLEDGEMENTS
• Bill Herz
• Phil Rogers
• Marty Johnson
• Chris Hook
• Sumant Subramanian
38. DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and
typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to
product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences
between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or
otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to
time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN
NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES
ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon, and combinations thereof
are trademarks of Advanced Micro Devices, Inc. Other names and logos are used for informational purposes only and may
be trademarks of their respective owners.