
MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder


Presentation MM-4085, Designing a game audio engine for HSA, by Laurent Betbeder at the AMD Developer Summit (APU13) November 11-13, 2013.

Published in: Technology, Entertainment & Humor

  1. DESIGNING A GAME AUDIO ENGINE FOR HSA
     Laurent Betbeder, SCEA
  2. WHAT’S SO SPECIAL ABOUT CONSOLE GAME DEV? NOW THAT CONSOLES MOSTLY RUN PC HARDWARE
     • Extreme performance optimizations
       ‒ Until gamers opt for shorter upgrade cycles (phones/tablets business model)?
       ‒ Can’t run sub-optimal audio code when competing for cycles on crowded compute queues
     • Custom hardware, OS, drivers and compilers
       ‒ To extract max perf from fixed hardware
       ‒ Helps lengthen platform lifetime
       ‒ “But but… where’s my OpenCL runtime?”
     • Low latency
       ‒ Music games on consoles need it as much as professional music production software on desktop
       ‒ But it is much harder to achieve reliably when a system is constantly overloaded
     2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
  3. GAME AUDIO DSP ON THE ACP: WHY?
     • Heavy specialized DSP workloads
       ‒ Stuff games need badly but don’t really want to deal with
       ‒ Best fit for dedicated and/or fixed function hardware
       ‒ Codecs
         ‒ CELP codecs -> party chat
         ‒ 100s of MP3/AT9/AAC decode instances
         ‒ Huge impact on game assets footprint, down/load times
         ‒ Optional output bitstream encoding (AC3/DTS)
       ‒ Voice recognition
       ‒ Echo cancellation
     • Platform wide IP licensing levels the playing field
       ‒ Good for indie developers
       ‒ And good for the platform!
     • Available via asynchronous secure system APIs
  4. GAME AUDIO DSP ON THE ACP: WHY NOT?
     • Exotic hardware and dev environment
       ‒ Closed to games
       ‒ Closed to middleware
       ‒ Platform specific
     • Asynchronous interface
       ‒ Can’t have sequential interleaving of DSP back and forth between CPU and ACP w/o latency buildup
       ‒ But ultimately, we want the DSP pipeline to be data driven (by artists who know nothing about this)
       ‒ Modularity
     • Slow clock rate @ 800MHz, very limited SIMD and no FP support
       ‒ Tough sell against Jaguar for many DSP algorithms
       ‒ Very tight local memory shared by multiple DSP cores
     • Already pretty busy with codec loads and system tasks
  5. GAME AUDIO DSP ON THE GPU: WHY?
     • Much more demand for real-time effects today, and it will keep growing
     • CPU FLOPS likely to stagnate and could even decline in HSA as CUs take over SIMD workloads
     • Flexibility: some games are CPU bound, others are GPU bound…
     • hUMA is a game changer (removes NUMA’s main bottleneck: GPU write back)
     • Compute queues with prioritized scheduling and even some form of preemption
     • Many real-time audio DSP algorithms work well on wide SIMD units
       ‒ FFT convolution (spectral processing in general)
       ‒ Mixing, resampling, wave shaping, etc.
     • Mostly coalesced mem accesses
     • Low/med bandwidth (< 1GB/s)
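The reason mixing and resampling map well to wide SIMD is that every output sample can be computed independently of its neighbours. The following is a minimal scalar sketch in C of that property, not code from the talk; the function names `mix_accumulate` and `resample_linear` and the choice of linear interpolation are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate one voice into a mix bus. Iteration i touches only out[i]
 * and in[i], so iterations can run in any order or all at once. */
void mix_accumulate(float *out, const float *in, float gain, size_t num_frames)
{
    assert(out && in);
    for (size_t i = 0; i < num_frames; ++i)
        out[i] += gain * in[i];
}

/* Linear-interpolation resampler (illustrative choice of interpolator).
 * step is the read increment per output frame (0.5 = 2x upsampling).
 * Each output depends only on its own index, so there is no recursion.
 * Returns the number of output frames actually written. */
size_t resample_linear(const float *in, size_t in_frames,
                       float *out, size_t out_frames, double step)
{
    assert(in && out && step > 0.0);
    size_t i = 0;
    for (; i < out_frames; ++i) {
        double pos = (double)i * step;
        size_t idx = (size_t)pos;
        if (idx + 1 >= in_frames)
            break; /* would read past the end of the input */
        float frac = (float)(pos - (double)idx);
        out[i] = in[idx] + frac * (in[idx + 1] - in[idx]);
    }
    return i;
}
```

Because neither loop carries state from one iteration to the next, each iteration could be assigned to its own SIMD lane or GPU work-item without changing the result.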
  6. GAME AUDIO DSP ON THE GPU: WHY NOT?
     • Some algorithms do not work (as) well on wide SIMD units
       ‒ IIR filters, ADPCM decodes, dynamics: data recursion causes thread interdependencies within wavefronts
       ‒ Typical AAA game runs 1000s of biquads at various stages in the filtergraph
     • Workloads may require batch voice processing to achieve high CU efficiency
       ‒ Build 2D grids (channels x samples) or 3D grids (channels x subbands x samples)
       ‒ Swizzling is key but watch out for runtime cost as SIMD widens (static vs dynamic)
     • Batch processing goes against the free form MaxMSP model artists are pushing for
       ‒ Unique DSP chain for each sound “just because we can!”
       ‒ Data driven filtergraph and DSP pipeline
     • Complex prioritized scheduling & dispatching compute queues
       ‒ Do not prevent intermittent CU saturation caused by large graphics workloads
       ‒ Risky for low latency direct path audio DSP
     • Proprietary hardware, drivers and shader compilers (PSSL)
       ‒ Audio middleware will need some incentive to move up there
       ‒ Most will probably stay on the CPU
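The recursion problem and the batching workaround can be sketched together. In a biquad, each output sample depends on state left by the previous sample, so the sample loop is inherently serial; separate voices, however, are independent, so a 2D grid (channels x samples) can be processed with the parallelism running across channels. The C sketch below assumes a transposed direct form II biquad, a structure-of-arrays bank, and sample-major interleaving; the `BiquadBank` type and `biquad_bank_process` name are hypothetical and not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Structure-of-arrays bank: element v of every array belongs to voice v.
 * Coefficients are assumed normalized so that a0 == 1. */
typedef struct {
    const float *b0, *b1, *b2, *a1, *a2; /* per-voice coefficients */
    float *z1, *z2;                      /* per-voice recursive state */
    size_t num_voices;
} BiquadBank;

/* buf is sample-major ("swizzled"): buf[n * num_voices + v] holds
 * sample n of voice v, so one frame of all voices is contiguous. */
void biquad_bank_process(BiquadBank *bank, float *buf, size_t num_frames)
{
    assert(bank && buf);
    for (size_t n = 0; n < num_frames; ++n) {        /* serial: recursion */
        float *frame = buf + n * bank->num_voices;
        for (size_t v = 0; v < bank->num_voices; ++v) { /* parallel: voices */
            float x = frame[v];
            float y = bank->b0[v] * x + bank->z1[v];
            bank->z1[v] = bank->b1[v] * x - bank->a1[v] * y + bank->z2[v];
            bank->z2[v] = bank->b2[v] * x - bank->a2[v] * y;
            frame[v] = y;
        }
    }
}
```

The inner loop has no dependency between voices, which is what lets a compiler or a wavefront run it in lockstep. The price is the one the slide points to: voices have to be gathered into this interleaved layout first, and every voice in the batch runs the same filter structure.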
  7. GAME AUDIO DSP ON JAGUAR: WHY?
     • Well known and open x64 dev environment
       ‒ Middleware friendly
       ‒ CLANG/LLVM solid & stable
     • Full FP unit with SSE4 support
     • Early PA is surprisingly good for compiled intrinsics code
       ‒ ~10% slower than core i7 @ same clock rate
       ‒ GDDR5 latency is not an issue
       ‒ < ~50% of 1 core @ 1.6GHz running the entire KZSF filtergraph
     • Only reliable solution for ultra low latency
       ‒ Music and rhythm games
       ‒ Run 100% on CPU (including decoding)
  8. GAME AUDIO DSP ON JAGUAR: WHY NOT?
     • “Weak laptop CPU” compared to top of the line on desktop
       ‒ No FMA4
       ‒ Slow clock @ 1.6GHz (compared to typical desktop)
     • 256bit AVX mostly useless
     • Possible bottleneck down the line
  9. GAME ENGINE CODE: THIN COMPUTE
     • 3D audio
       ‒ Sound emitters (distance, directionality and size modeling)
       ‒ Sound listeners (mic and ear modeling)
       ‒ Sound geometry (collision meshes)
       ‒ Deeper physical modeling of sound propagation
       ‒ Simple ray casting (occlusion, obstruction, indirect audio)
       ‒ Advanced ray casting (diffraction, real-time individual early reflection tracking)
     • Physics
       ‒ Rigid body dynamics (collisions, friction, destruction)
       ‒ Fluid dynamics (turbulences)
     • Animation, special FX
       ‒ Inline audio sequencing and modulation
       ‒ Foley, coarse granular synthesis
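As a small illustration of the distance modeling a sound emitter needs, the C sketch below computes a per-emitter gain from listener distance. The slides do not say which attenuation curve is used; this example assumes the clamped inverse-distance model found in common 3D audio APIs, and the function name `distance_gain` and its parameters are hypothetical.

```c
#include <assert.h>

/* Clamped inverse-distance attenuation (an assumed model, chosen only
 * for illustration). Gain is 1.0 at ref_distance and stops falling
 * beyond max_distance; rolloff scales how quickly it drops. */
float distance_gain(float distance, float ref_distance,
                    float max_distance, float rolloff)
{
    assert(ref_distance > 0.0f);
    assert(max_distance >= ref_distance);
    assert(rolloff >= 0.0f);

    float d = distance;
    if (d < ref_distance) d = ref_distance;
    if (d > max_distance) d = max_distance;

    return ref_distance / (ref_distance + rolloff * (d - ref_distance));
}
```

Each emitter's gain depends only on its own distance, so evaluating this for many emitters per frame is the kind of small, uniform per-object workload the slide groups under thin compute.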
  10. CONCLUSIONS
     • HSA + hUMA is a great combo for high perf game audio!
       ‒ Maximized perf per W from specialized hardware (CPU + GPU + ACP)
       ‒ Our challenge is to figure out what to run where and when
     • ACP is a great fit for codecs and OS services
       ‒ But not for modular synthesis and highly customized DSP pipelines
     • GPU is a great fit for mid/high latency DSP and high level 3D thin compute
       ‒ Indirect (reflected) audio
       ‒ Convolution reverb
       ‒ 3D ray casting for occlusion/obstruction/diffraction
     • CPU is still the best fit for everything else:
       ‒ Open modular synthesis frameworks and middleware
       ‒ Low latency audio
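The "what to run where" mapping in these conclusions can be written down as a simple lookup. The C sketch below only encodes the assignments stated on the slides; it is not a real scheduler, and the enum names and the `choose_unit` function are hypothetical. The low latency override reflects the earlier point that low latency titles run 100% on the CPU, including decoding.

```c
#include <assert.h>

typedef enum { UNIT_CPU, UNIT_GPU, UNIT_ACP } ComputeUnit;

typedef enum {
    TASK_CODEC,              /* MP3/AT9/AAC decode, bitstream encode */
    TASK_OS_SERVICE,         /* system-level audio services */
    TASK_CONVOLUTION_REVERB,
    TASK_INDIRECT_AUDIO,     /* reflected audio */
    TASK_RAY_CASTING,        /* occlusion/obstruction/diffraction */
    TASK_MODULAR_SYNTH,      /* open, customized DSP pipelines */
    TASK_OTHER
} AudioTask;

/* Static mapping taken from the conclusions slide. */
ComputeUnit choose_unit(AudioTask task, int needs_low_latency)
{
    assert(task >= TASK_CODEC && task <= TASK_OTHER);
    if (needs_low_latency)
        return UNIT_CPU; /* low latency audio stays on the CPU */

    switch (task) {
    case TASK_CODEC:
    case TASK_OS_SERVICE:
        return UNIT_ACP;
    case TASK_CONVOLUTION_REVERB:
    case TASK_INDIRECT_AUDIO:
    case TASK_RAY_CASTING:
        return UNIT_GPU;
    default:
        return UNIT_CPU; /* modular synthesis and everything else */
    }
}
```

A static table like this ignores the "when" half of the challenge: as noted earlier, some games are CPU bound and others GPU bound, so a real engine would presumably also weigh current load.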
  11. AUDIO SYNTHESIZER SCHEDULING IN HSA
  12. DISCLAIMER & ATTRIBUTION
     The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
     AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
     ATTRIBUTION
     © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
