A talk from the Design Track at AWE USA 2018 - the World's #1 XR Conference & Expo in Santa Clara, California, May 30 - June 1, 2018.
Patrick Flanagan (THX): Binaural Evaluations of Rendering Methods: Using 3D Audio for Immersive Experiences
An understanding of tools and techniques for properly rendering 3D / spatial audio. THX’s lead audio architect, Patrick Flanagan, will review the research and any updates (the first research was published in October 2017, with more scheduled to be published). In addition, Patrick will walk through the necessary steps and theories behind using audio as a tool in mixing and recording.
http://AugmentedWorldExpo.com
2. Goals
● Determine how different evaluative metrics impact user preference for a
renderer.
○ This can inform where specific improvements in the rendering process can be focused,
consistent with the end-goals of the application.
● Analyze the variance in performance of a sample of commercially available
binaural renderers.
● Examine the proposed methodology for the comprehensive evaluation of
binaural renderers.
3. Experimental Procedure
● The experiment was broken up into three phases
● Phase I assessed 3D sound localization errors: externalization, front/back
confusions, up/down confusions, and localization
○ These errors are particularly endemic in 3D audio reproduced over headphones
○ Stimuli - Three 2-second monophonic drum loops created in ProTools.
■ https://drive.google.com/file/d/1Co2-AkgMdKhRtSIVZ-gsRNbl6J3FScOs/view?usp=sharing
● Phase II assessed high-level qualitative characteristics of the resulting 3D
audio image.
● Phase III assessed total sound quality/overall preference through a forced
choice ranking of renderers.
○ Stimuli - Three 20-second surround-sound music and three 30-second surround sound movie
clips rendered for static binaural presentation.
■ https://drive.google.com/file/d/1C9Q-CJ9ALbv5u011oR_YzHx7FSTgKOMc/view?usp=sharing
● The document detailing the stimulus naming convention can be found here:
https://docs.google.com/document/d/1dY3J2TfrnGMHyyLFW-wCIvZ6Wq-Z9S4grNHbP0pXT8A/edit?usp=sharing
4. Methodology
● Though each renderer supported head-tracking in its native application, audio
was presented under a static binaural condition.
● Testing was performed using circumaural headphones (Sennheiser HD-650)
in a soundproof booth (NYU Dolan Isolation Booth).
● Custom software was used to administer the test and collect data.
○ Subjects indicated and submitted their responses via a graphical user interface (GUI) built in
MATLAB.
6. Subject Breakdown
● 79 Participants
● Below: Breakdown of subjects by 3D audio and VR experience
7. PHASE I
● Phase I consisted of four different tests
○ Externalization
○ Front/Back Confusions
○ Up/Down Confusions
○ Localization
● The Externalization, Front/Back, and Up/Down Confusion results are presented in the
following paper
○ https://drive.google.com/file/d/1aR_AJKjkHT-bhNolTEO80SDKV-redg_G/view?usp=sharing
● The Localization results are presented in the following paper
○ https://drive.google.com/file/d/1sB2yw3WWfeUi-cL6MAGWXPHDCCCerNQy/view?usp=sharing
8. Externalization
● Main Findings: Renderer, Stimulus, and Renderer*Stimulus (interaction term) were
found to have a significant effect on ratings of externalization. Shown below are the
estimated means output from the statistical model (see paper for more details).
9. Front/Back Confusions
● Main Findings: Renderer was found to have a significant effect on the number
of front/back confusions. A consistent bias for reversing frontal stimuli to the
rear was found (consistent with previous literature).
10. Front/Back Confusions
● The grand mean reversal rate (left) and the reversal rate for each renderer (right)
as a function of azimuth are pictured below. Front/back confusions do not appear to
be azimuth-dependent, but, as seen on the right, the spread of the rates reaches a
minimum at 40/140 degrees azimuth.
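The per-azimuth reversal rates behind plots like these can be sketched as follows; the trial-record layout (dicts with `azimuth` and `reversed` keys) is a hypothetical illustration, not the study's actual data format.

```python
# Sketch: front/back reversal rate per target azimuth from trial records.
# A trial is "reversed" when the response is mirrored across the interaural axis.
from collections import defaultdict

def reversal_rates_by_azimuth(trials):
    """trials: iterable of dicts with 'azimuth' (target angle, degrees)
    and 'reversed' (bool). Returns {azimuth: reversal rate}."""
    counts = defaultdict(lambda: [0, 0])  # azimuth -> [reversals, total trials]
    for t in trials:
        counts[t["azimuth"]][0] += t["reversed"]
        counts[t["azimuth"]][1] += 1
    return {az: rev / tot for az, (rev, tot) in counts.items()}

trials = [
    {"azimuth": 0, "reversed": True},
    {"azimuth": 0, "reversed": False},
    {"azimuth": 140, "reversed": False},
]
print(reversal_rates_by_azimuth(trials))  # {0: 0.5, 140: 0.0}
```

Aggregating per renderer rather than pooling would give the individual curves on the right of the slide; the grand mean pools all renderers as above.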
11. Up/Down Confusions
● Main Findings: Renderer was found to have a significant effect on the number
of up/down confusions. No clear bias for up-down versus down-up reversals
was found (consistent with other literature).
12. Up/Down Confusions
● The reversal rate for each renderer (and the grand mean reversal rate) as a
function of azimuth is pictured below. No clear trend for the individual
renderers is discernible, but the grand mean appears to show reduced
prevalence of up/down confusions as one moves from the front/back towards
the sides of the head.
13. Localization
● Main Findings: Renderer was found to have a significant effect on regional
localization accuracy. The left presents renderer accuracy without correcting
for front/back confusions; the right presents renderer accuracy agnostic to
front/back confusions.
14. Localization - Distribution of Absolute Error
● To get an understanding of the severity of error judgements, error distributions for
each renderer are presented. Because we are concerned with accuracy agnostic to
front/back confusions, error was quantified as the number of zones by which the
subject’s response deviated from the correct zone after correcting for confusions.
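This zone-error metric can be sketched as follows. The sketch assumes 12 equal 30-degree zones numbered 0–11 clockwise from the front, which puts the sides of the head at zones 3 and 9 (matching the zone labels used later in the deck); the exact zone layout is an assumption for illustration.

```python
# Sketch: absolute zone error, agnostic to front/back confusions.
# Assumes 12 equal 30-degree zones, 0..11 clockwise from the front,
# with the interaural (left/right) axis through zones 3 and 9.

N_ZONES = 12

def circular_distance(a, b, n=N_ZONES):
    """Shortest distance between two zones around the circle."""
    d = abs(a - b) % n
    return min(d, n - d)

def front_back_mirror(zone, n=N_ZONES):
    """Reflect a zone across the interaural axis (zones 3 and 9 map to themselves)."""
    return (6 - zone) % n

def zones_off(target, response):
    """Error in zones after correcting for front/back confusions: the smaller
    of the distance to the response and to its front/back mirror."""
    return min(circular_distance(target, response),
               circular_distance(target, front_back_mirror(response)))

# A frontal target (zone 1) reversed to the rear (zone 5) scores zero error
# once the reversal is corrected for.
print(zones_off(1, 5))  # 0
```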
15. Localization - Distribution of Absolute Error by Zone
Main Findings: Localization accuracy is greater in the front of the head than at the sides of the head. The performance of
renderers at the sides of the head (zone 3/9) contributes heavily to differences in localization accuracy.
16. Key findings - Phase I
● Rendering methodologies present significant tradeoffs in performance; no
renderer was superior in all tests.
● First-order Ambisonic renderers generally performed poorly, specifically with
regards to reversal errors and horizontal localization accuracy.
● Ratings of externalization were found to be highly content-specific, while the
other metrics were robust to changes in stimulus; externalization performs
more similarly to the sound qualities than the other localization errors.
17. PHASE II
● Phase II assessed five different sound quality attributes
○ Naturalness
○ Spaciousness
○ Clarity
○ Timbral Balance
○ Dialogue Intelligibility (movie stimuli only)
● Subjects were presented either movie or music stimuli throughout the entirety
of Phase II (and Phase III).
○ Music: n = 36
○ Movie: n = 31
● Renderers were presented side-by-side and the quality assessed on a
discrete 1 - 5 scale (1 = worst, 5 = best).
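Aggregating such side-by-side ratings per renderer, and per renderer-stimulus cell (the interaction the later slides test), can be sketched as follows; the tuple layout and renderer/stimulus names are made up for illustration.

```python
# Sketch: aggregating discrete 1-5 quality ratings per renderer and per
# renderer x stimulus cell. Data layout is hypothetical.
from collections import defaultdict
from statistics import mean

ratings = [  # (renderer, stimulus, rating on the 1-5 scale)
    ("A", "music1", 4), ("A", "music2", 5),
    ("B", "music1", 2), ("B", "music2", 3),
]

by_renderer = defaultdict(list)
by_cell = defaultdict(list)
for renderer, stimulus, score in ratings:
    by_renderer[renderer].append(score)
    by_cell[(renderer, stimulus)].append(score)

renderer_means = {r: mean(v) for r, v in by_renderer.items()}
cell_means = {c: mean(v) for c, v in by_cell.items()}
print(renderer_means)  # {'A': 4.5, 'B': 2.5}
```

Comparing `cell_means` across stimuli for a fixed renderer is what reveals the content dependence the Phase II findings report.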
18. Naturalness
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of naturalness (music on left, movie on right).
19. Spaciousness
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of spaciousness.
20. Clarity
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of clarity.
21. Timbral Balance
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of timbral balance.
22. Dialogue Intelligibility (movie stimuli only)
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of dialogue intelligibility.
23. Key findings - Phase II
● The experimental conditions result in differences in renderer performance.
○ Assessments of sound quality attributes for each renderer are highly dependent on the
content.
24. PHASE III
● Phase III was a global assessment of sound quality of the renderers
● Subjects rated, in a forced-choice setting, overall preference for renderers,
resulting in a ranking from 1-6.
● Subjects were presented either movie or music stimuli, consistent with the
type selected in Phase II. These stimuli were identical to those used in Phase
II.
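A standard way to test whether renderers differ in a forced-choice ranking like this is the Friedman chi-square statistic over the subjects' rank matrix; this sketch uses the textbook formula (the slides do not state which test the study used), and the rank rows are invented for illustration.

```python
# Sketch: Friedman chi-square statistic for k renderers ranked by n subjects
# (1 = worst, 6 = best). Large values indicate the renderers' rank sums
# differ more than chance would predict.

def friedman_statistic(rank_rows):
    """rank_rows: one row per subject, each a permutation of 1..k."""
    n = len(rank_rows)
    k = len(rank_rows[0])
    # Rank sum R_j for each renderer j (column).
    col_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    # Chi-square statistic: 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1).
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in col_sums) - 3.0 * n * (k + 1)

ranks = [  # three hypothetical subjects ranking six renderers
    [6, 5, 4, 3, 2, 1],
    [6, 4, 5, 3, 1, 2],
    [5, 6, 4, 2, 3, 1],
]
print(round(friedman_statistic(ranks), 2))  # 13.1
```

The statistic is compared against a chi-square distribution with k-1 degrees of freedom; splitting the rank matrix by stimulus type would probe the content dependence reported in the findings.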
25. Overall Ranking - (6 is best, 1 is worst)
Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on total sound quality.
26. Key findings - Phase III
● Assessments of user preference are dependent on the content (movie vs.
music).
27. Conclusions
● Renderer performance is extremely variable.
● Rendering methodologies present clear tradeoffs in performance; no single
renderer was superior across all metrics.
○ Selecting a renderer for one’s content should be made consistent with the end-goals of the
application.