A talk from the Design Track at AWE USA 2018 - the World's #1 XR Conference & Expo in Santa Clara, California, May 30 - June 1, 2018.
Patrick Flanagan (THX): Binaural Evaluations of Rendering Methods: Using 3D Audio for Immersive Experiences
An understanding of tools and techniques for properly rendering 3D / spatial audio. THX’s lead audio architect, Patrick Flanagan, will review the research and any updates (the first research was published in October 2017, with more scheduled to be published). In addition, Patrick will walk through the necessary steps and theories behind using audio as a tool in mixing and recording.
http://AugmentedWorldExpo.com
2. Goals
● Determine how different evaluative metrics impact user preference for a
renderer.
○ This can inform where specific improvements in the rendering process can be focused,
consistent with the end-goals of the application.
● Analyze the variance in performance of a sample of commercially available
binaural renderers.
● Examine the proposed methodology for the comprehensive evaluation of
binaural renderers.
3. Experimental Procedure
● The experiment was broken up into three phases
● Phase I assessed 3D sound localization errors: externalization, front/back
confusions, up/down confusions, and localization
○ These errors are particularly endemic in 3D audio reproduced over headphones
○ Stimuli - Three 2-second monophonic drum loops created in ProTools.
■ https://drive.google.com/file/d/1Co2-AkgMdKhRtSIVZ-gsRNbl6J3FScOs/view?usp=sharing
● Phase II assessed high-level qualitative characteristics of the resulting 3D
audio image.
● Phase III assessed total sound quality/overall preference through a forced
choice ranking of renderers.
○ Stimuli - Three 20-second surround-sound music and three 30-second surround sound movie
clips rendered for static binaural presentation.
■ https://drive.google.com/file/d/1C9Q-CJ9ALbv5u011oR_YzHx7FSTgKOMc/view?usp=sharing
● The document detailing the stimulus naming convention can be found here:
https://docs.google.com/document/d/1dY3J2TfrnGMHyyLFW-wCIvZ6Wq-Z9S4grNHbP0pXT8A/edit?usp=sharing
4. Methodology
● Though each renderer supported head-tracking in its native application, audio
was presented under a static binaural condition.
● Testing was performed using circumaural headphones (Sennheiser HD-650)
in a soundproof booth (NYU Dolan Isolation Booth).
● Custom software was used to administer the test and collect data.
○ Subjects indicated and submitted their responses via a graphical user interface (GUI) built in
MATLAB.
6. Subject Breakdown
● 79 Participants
● Below: Breakdown of subjects by 3D audio and VR experience
7. PHASE I
● Phase I consisted of four different tests
○ Externalization
○ Front/Back Confusions
○ Up/Down Confusions
○ Localization
● The Externalization, Front/Back, and Up/Down Confusion results are presented in the
following paper
○ https://drive.google.com/file/d/1aR_AJKjkHT-bhNolTEO80SDKV-redg_G/view?usp=sharing
● The Localization results are presented in the following paper
○ https://drive.google.com/file/d/1sB2yw3WWfeUi-cL6MAGWXPHDCCCerNQy/view?usp=sharing
8. Externalization
● Main Findings: Renderer, Stimulus, and Renderer*Stimulus (interaction term) were
found to have a significant effect on ratings of externalization. Shown below are the
estimated means output from the statistical model (see paper for more details).
9. Front/Back Confusions
● Main Findings: Renderer was found to have a significant effect on the number
of front/back confusions. A consistent bias for reversing frontal stimuli to the
rear was found (consistent with previous literature).
10. Front/Back Confusions
● The grand mean reversal rate (left) and the reversal rate for each renderer (right)
as a function of azimuth are pictured below. Front/back confusions do not appear to
be azimuth-dependent, but, as seen on the right, the spread of the rates reaches a
minimum at 40/140 degrees azimuth.
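The per-azimuth reversal rates behind plots like these can be sketched as follows; the trial-record layout (dicts with `azimuth` and `reversed` keys) is a hypothetical illustration, not the study's actual data format.

```python
# Sketch: front/back reversal rate per target azimuth from trial records.
# A trial is "reversed" when the response is mirrored across the interaural axis.
from collections import defaultdict

def reversal_rates_by_azimuth(trials):
    """trials: iterable of dicts with 'azimuth' (target angle, degrees)
    and 'reversed' (bool). Returns {azimuth: reversal rate}."""
    counts = defaultdict(lambda: [0, 0])  # azimuth -> [reversals, total trials]
    for t in trials:
        counts[t["azimuth"]][0] += t["reversed"]
        counts[t["azimuth"]][1] += 1
    return {az: rev / tot for az, (rev, tot) in counts.items()}

trials = [
    {"azimuth": 0, "reversed": True},
    {"azimuth": 0, "reversed": False},
    {"azimuth": 140, "reversed": False},
]
print(reversal_rates_by_azimuth(trials))  # {0: 0.5, 140: 0.0}
```

Aggregating per renderer rather than pooling would give the individual curves on the right of the slide; the grand mean pools all renderers as above.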
11. Up/Down Confusions
● Main Findings: Renderer was found to have a significant effect on the number
of up/down confusions. No clear bias for up-down versus down-up reversals
was found (consistent with other literature).
12. Up/Down Confusions
● The reversal rate for each renderer (and the grand mean reversal rate) as a
function of azimuth is pictured below. No clear trend for the individual
renderers is discernible, but the grand mean appears to show reduced
prevalence of up/down confusions as one moves from the front/back towards
the sides of the head.
13. Localization
● Main Findings: Renderer was found to have a significant effect on regional
localization accuracy. The left presents renderer accuracy without correcting
for front/back confusions; the right presents renderer accuracy agnostic to
front/back confusions.
14. Localization - Distribution of Absolute Error
● To get an understanding of the severity of error judgements, error distributions for
each renderer are presented. Because we are concerned with accuracy agnostic to
front/back confusions, error was quantified as the number of zones by which the
subject’s response deviated from the correct zone after correcting for confusions.
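This zone-error metric can be sketched as follows. The sketch assumes 12 equal 30-degree zones numbered 0–11 clockwise from the front, which puts the sides of the head at zones 3 and 9 (matching the zone labels used later in the deck); the exact zone layout is an assumption for illustration.

```python
# Sketch: absolute zone error, agnostic to front/back confusions.
# Assumes 12 equal 30-degree zones, 0..11 clockwise from the front,
# with the interaural (left/right) axis through zones 3 and 9.

N_ZONES = 12

def circular_distance(a, b, n=N_ZONES):
    """Shortest distance between two zones around the circle."""
    d = abs(a - b) % n
    return min(d, n - d)

def front_back_mirror(zone, n=N_ZONES):
    """Reflect a zone across the interaural axis (zones 3 and 9 map to themselves)."""
    return (6 - zone) % n

def zones_off(target, response):
    """Error in zones after correcting for front/back confusions: the smaller
    of the distance to the response and to its front/back mirror."""
    return min(circular_distance(target, response),
               circular_distance(target, front_back_mirror(response)))

# A frontal target (zone 1) reversed to the rear (zone 5) scores zero error
# once the reversal is corrected for.
print(zones_off(1, 5))  # 0
```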
15. Localization - Distribution of Absolute Error by Zone
Main Findings: Localization accuracy is greater in the front of the head than at the sides of the head. The performance of
renderers at the sides of the head (zone 3/9) contributes heavily to differences in localization accuracy.
16. Key findings - Phase I
● Rendering methodologies present significant tradeoffs in performance; no
renderer was superior in all tests.
● First-order Ambisonic renderers generally performed poorly, specifically with
regards to reversal errors and horizontal localization accuracy.
● Ratings of externalization were found to be highly content-specific, while the
other metrics were robust to changes in stimulus; externalization performs
more similarly to the sound qualities than the other localization errors.
17. PHASE II
● Phase II assessed five different sound quality attributes
○ Naturalness
○ Spaciousness
○ Clarity
○ Timbral Balance
○ Dialogue Intelligibility (movie stimuli only)
● Subjects were presented either movie or music stimuli throughout the entirety
of Phase II (and Phase III).
○ Music: n = 36
○ Movie: n = 31
● Renderers were presented side-by-side and the quality assessed on a
discrete 1 - 5 scale (1 = worst, 5 = best).
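Aggregating such side-by-side ratings per renderer, and per renderer-stimulus cell (the interaction the later slides test), can be sketched as follows; the tuple layout and renderer/stimulus names are made up for illustration.

```python
# Sketch: aggregating discrete 1-5 quality ratings per renderer and per
# renderer x stimulus cell. Data layout is hypothetical.
from collections import defaultdict
from statistics import mean

ratings = [  # (renderer, stimulus, rating on the 1-5 scale)
    ("A", "music1", 4), ("A", "music2", 5),
    ("B", "music1", 2), ("B", "music2", 3),
]

by_renderer = defaultdict(list)
by_cell = defaultdict(list)
for renderer, stimulus, score in ratings:
    by_renderer[renderer].append(score)
    by_cell[(renderer, stimulus)].append(score)

renderer_means = {r: mean(v) for r, v in by_renderer.items()}
cell_means = {c: mean(v) for c, v in by_cell.items()}
print(renderer_means)  # {'A': 4.5, 'B': 2.5}
```

Comparing `cell_means` across stimuli for a fixed renderer is what reveals the content dependence the Phase II findings report.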
18. Naturalness
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of naturalness (music on left, movie on right).
19. Spaciousness
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of spaciousness.
20. Clarity
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of clarity.
21. Timbral Balance
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of timbral balance.
22. Dialogue Intelligibility (movie stimuli only)
● Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on ratings of dialogue intelligibility.
23. Key findings - Phase II
● The experimental conditions result in differences in renderer performance.
○ Assessments of sound quality attributes for each renderer are highly dependent on the
content.
24. PHASE III
● Phase III was a global assessment of sound quality of the renderers
● Subjects rated, in a forced-choice setting, overall preference for renderers,
resulting in a ranking from 1-6.
● Subjects were presented either movie or music stimuli, consistent with the
type selected in Phase II. These stimuli were identical to those used in Phase
II.
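A standard way to test whether renderers differ in a forced-choice ranking like this is the Friedman chi-square statistic over the subjects' rank matrix; this sketch uses the textbook formula (the slides do not state which test the study used), and the rank rows are invented for illustration.

```python
# Sketch: Friedman chi-square statistic for k renderers ranked by n subjects
# (1 = worst, 6 = best). Large values indicate the renderers' rank sums
# differ more than chance would predict.

def friedman_statistic(rank_rows):
    """rank_rows: one row per subject, each a permutation of 1..k."""
    n = len(rank_rows)
    k = len(rank_rows[0])
    # Rank sum R_j for each renderer j (column).
    col_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    # Chi-square statistic: 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1).
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in col_sums) - 3.0 * n * (k + 1)

ranks = [  # three hypothetical subjects ranking six renderers
    [6, 5, 4, 3, 2, 1],
    [6, 4, 5, 3, 1, 2],
    [5, 6, 4, 2, 3, 1],
]
print(round(friedman_statistic(ranks), 2))  # 13.1
```

The statistic is compared against a chi-square distribution with k-1 degrees of freedom; splitting the rank matrix by stimulus type would probe the content dependence reported in the findings.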
25. Overall Ranking - (6 is best, 1 is worst)
Main Findings: Renderer and Renderer-Stimulus interaction were found to have
a significant effect on total sound quality.
26. Key findings - Phase III
● Assessments of user preference are dependent on the content (movie vs.
music).
27. Conclusions
● Renderer performance is extremely variable.
● Rendering methodologies present clear tradeoffs in performance; no single
renderer was superior across all metrics.
○ Selecting a renderer for one’s content should be made consistent with the end-goals of the
application.