This presentation summarises a method for the reproduction of multizone speech soundfields using perceptual weighting criteria. Psychoacoustic models are used to derive a space-time-frequency weighting function to control leakage of perceptually unimportant energy from the bright zone into the quiet zone. This is combined with a method for regulating the number of basis planewaves used in the reproduction, which allows an efficient implementation using a codebook of predetermined weights based on the desired soundfield energy in the zones. The approach improves the mean squared error of reproduced speech in the bright zone by up to 10.5 dB. Results also show a significant reduction in the spatial error within the bright zone whilst requiring 65% less loudspeaker signal power for the case where the soundfield in this zone is in line with, and hence partially directed to, the quiet zone.
1. MULTIZONE REPRODUCTION OF SPEECH SOUNDFIELDS: A PERCEPTUALLY WEIGHTED APPROACH
Jacob Donley and Christian Ritz
School of Electrical, Computer and Telecommunications Engineering
ICT Research Institute & Global Challenges
University of Wollongong
2. How can we perceptually enhance independent listening zones in a room?
[Figure labels: Room; Loudspeakers; Quiet Zone; Bright Zone]
• Quiet Zone: no reproduced sound
• Bright Zone: listening to speech or music
Known as Multizone Reproduction of Soundfields
3. Aim: derive loudspeaker signals to reproduce the desired soundfield in each zone
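• Reproduced soundfield modelled in the (discrete) space ($\mathbf{x} = (x, y)$), time ($n$) and frequency ($k$) domains as:
$$S^{w}(\mathbf{x}, n, k) = \sum_{l=1}^{L} d_l(n, k, w) \, \frac{j}{4} H_0^{(1)}\!\left(k \lVert \mathbf{x}_l - \mathbf{x} \rVert\right)$$
where $H_m^{(1)}$ is the $m$th order Hankel function of the first kind and $d_l(n, k, w)$ are the loudspeaker signals to be derived.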
Solution is based on a weighted orthogonal basis expansion approach [1, 2].
[1] J. Donley and C. Ritz, “An efficient approach to dynamically weighted multizone wideband reproduction of speech soundfields,” Proc. IEEE ChinaSIP 2015, pp. 60–64, 12–15 July 2015.
[2] W. Jin, W. B. Kleijn, and D. Virette, “Multizone soundfield reproduction using orthogonal basis expansion,” Proc. IEEE ICASSP 2013, pp. 311–315.
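The soundfield model on this slide can be evaluated numerically once loudspeaker signals are available. The following is a minimal sketch only: it sums the contribution of each loudspeaker as a 2-D point source, using a power-series evaluation of the zeroth-order Hankel function that is accurate only for modest arguments (a library routine such as SciPy's `hankel1` would normally be used instead). The function names, the 8-loudspeaker layout, the 200 Hz test frequency and the unit loudspeaker signals are illustrative assumptions, not values or signals derived by the method in [1, 2].

```python
import math

EULER_GAMMA = 0.5772156649015329


def hankel1_order0(x, terms=40):
    """H_0^(1)(x) = J_0(x) + j*Y_0(x) via the ascending power series.

    Illustrative only: suitable for modest arguments (roughly x < 15).
    """
    q = (x / 2.0) ** 2
    term = 1.0          # (-1)^m q^m / (m!)^2, starting at m = 0
    j0 = 1.0
    harmonic = 0.0      # H_m = 1 + 1/2 + ... + 1/m
    y_sum = 0.0
    for m in range(1, terms):
        term *= -q / (m * m)
        harmonic += 1.0 / m
        j0 += term
        y_sum -= harmonic * term   # (-1)^(m+1) H_m q^m / (m!)^2
    y0 = (2.0 / math.pi) * ((math.log(x / 2.0) + EULER_GAMMA) * j0 + y_sum)
    return complex(j0, y0)


def reproduced_soundfield(point, speakers, signals, wavenumber):
    """Sum over loudspeakers of d_l * (j/4) * H_0^(1)(k * ||x_l - x||)."""
    total = 0j
    for (sx, sy), d in zip(speakers, signals):
        distance = math.hypot(sx - point[0], sy - point[1])
        total += d * 0.25j * hankel1_order0(wavenumber * distance)
    return total


# Hypothetical example: 8 loudspeakers on a 1.5 m radius circle at 200 Hz.
speed_of_sound = 343.0                      # m/s, assumed
k = 2.0 * math.pi * 200.0 / speed_of_sound  # wavenumber in rad/m
speakers = [(1.5 * math.cos(2 * math.pi * l / 8),
             1.5 * math.sin(2 * math.pi * l / 8)) for l in range(8)]
signals = [1.0 + 0j] * 8                    # placeholder signals, not derived
pressure = reproduced_soundfield((0.6, 0.0), speakers, signals, k)
```

In the method itself the signals `d_l` come from the weighted orthogonal basis expansion; here they are placeholders so that the summation can be run.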
4. Weighting method controls leakage into the quiet zone at the cost of quality in the bright zone
http://bit.ly/WeightedMultizone
• Multizone occlusion problem:
  • Quiet zone in line with desired bright zone
  • Difficult to control leakage
• Trade-off:
  • Quality in bright zone vs quietness in quiet zone
[Figure labels: Small weight; Large weight; weighting is discrete in space, time and frequency (weighted actual soundfield function)]
How quiet does the quiet zone need to be?
5. Case 1: The hearing threshold
• Only need to suppress leakage in the quiet zone down to the threshold in quiet
• Possible only if the acoustic contrast between zones is large enough
[Figure label: Speech]
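The hearing-threshold case can be illustrated with a short check of whether leaked speech falls below the threshold in quiet. This sketch assumes Terhardt's widely used closed-form approximation of the threshold in quiet; the slides say only that the threshold is of the kind used in audio coding standards, so the exact curve, the function names and the example levels here are assumptions.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt's formula, assumed)."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def leakage_is_inaudible(leaked_spl_db, freq_hz):
    """True if the leaked level is at or below the threshold in quiet."""
    return leaked_spl_db <= threshold_in_quiet_db(freq_hz)


# Hypothetical numbers: 60 dB SPL speech component at 1 kHz in the bright zone.
bright_spl_db = 60.0
for contrast_db in (50.0, 60.0):
    leaked = bright_spl_db - contrast_db
    print(contrast_db, leakage_is_inaudible(leaked, 1000.0))
```

With these assumed levels, only the larger acoustic contrast pushes the leakage below the threshold, which is the condition stated in the second bullet.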
6. Case 2: Spreading functions corresponding to a local masking signal
• Key idea: a masker in the quiet zone perceptually hides surrounding frequency components leaked from the bright zone
• Benefit: less control via weighting is needed, which improves bright zone quality
[Figure labels: 2 kHz masker; Speech]
• Max. SPL: small weight, high bright zone quality
• Min. SPL: large weight, low bright zone quality
• Leaked SPL: masker allowed to remain in quiet zone
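A spreading function of the kind referred to here can be sketched as follows. The sketch assumes the Schroeder spreading function on the Bark scale and Zwicker's Hz-to-Bark approximation, both common in audio coding; the slides do not state which functions were used, and the 10 dB masking offset and the function names are purely illustrative assumptions.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker's formula, assumed)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def spreading_db(delta_bark):
    """Schroeder spreading function in dB; delta_bark = probe minus masker."""
    shifted = delta_bark + 0.474
    return 15.81 + 7.5 * shifted - 17.5 * math.sqrt(1.0 + shifted ** 2)


def masked_threshold_db(masker_spl_db, masker_hz, probe_hz, offset_db=10.0):
    """Level below which a probe near the masker is assumed to be hidden.

    offset_db is a hypothetical masker-to-threshold offset.
    """
    delta = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_spl_db + spreading_db(delta) - offset_db


# A 2 kHz masker at 60 dB SPL (assumed level) hides less as the probe moves away.
for probe in (1500.0, 2000.0, 3000.0):
    print(probe, round(masked_threshold_db(60.0, 2000.0, probe), 1))
```

Leaked components that stay under this curve need no further suppression, which is why a smaller weight can be used near the masker.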
7. Considering masking reduces spatial error in the bright zone and SPL in the quiet zone
Benefit: perceptually optimised trade-off between quality and leakage
• Weights chosen by comparing reproduced speech with spreading functions
[Figure labels: Speech; Spreading function and hearing threshold; Spatial error $\epsilon_b(n, k)$; reduction]
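The weight selection described on this slide can be sketched as a lookup over a small codebook. This is a hypothetical illustration rather than the authors' procedure: it assumes that, for one time-frequency bin, the leaked SPL in the quiet zone is already known for each candidate weight, and it picks the smallest weight whose leakage stays at or below the perceptual threshold (hearing threshold or masked threshold). The function name and all numbers are invented for the example.

```python
def choose_weight(leak_by_weight, threshold_db):
    """Pick the smallest weight whose leaked SPL is at or below the threshold.

    leak_by_weight: list of (weight, leaked_spl_db) pairs for one
    time-frequency bin (hypothetical codebook entries).
    Falls back to the largest weight if no entry is quiet enough.
    """
    for weight, leaked_spl_db in sorted(leak_by_weight):
        if leaked_spl_db <= threshold_db:
            return weight
    return max(weight for weight, _ in leak_by_weight)


# Hypothetical codebook: larger weights leak less into the quiet zone.
codebook = [(1.0, 35.0), (10.0, 20.0), (100.0, 5.0)]

weight_near_masker = choose_weight(codebook, threshold_db=40.0)  # masked bin
weight_moderate = choose_weight(codebook, threshold_db=25.0)
weight_unmasked = choose_weight(codebook, threshold_db=0.0)      # nothing hides leakage
```

Choosing the smallest admissible weight reflects the trade-off stated on the slide: a small weight preserves bright zone quality, so a large weight is used only where the leakage would otherwise be audible.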
8. Experimental evaluation to validate proposed perceptual approach
Multizone setup:
• Full circle of 65 loudspeakers
• Loudspeaker array diameter: 3 m
• Zone diameters: 60 cm (enough space for a human head)
• Zone centres are 1.2 m apart
• Reproduction capable of wideband speech
• Direction of speech causes the multizone occlusion problem (θ ≈ 15°)
[Figure legend: Hearing threshold & spreading function (as used in audio coding standards)]
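The geometry of this setup can be written down as a small configuration sketch. The loudspeaker count, array diameter, zone diameter and zone separation are taken from the slide; placing the two zones symmetrically about the array centre on the x-axis is an assumption made only for this illustration, as are all variable and function names.

```python
import math

NUM_SPEAKERS = 65
ARRAY_RADIUS_M = 1.5      # 3 m array diameter
ZONE_RADIUS_M = 0.3       # 60 cm zone diameter
ZONE_SEPARATION_M = 1.2   # distance between zone centres


def speaker_positions(count=NUM_SPEAKERS, radius=ARRAY_RADIUS_M):
    """Evenly spaced loudspeaker positions on a full circle."""
    return [(radius * math.cos(2.0 * math.pi * l / count),
             radius * math.sin(2.0 * math.pi * l / count))
            for l in range(count)]


# Assumed placement: zones symmetric about the array centre on the x-axis.
bright_centre = (-ZONE_SEPARATION_M / 2.0, 0.0)
quiet_centre = (ZONE_SEPARATION_M / 2.0, 0.0)

positions = speaker_positions()
arc_spacing_m = 2.0 * math.pi * ARRAY_RADIUS_M / NUM_SPEAKERS
zones_overlap = ZONE_SEPARATION_M < 2.0 * ZONE_RADIUS_M
zones_inside_array = ZONE_SEPARATION_M / 2.0 + ZONE_RADIUS_M < ARRAY_RADIUS_M
```

Under this assumed placement the zones do not overlap and both lie well inside the array, with adjacent loudspeakers roughly 14.5 cm apart along the arc.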
9. Reduced bright zone error from psychoacoustic masking
• 10 dB improvement in MSE
• Still high quality speech in the bright zone
Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{M} \sum_{n=1}^{M} \left| Y_w(n) - Y(n) \right|^2$
[Figure labels: No masking (large weight); With masking (variable weight)]
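The MSE measure on this slide can be computed directly from sample sequences. The sketch below is a straightforward illustration with made-up signals; the function names and the error amplitudes are assumptions, and only the final line uses figures reported in the presentation.

```python
import math


def mse(reproduced, desired):
    """Mean squared error between reproduced and desired samples."""
    if len(reproduced) != len(desired):
        raise ValueError("signals must have the same length")
    return sum(abs(yw - y) ** 2 for yw, y in zip(reproduced, desired)) / len(desired)


def to_db(power_ratio):
    """Convert a power-like quantity to decibels."""
    return 10.0 * math.log10(power_ratio)


# Made-up example: a desired tone and two reproductions with different error levels.
desired = [math.sin(2.0 * math.pi * n / 32.0) for n in range(256)]
large_error = [y + 1e-3 * math.cos(n) for n, y in enumerate(desired)]
small_error = [y + 3e-4 * math.cos(n) for n, y in enumerate(desired)]

improvement_db = to_db(mse(large_error, desired)) - to_db(mse(small_error, desired))

# Figures reported in the presentation: -69.8 dB without masking, -80.3 dB with.
reported_improvement_db = -69.8 - (-80.3)
```

Expressing the MSE in decibels makes the improvement a simple difference, which for the reported values is 10.5 dB.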
10. Reduced bright zone spatial error from psychoacoustic masking
[Figure panels: Magnitude difference (A, B); Phase difference (C, D)]
Maximum spatial error reduction: 28 dB
Consequence of smaller weighting: less loudspeaker power (max. reduction = 65%)
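For comparison with the decibel figures elsewhere in the presentation, the 65% power reduction can be converted to a level change. This is a small arithmetic sketch; the function name is an assumption.

```python
import math


def power_reduction_db(fraction_saved):
    """Level change in dB when the given fraction of power is saved."""
    return 10.0 * math.log10(1.0 - fraction_saved)


level_change_db = power_reduction_db(0.65)   # about -4.56 dB
amplitude_factor = math.sqrt(1.0 - 0.65)     # about 0.59 of the original amplitude
```

A 65% saving therefore corresponds to loudspeaker signals roughly 4.6 dB lower in power.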
11. Conclusion: Exploiting perceptual weighting within multizone soundfield reproduction results in significant advantages
• Improved error in bright zones with no perceptual cost in adjacent zones
  • MSE of speech: -69.8 dB to -80.3 dB (max)
  • Spatial error: -7.4 dB to -31.5 dB (max)
• Reduced loudspeaker power (up to 65%)
• Improved reproduction when the occlusion problem is present
Questions?