In most of the communication systems speech is transmittes in narrowband, containing frequencies from 300 Hz to 3400 Hz. Compared with normal speech which is generally contains a perceptually significant amount of energy up to 8 kHz, this speech has a muffled quality and reduced intelligibility, particularly noticeable in sounds such as /s/ and /f/ . Speech which has been bandlimited to 8 kHz is often coded for this reason, but this requires an increase in the bit rate.
Wideband reconstruction is a scheme that adds a synthesized highband signal to narrowband speech to produce a higher quality wideband speech signal. The synthesized highband signal is based entirely on information contained in the narrowband speech, and is thus achieved at zero increase in the bit rate from a coding perspective. Wideband reconstruction can function as a post-processor to any narrowband telephone receiver, or alternatively it can be combined with any narrowband speech coder to produce a very low bit rate wideband speech coder. Applications include higher quality mobile, teleconferencing, and internet telephony.
This final project aims to simulate the bandwidth extension system using spectral shifting method for highband excitation, which is used codebook and linear mapping to estimate the envelope of highband. The algorithm for wide band expansion proved to work, though certain unwanted artefacts were introduced in the reconstructed signal. Listening tests confirmed the presence of these unwanted artefacts. Objective and subjective tests demonstrate that wideband speech synthesized using these techniques have presentage in (numerical) 50 % of the respondences with SNR 5,13 dB. Optimum parameter used in this system goes to Euclidean distance with K=1 for KNN classification and correlation distance with 256 clusters for Kmean clustering. Computational time for spectral shifting 0.144 s, for spectral folding 0.138 s and codebook needs 164,2 s. Subjective measurement using DMOS for spectral shifting about 3.65 and for spectral folding 2. However further research and improvement to reach higher quality from this system for implementation are still needed.
Simulation of Wideband Speech Reconstruction Using Spectral Shifting
1. SIMULATION OF RECONSTRUCTION
WIDEBAND SPEECH SIGNAL USING
SPECTRAL SHIFTING
Fitrie Ratnasari
(111080134)
1 Advisor :
Dr. Ir. Bambang Hidayat, DEA
st
2nd Advisor :
Inung Wijayanto S.T, M.T
1
3. BACKGROUND
The transmitting speech narrowband
sounds muffled, thin and far away
communication.
Wideband speech coding inquires an
increase in the bit rate, bandwidth
and expensive.
Various codec in amount of
telecommunication technology
need a ‘bridge’ to equate the
speech quality
3
4. PURPOSE
To simulate the reconstruction of
wideband speech signal using spectral
shifting and spectral folding method
To estimate the wideband speech
envelope using codebook algorithm
To analyze performance system of
simulation using objective, subjective
measurement and computational time
4
5. PROBLEM FORMULATION
How to simulate the reconstruction of
wideband speech signal using spectral
shifting method with Matlab R.2009.a
How to estimate the wideband speech
envelope using codebook algorithm
How to estimate the residual error of
high frequence wideband speech signal
How to analyze and synthesize linear
predictive in a system
5
6. PROBLEM LIMITATIONS
System focused on using spectral
shifting method for reconstruction
wideband speech signal
System doesn’t require any
communication system channel or
transmission
Input speech for simulation is in a
clean speech signal, format .*wav,
with sampling frequency 16KHz
Non real-time system
6
7. PROBLEM LIMITATIONS (cont’d)
Data training of speech signal are in
Bahasa without any dialect region
Evaluation of simulation only tested in a
normal hearing
Performance paramaters that analyze
using SNR, Cross Correlation, DMOS
(Degraded Mean Opinion Score), and
computational time
7
8. Theory
• Wideband vs Narrowband
Narrowband
Wideband
: 200 – 3400 Hz
: 50 – 7000 Hz
The high-frequency extension from 3400 to 7000 Hz provides
better fricative differentiation, and therefore higher intelligibility.
Wideband Speech
Narrowband Speech
Wideband is the answer to make intelligibility,
naturalness of speech, feeling of transparent
communication and facilitates speaker
recognition.
8
10. Wideband Reconstruction
Estimating wideband envelope.
This system using codebook
algorithm
Estimating missing high frequency.
This system using spectral shifting
and folding
10
12. Spectral Shifting and Spectral Folding Method
Spectral shifting or spectral folding is a method for
estimating the missing high frequency.
Spectral shifting method
LPF
Pitch
detector
Cosine
generator
X
Spectral folding method
12
13. Mulai
System Model
Studi Literatur
Perekaman Suara
Wideband
Data
Masukan
Filter Telephoni
↓K
Bandlimited
speech
(Narrowband)
LP Analysis
LPC
Coefficient
of
narrowband
Residual
Error of
Narrowband
Envelope
Estimation
High Frequency
Regeneration
LPC Coeff
estimated of
wideband
Residual
Error
Estimation of
Wideband
LP Synthesis
HPF
Adding
Reconstructed
Wideband
13
15. Simulation Result (Subjective Measurement)
1st testing
(A) Narrowband signal
(B) Original wideband
Preference : B
Proportion : 95%
2nd testing
(A) Narrowband signal
(B) Wideband reconstructed using spectral folding
Preference: A
Proportion : 100%
3rd testing
(A) Narrowband signal
(B) Wideband reconstructed using spectral shifting
Preference : A / B
Proportion : 50 %
4th testing
(A) Wideband reconstructed using spectral folding
(B) Wideband reconstructed using spectral shifting
Preference : B
Proportion : 100%
Testing Result of Simulation using
A/B preference test
15
17. Conclusion
1. Simulation of reconstruction wideband speech signal using
spectral shifting and spectral folding proved to work, though
certain unwanted artefacts were introduced in the
reconstructed speech signal.
2. The best parameter which is used for this system is Euclidean
Distance with K = 1 for knn classification and Correlation
Distance with 256 clusters for kmean clustering.
3. Maximum of mean SNR of this system toward to 5.3 dB
4. Computational time for this system needs about 0.176 seconds,
and 164 seconds for codebook process.
17
18. Conclusion
5. Wideband reconstructed signal using spectral shifting method
has a better performance rather than spectral folding method.
Mean DMOS for spectral shifting about 3.65 and for spectral
folding 2
6. Percentage of respondence for choosing both of spectral shifting
method narrowband signal has a same presentage. It means
that this system still need further reaserch to reach higher
quality for implementation
18
19. Suggestion
1. System might be implemented and analyzed with the
other languange programs, such as Java, C, etc
2. System might be impelemented with using another
method for estimating missing high frequency such as
statistical method GMM, HMM, etc
3. System might be added by using with estimating
wideband energy in order to have a higher quality.
4. System can be applied with dialect region
5. System for reconstruction wideband speech signal
can be implemented as a real time process
19