Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Programming

Muhammad Adil Raja
University of Limerick

Outline
• Introduction
• Voice over Internet Protocol (VoIP)
• Approaches to Speech Quality estimation
• Genetic Programming
• Real-time, Non-intrusive Evaluation of VoIP

Outline ...
• A Methodology for Deriving VoIP Equipment
Impairment Factors for a Mixed NB/WB Context
• A Signal-based Model
• Conclusion

Introduction
• VoIP -- a paradigm shift
• Bandwidth redundancy exploitation
• QoS remains dominated by network/transport layer
metrics
• Quality Assessment - a reflection upon the operating
conditions of the network

Research Goals
• Derivation of non-intrusive parametric models for
speech quality estimation
• Derivation of a signal-based non-intrusive model
• Genetic programming based symbolic regression was
used

VoIP
• Packet based communication channel
• Uses wire-line speech codecs
• Linear predictive coding (LPC) is popular
• Coded frames are packetized into RTP/UDP
• Internet is used for transportation
• The receiver does the reverse process
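The packetization step above can be sketched as follows. This is a minimal illustration that assumes the 12-byte fixed RTP header of RFC 3550 with no extensions; the function names, the SSRC value and the all-zero stand-in frames are hypothetical, and payload type 18 is the static type assigned to G.729.

```python
import struct

def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type):
    """Prepend a 12-byte RTP fixed header to one or more coded frames."""
    first_byte = 2 << 6                  # version 2, no padding/extension/CSRC
    second_byte = payload_type & 0x7F    # marker bit left at 0
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

def parse_rtp_packet(packet):
    """Receiver side: split the header fields from the coded payload."""
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": first >> 6, "payload_type": second & 0x7F,
            "seq": seq, "timestamp": ts, "ssrc": ssrc,
            "payload": packet[12:]}

# Two 10-byte frames (stand-ins for coded speech) carried in one packet.
frames = [bytes(10), bytes(10)]
packet = build_rtp_packet(b"".join(frames), seq=1, timestamp=160,
                          ssrc=0x1234, payload_type=18)
fields = parse_rtp_packet(packet)
```

In a real sender the packet would then be handed to a UDP socket; that part is omitted here.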

Speech Quality
• Two Approaches to Speech Quality Assessment
★ Subjective Assessment
★ Objective Assessment

Subjective Assessment
Speech Quality
• Speech Quality is Estimated By Humans
• Advantage - Reliable Results
• Limitations
1. Expensive
2. Time Consuming
3. Laborious
4. Lack of Repeatability
• Mean Opinion Score (MOS) is the Measure of Quality
• 1 - Bad
• 5 - Excellent
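As a small worked example, a MOS is the arithmetic mean of the listeners' ratings on the 1 to 5 scale. The ratings below are invented for illustration and the function name is hypothetical.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average listener ratings given on the 1 (bad) to 5 (excellent) scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)

listener_ratings = [4, 3, 5, 4, 4]   # hypothetical panel of five listeners
mos = mean_opinion_score(listener_ratings)
```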

Objective Assessment of Speech Quality
Speech Quality
• A fast, reliable automated program is used to
estimate the human perception of speech quality
• Two approaches:
1. Intrusive Assessment
2. Non-Intrusive Assessment

Intrusive Assessment
• The signal under test is compared against a reference
signal
• Advantages
1. The most reliable artificial means of assessing the
speech quality
2. Tests can be repeated easily
• Limitations:
1. Consumes considerable computing resources
2. Not useful for continuous monitoring of quality due
to requirement of a reference signal

ITU-T P.862 (PESQ)
• PESQ algorithm is the current ITU-T recommendation
for intrusive speech quality estimation
• The speech signal is mapped from the time domain to a
time-frequency representation using the psycho-physical
equivalents of time and frequency

ITU-T P.862 (PESQ)
• It has shown a high correlation with ITU-T benchmark
tests
• For 30 ITU-T subjective tests the Pearson's correlation
coefficient (R) was 0.935
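For reference, Pearson's correlation coefficient between objective scores and subjective MOS values can be computed as in the sketch below. The function name and the toy inputs in the usage lines are illustrative only and are not data from the ITU-T tests.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy check: perfectly linear data correlates at +1 or -1.
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```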

Non-Intrusive Assessment
• A challenging problem since a reference signal is not
available
• Two approaches exist
1. Parametric models
2. Signal-based models

Parametric Models
• Function of transport layer metrics and other measurable
quantities
• Cogent metrics may be:
★ Packet loss rate
★ End-to-end delay
★ Delay variation - jitter
★ Codec characteristics
★ ...
• Aimed at real-time and continuous evaluation of speech
quality
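One of these metrics, delay variation, can be tracked continuously at the receiver. The sketch below assumes the running interarrival jitter estimator defined in RFC 3550, which is one common definition and not necessarily the one used in this work; the timestamps are hypothetical and in milliseconds.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate J += (|D| - J) / 16, as in RFC 3550."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: change in transit time between consecutive packets
        d = ((recv_times[i] - recv_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        jitter += (abs(d) - jitter) / 16.0
    return jitter

steady = interarrival_jitter([0, 20, 40], [50, 70, 90])    # constant delay
bumpy = interarrival_jitter([0, 20, 40], [50, 70, 95])     # last packet 5 ms late
```

With a constant network delay the estimate stays at zero; a single packet arriving 5 ms late moves it to 5/16 ms.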

Signal-based Models
• Recent approaches are based on emulating
1. Human speech production mechanism
2. Psycho-acoustic processing of human hearing
• ITU-T P.563 is the current recommendation

Introduction
Genetic Programming (GP)
• GP is a machine learning technique inspired by
biological evolution.
• Aimed at evolving program expressions/computer
code
• Each individual encodes a symbolic expression
• Solution representation
★ Tree structure - popular
★ Graphs, linear structures (arrays) etc
• Primary application area is modeling
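A tree-structured individual can be held as nested tuples and evaluated recursively. The encoding below (operator name first, then children) and the tiny function set are assumptions made for illustration.

```python
import math

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "sin": math.sin,
}

def evaluate(node, variables):
    """Evaluate an expression tree: tuples are function nodes, leaves are terminals."""
    if isinstance(node, tuple):
        op, *children = node
        return FUNCTIONS[op](*(evaluate(c, variables) for c in children))
    if isinstance(node, str):        # variable terminal
        return variables[node]
    return node                      # constant terminal

# Encodes the symbolic expression 2 * x + sin(y)
tree = ("+", ("*", 2.0, "x"), ("sin", "y"))
value = evaluate(tree, {"x": 3.0, "y": 0.0})
```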

Applications
1. Circuit design
2. Controllers
3. Antennas
4. Artificial chemistry
5. Computer hardware design
6. Network coding
7. Digital filter design
8. Computer aided diagnosis
9. Signal processing applications
10. ...

A Simplified GP Breeding Cycle
1. Generate an initial population of random compositions
of the functions and terminals of the problem
(computer programs)
★ Functions: +, -, *, /, sin, cos, log, power etc
★ Terminals: Can be variables (network traffic
parameters) and constants
2. Execute each program in the population and assign it a
fitness value
3. Copy the best existing programs (selection).
4. Create new computer programs by mutation and
crossover
✴ Repeat steps 2-4 till the desired solution is found
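The four steps can be sketched as a toy symbolic regression. Everything specific here is an assumption for illustration: the reduced function set, the target function, truncation selection with one elite copy, subtree crossover and mutation, and the size limit on offspring. It is not the configuration used in this work.

```python
import math
import random

# Function set (a subset of the list above) and terminal set.
FUNCTIONS = {"+": (2, lambda a, b: a + b), "-": (2, lambda a, b: a - b),
             "*": (2, lambda a, b: a * b), "sin": (1, math.sin)}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(rng, depth):
    """Step 1: a random composition of functions and terminals."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][0]
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        func = FUNCTIONS[tree[0]][1]
        return func(*(evaluate(child, x) for child in tree[1:]))
    return x if tree == "x" else tree

def fitness(tree, samples):
    """Step 2: mean squared error on the samples (lower is better)."""
    try:
        mse = sum((evaluate(tree, x) - y) ** 2 for x, y in samples) / len(samples)
    except (OverflowError, ValueError):
        return float("inf")
    return mse if math.isfinite(mse) else float("inf")

def paths(tree, prefix=()):
    """Yield the index path to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(rng, a, b):
    """Swap a random subtree of b into a random position of a."""
    donor = subtree(b, rng.choice(list(paths(b))))
    return replace(a, rng.choice(list(paths(a))), donor)

def mutate(rng, tree):
    """Replace a random subtree with a freshly generated one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def evolve(samples, pop_size=60, generations=30, seed=0, max_nodes=40):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []                                   # best error per generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, samples))
        history.append(fitness(ranked[0], samples))
        survivors = ranked[: pop_size // 4]        # step 3: selection
        children = [ranked[0]]                     # keep the best unchanged
        while len(children) < pop_size:            # step 4: variation
            if rng.random() < 0.8:
                child = crossover(rng, rng.choice(survivors), rng.choice(survivors))
            else:
                child = mutate(rng, rng.choice(survivors))
            if sum(1 for _ in paths(child)) <= max_nodes:
                children.append(child)
        population = children
    return min(population, key=lambda t: fitness(t, samples)), history

# Hypothetical target: y = x^2 + sin(x), sampled on [-1, 1].
samples = [(i / 10, (i / 10) ** 2 + math.sin(i / 10)) for i in range(-10, 11)]
best, history = evolve(samples, pop_size=40, generations=10, seed=1)
```

Because the best program is copied unchanged into each new generation, the recorded error never increases from one generation to the next.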

A Simplified GP Breeding Cycle: A Symbolic Representation

Real-time, Non-intrusive Evaluation of VoIP
• Derivation of non-intrusive parametric models for
No  Parameter Name               Abbreviation
1   bit-rate (kbps)              br
2   mean loss rate               mlr
3   mean burst length            mbl
4   packetization interval (ms)  PI
5   frame duration (ms)          fd
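The two loss parameters in the table can be measured from a per-packet arrival trace. The sketch below assumes the usual definitions (mlr as the fraction of packets lost, mbl as the average length of runs of consecutive losses); the slides do not spell these out, and the trace is invented.

```python
def loss_statistics(received):
    """Return (mlr, mbl) for a trace where True means the packet arrived."""
    lost = [not r for r in received]
    mlr = sum(lost) / len(lost)
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1                 # extend the current loss burst
        elif run:
            bursts.append(run)       # burst ended by an arrival
            run = 0
    if run:
        bursts.append(run)           # trace ended inside a burst
    mbl = sum(bursts) / len(bursts) if bursts else 0.0
    return mlr, mbl

trace = [True, True, False, False, True, False, True, True, True, True]
mlr, mbl = loss_statistics(trace)
```

Here three of ten packets are lost in bursts of length 2 and 1, giving mlr = 0.3 and mbl = 1.5.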

Simulation Environment

The Models
Equation 1:
$$MOS\text{-}LQO_{GP} = -2.46 \times \log(\cos(\log(br)) + mlr_{VAD} \times (br + fd/10)) + 3.17$$
Equation 2:
$$MOS\text{-}LQO_{GP} = -2.99 \times \cos(0.91 \times \sin(mlr_{VAD}) + mlr_{VAD} + 8) + 4.20$$

Data        Equation 1         Equation 2
            MSE      σ         MSE      σ
Training    0.037    0.9634    0.052    0.9481
Testing     0.0387   0.9646    0.0541   0.9501
Validation  0.0382   0.9688    0.0541   0.9531
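Equation 1 can be evaluated directly once two things the slides leave open are fixed. The sketch below assumes that log means base 10 and that the loss rate is given as a fraction between 0 and 1; both are guesses, chosen because they keep the logarithm's argument positive and the output on the MOS scale for the inputs tried here. The function name is hypothetical.

```python
import math

def mos_lqo_gp_eq1(br, mlr_vad, fd):
    """Equation 1, with log read as log10 and mlr_vad as a fraction (assumptions)."""
    inner = math.cos(math.log10(br)) + mlr_vad * (br + fd / 10)
    return -2.46 * math.log10(inner) + 3.17

# Hypothetical inputs: an 8 kbps codec with 10 ms frames.
no_loss = mos_lqo_gp_eq1(br=8.0, mlr_vad=0.0, fd=10.0)
some_loss = mos_lqo_gp_eq1(br=8.0, mlr_vad=0.1, fd=10.0)
```

Under these assumptions the estimate falls as the loss rate rises, which is the expected direction.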

Scatter Plots

Comparison With ITU-T P.563

A Methodology for Deriving VoIP Equipment
Impairment Factors for a Mixed NB/WB Context
• VoIP is rapidly evolving towards wideband (WB)
transmission
• WB offers more natural sounding speech
• However, a period of coexistence between NB and WB
codecs would prevail
• Research focus was at evolving impairment factors for
ITU-T G.107, the E-model, for a mixed NB/WB context

ITU-T G.107 The E-Model
Equipment Impairment Factors for a Mixed Context
• A parametric model
• Based on an impairment factor principle
• Effect of impairments on quality is additive
• R scale: R=R0 - Is - Id - Ie,eff + A
• Initially designed for NB Telephony
• Extension to WB (or NB/WB)
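The additive R scale above translates directly into code. The R-to-MOS conversion in the sketch is the narrowband mapping published in ITU-T G.107 and is not shown on the slides; the function names and the impairment values in the usage lines are placeholders.

```python
def r_factor(r0, i_s, i_d, ie_eff, a=0.0):
    """R = R0 - Is - Id - Ie,eff + A: impairments subtract, the advantage factor adds."""
    return r0 - i_s - i_d - ie_eff + a

def r_to_mos(r):
    """Narrowband R-to-MOS mapping from ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

# Placeholder values: a base rating of 93.2 and an equipment impairment of 10.
r = r_factor(r0=93.2, i_s=0.0, i_d=0.0, ie_eff=10.0)
mos = r_to_mos(r)
```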

• Effective equipment
impairments:
• Codec related
• loss rate (mean)
• Burstiness
• Payload size (PI)
[Figure: Ie,WB,eff versus mean loss rate (mlr) % for G.729 (8), G.723.1 (6.3), AMR-NB (7.4), G.722.2 (19.85) and G.722.1 (32)]

The Models
$$I_{e,WB,eff} = \{11 - mbl + \ln(grad) + grad \times mlr + I_{e,WB} - 2\log_2(PI)\} \times 0.8619 + 9 \quad (1)$$
$$I_{e,WB,eff} = \left\{\ln\left(\frac{9 \times (I_{e,WB} + mlr \times grad^2)}{mbl^5 - mlr}\right) + mlr + I_{e,WB} + grad \times mlr\right\} \times 0.8303 + 8.9977 \quad (2)$$
$$I_{e,WB,eff} = \log_{10}(\log_{10}(\log_2(I_{e,WB} - 2 \times mbl) + mlr)) \times 321.7017 + 95.3708 \quad (3)$$
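Models (1) and (3) are simple enough to evaluate directly, as in the sketch below. The slides do not define grad or state the units of mlr, so both are passed through as plain numbers; the function names and the input values are arbitrary and only exercise the arithmetic.

```python
import math

def ie_wb_eff_model1(ie_wb, mlr, mbl, grad, pi):
    """Model (1): requires grad > 0 and pi > 0 for the logarithms."""
    inner = (11 - mbl + math.log(grad) + grad * mlr
             + ie_wb - 2 * math.log2(pi))
    return inner * 0.8619 + 9

def ie_wb_eff_model3(ie_wb, mlr, mbl):
    """Model (3): defined only when log2(ie_wb - 2*mbl) + mlr exceeds 1."""
    inner = math.log2(ie_wb - 2 * mbl) + mlr
    return math.log10(math.log10(inner)) * 321.7017 + 95.3708

# Arbitrary inputs, chosen so the result is easy to check by hand.
m1 = ie_wb_eff_model1(ie_wb=0.0, mlr=0.0, mbl=1.0, grad=1.0, pi=16.0)
m3 = ie_wb_eff_model3(ie_wb=20.0, mlr=0.0, mbl=1.0)
```

For the first call the braces reduce to 11 - 1 + 0 + 0 + 0 - 8 = 2, so the result is 2 × 0.8619 + 9 = 10.7238.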

Test Results
[Figure: two scatter plots comparing Ie,WB,eff-WB-PESQ and Ie,WB,eff-GP, both axes 0 to 140]
Model  Training           Testing
       RMSE     σ         RMSE     σ
1      8.3941   0.9236    8.5057   0.9240
2      8.3552   0.9243    8.4605   0.9248
3      9.1745   0.908     9.3145   0.9080

The Proposed Model
A Signal-Based Model
• ITU-T P.563 has been chosen for feature extraction
• Reasons:
i. P.563 is the current state-of-the-art algorithm for
ii. It computes the most numerous and varied features
relevant to speech quality
• A new mapping is derived by employing symbolic
regression

GP Experiments
• Three GP experiments were performed with various
configurations
• Leaf coefficients of GP trees were tuned with a genetic
algorithm (GA), giving a hybrid optimization
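The coefficient-tuning half of such a hybrid scheme can be sketched as a small real-coded GA that adjusts the constants of a fixed expression. The expression form, the blend crossover with Gaussian mutation, the population settings and the synthetic data are all assumptions for illustration; the slides do not describe the GA that was actually used.

```python
import math
import random

def model(coeffs, x):
    """Fixed expression structure a * cos(x) + b; only a and b are tuned."""
    a, b = coeffs
    return a * math.cos(x) + b

def mse(coeffs, samples):
    return sum((model(coeffs, x) - y) ** 2 for x, y in samples) / len(samples)

def tune(samples, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5))
                  for _ in range(pop_size)]
    history = []                                   # best error per generation
    for _ in range(generations):
        population.sort(key=lambda c: mse(c, samples))
        history.append(mse(population[0], samples))
        parents = population[: pop_size // 2]      # keep the better half
        children = [population[0]]                 # elitism
        while len(children) < pop_size:
            p, q = rng.choice(parents), rng.choice(parents)
            w = rng.random()
            # blend the two parents, then perturb each coefficient slightly
            children.append(tuple(w * u + (1 - w) * v + rng.gauss(0, 0.1)
                                  for u, v in zip(p, q)))
        population = children
    return min(population, key=lambda c: mse(c, samples)), history

# Synthetic data generated from a = -2.99, b = 4.20.
samples = [(i / 5, -2.99 * math.cos(i / 5) + 4.20) for i in range(20)]
best_coeffs, history = tune(samples)
```

In a hybrid setup, GP would propose the expression structure and a routine like tune would refine the numeric leaves of each candidate.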

Distortion Conditions
• Signal correlated noise
• Frame erasures
• Bit errors
• Transcoding
• Front-end clipping
• Low bitrate coding
• Speech level variation

Results - Comparison With ITU-T P.563
          ITU-T P.563   GP based model   Percentage Enhancement
Training  0.3937        0.3415           9.89
Testing   0.3674        0.3071           16.41
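One plausible reading of the enhancement column is the relative reduction of the error with respect to P.563. The sketch below assumes that definition; it reproduces the testing row, and whether the training row was computed the same way is not stated. The function name is hypothetical.

```python
def percentage_enhancement(baseline_error, model_error):
    """Relative error reduction of the model over the baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error

testing_gain = percentage_enhancement(0.3674, 0.3071)
```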

Conclusions
• Research goal: derivation of real-time non-intrusive
models for speech quality estimation
• GP has been employed to achieve this
• Disparate parametric and signal-based models have
been derived
• Models outperform ITU-T’s standard recommendations
