2. Outline
• Introduction
• Voice over Internet Protocol (VoIP)
• Approaches to Speech Quality estimation
• Genetic Programming
• Real-time, Non-intrusive Evaluation of VoIP
3. Outline ...
• A Methodology for Deriving VoIP Equipment
Impairment Factors for a Mixed NB/WB Context
• A Signal-based Model
• Conclusion
4. Introduction
• VoIP -- a paradigm shift
• Bandwidth redundancy exploitation
• QoS remains dominated by network/transport layer
metrics
• Quality assessment is therefore largely a reflection of the network's operating
conditions
5. Research Goals
• Derivation of non-intrusive parametric models for
speech quality estimation
• Derivation of a signal-based non-intrusive model
• Genetic programming (GP)-based symbolic regression was
used
6. VoIP
• Packet based communication channel
• Uses wire-line speech codecs
• Linear predictive coding (LPC) is popular
• Coded frames are packetized into RTP/UDP/IP packets
• The Internet is used for transport
• The receiver reverses the process (depacketization, jitter buffering and decoding)
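The sender side of this pipeline can be sketched as follows. Coded frames are concatenated into one payload and prefixed with the 12-byte fixed RTP header defined in RFC 3550. This is a minimal illustration, not a production packetizer. The payload type 18 (the static G.729 type in RFC 3551), the SSRC value and the initial sequence number and timestamp of 0 are assumptions (RFC 3550 recommends random initial values). UDP/IP encapsulation is left to the operating system's socket layer.

```python
import struct

RTP_VERSION = 2

def rtp_packet(frames, seq, timestamp, ssrc, payload_type=18, marker=False):
    """Build one RTP packet: 12-byte RFC 3550 fixed header (no CSRCs, no
    extension) followed by the concatenated coded speech frames."""
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(frames)

def packetize(frames, frames_per_packet, samples_per_frame, ssrc=0x1234ABCD):
    """Group coded frames into RTP packets. The sequence number grows by one
    per packet; the timestamp grows by the number of samples per packet."""
    packets = []
    for n, start in enumerate(range(0, len(frames), frames_per_packet)):
        chunk = frames[start:start + frames_per_packet]
        ts = n * frames_per_packet * samples_per_frame
        packets.append(rtp_packet(chunk, seq=n, timestamp=ts, ssrc=ssrc))
    return packets

# Hypothetical input: ten 10-byte G.729 frames (10 ms = 80 samples at 8 kHz),
# two frames per packet, i.e. a 20 ms packetization interval
frames = [bytes(10)] * 10
packets = packetize(frames, frames_per_packet=2, samples_per_frame=80)
print(len(packets), len(packets[0]))  # 5 32
```

Each 32-byte packet is 12 bytes of RTP header plus 20 bytes of speech payload; UDP (8 bytes) and IPv4 (20 bytes) headers are added below RTP.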
7. Speech Quality
• Two Approaches to Speech Quality Assessment
★ Subjective Assessment
★ Objective Assessment
8. Subjective Assessment
Speech Quality
• Speech quality is rated by human listeners
• Advantage: reliable results
• Limitations
1. Expensive
2. Time Consuming
3. Laborious
4. Lack of Repeatability
• The Mean Opinion Score (MOS) is the measure of quality (5-point absolute
category rating scale of ITU-T P.800):
★ 1 - Bad
★ 2 - Poor
★ 3 - Fair
★ 4 - Good
★ 5 - Excellent
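A MOS is simply the mean of the listeners' category ratings for one test condition. The sketch below computes it together with an approximate 95 % confidence interval. The normal approximation (z = 1.96) and the example votes are assumptions; for small listener panels a Student-t interval would be more appropriate.

```python
import math
import statistics

# ACR listening-quality scale (ITU-T P.800)
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def mos(ratings):
    """Mean Opinion Score with an approximate 95 % confidence interval
    (normal approximation, z = 1.96). Needs at least two ratings."""
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ratings must be integers 1..5")
    m = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return m, (m - half_width, m + half_width)

# Hypothetical votes from 8 listeners for one test condition
score, ci = mos([4, 4, 3, 5, 4, 3, 4, 4])
print(f"MOS = {score:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```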
9. Objective Assessment of Speech Quality
• A fast, reliable, automated computer algorithm is used to predict human
perception of speech quality
• Two approaches:
1. Intrusive Assessment
2. Non-Intrusive Assessment
10. Intrusive Assessment
Objective Assessment of Speech Quality
• The signal under test is compared against a reference
signal
• Advantages:
1. The most reliable objective means of assessing speech
quality
2. Tests can be repeated easily
• Limitations:
1. Consumes considerable computing resources
2. Unsuitable for continuous monitoring of live calls, since
a reference signal is required
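PESQ itself is far more elaborate, but the intrusive principle (comparing a degraded signal against a time-aligned reference) can be illustrated with a much simpler classical measure, segmental SNR. This is a swapped-in illustration, not PESQ. The 160-sample frame (20 ms at 8 kHz) and the clamping range of -10 to 35 dB are common choices assumed here, and the test signals are hypothetical.

```python
import math

def segmental_snr(reference, degraded, frame_len=160, floor=-10.0, ceil=35.0):
    """Segmental SNR in dB: per-frame SNR of the degraded signal against the
    time-aligned reference, clamped to [floor, ceil] and averaged."""
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and of equal length")
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if signal == 0.0:
            continue                                  # skip silent frames
        snr = ceil if noise == 0.0 else 10.0 * math.log10(signal / noise)
        snrs.append(min(max(snr, floor), ceil))
    return sum(snrs) / len(snrs) if snrs else float("nan")

# Hypothetical 1 s, 8 kHz tone and a copy with a constant 1 % amplitude error
ref = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
deg = [0.99 * x for x in ref]
print(segmental_snr(ref, deg))  # 40 dB per frame, clamped to 35.0
```

The need for `reference` in every call is exactly why such measures cannot monitor live calls.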
11. ITU-T P.862 (PESQ)
• The PESQ algorithm is the current ITU-T recommendation
for intrusive speech quality estimation
• The reference and degraded speech signals are mapped from the time domain
to a time-frequency representation using psychophysical
equivalents of time and frequency (perceptual frequency on the Bark scale
and perceived loudness)
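The frequency half of this warping can be illustrated with the Zwicker and Terhardt approximation of the critical-band (Bark) scale. This is an assumed textbook formula for illustration only; PESQ uses its own tabulated Bark bands and loudness model, which are not reproduced here.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt (1980) approximation of the critical-band rate."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Narrowband telephony edges (300 Hz, 3400 Hz) versus a wideband edge (7000 Hz)
for f in (300, 1000, 3400, 7000):
    print(f, round(hz_to_bark(f), 2))
```

The mapping is compressive: equal steps in Bark cover ever wider bands in Hz as frequency rises, which mirrors the ear's coarser resolution at high frequencies.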
12. ITU-T P.862 (PESQ)
• PESQ shows a high correlation with the subjective scores of ITU-T benchmark
tests
• Across 30 ITU-T subjective tests, the Pearson correlation
coefficient (R) was 0.935
13. Non-Intrusive Assessment
• A challenging problem since a reference signal is not
available
• Two approaches exist
1. Parametric models
2. Signal-based models
14. Parametric Models
• Quality is modeled as a function of transport-layer metrics and other measurable
quantities
• Relevant metrics include:
★ Packet loss rate
★ End-to-end delay
★ Delay variation (jitter)
★ Codec characteristics
★ ...
• Aimed at real-time and continuous evaluation of speech
quality
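Parametric models need these metrics computed from live traffic. The sketch below derives a mean loss rate and a mean burst length from a per-packet reception trace, and the interarrival jitter estimator of RFC 3550 (J += (|D| - J)/16) from send and receive timestamps. Defining the mean burst length as the average length of runs of consecutive lost packets is an assumption, since the slides do not define it; the trace is hypothetical.

```python
def loss_metrics(received):
    """received: one boolean per transmitted packet in sequence order
    (True = arrived). Returns (mean loss rate in %, mean burst length),
    where a burst is a run of consecutive lost packets."""
    bursts, run = [], 0
    for ok in received:
        if not ok:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    mlr = 100.0 * sum(bursts) / len(received)
    mbl = sum(bursts) / len(bursts) if bursts else 0.0
    return mlr, mbl

def interarrival_jitter(send_ts, recv_ts):
    """RFC 3550 interarrival jitter estimate, J += (|D| - J) / 16,
    with both timestamp lists in the same units (e.g. ms)."""
    j = 0.0
    for i in range(1, len(send_ts)):
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        j += (abs(d) - j) / 16.0
    return j

trace = [True, True, False, False, True, False, True, True, True, True]
print(loss_metrics(trace))  # (30.0, 1.5): 3 of 10 lost, bursts of 2 and 1
```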
15. Signal-based Models
• Recent approaches are based on emulating:
1. The human speech production mechanism
2. The psychoacoustic processing of human hearing
• ITU-T P.563 is the current ITU-T recommendation for non-intrusive,
signal-based assessment
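A standard way to emulate speech production is linear prediction (LPC), which models the vocal tract as an all-pole filter. The sketch below estimates predictor coefficients from one frame with the autocorrelation method and the Levinson-Durbin recursion. It is a generic textbook LPC routine under that assumption, not the vocal-tract analysis actually specified in P.563, and the test frame is hypothetical.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a (windowed) speech frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations. Returns predictor coefficients
    a[1..p] with x[n] ~ sum(a[k] * x[n - k]) and the final prediction error."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                            # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                       # error shrinks at each order
    return a[1:], err

# Hypothetical frame following x[n] = 0.9 * x[n-1] (a first-order all-pole source)
frame = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs)  # close to [0.9, 0.0]
```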
16. Introduction
Genetic Programming (GP)
• GP is a machine learning technique inspired by
biological evolution.
• Aimed at evolving program expressions/computer
code
• Each individual encodes a symbolic expression
• Solution representation
★ Tree structure - popular
★ Graphs, linear structures (arrays), etc.
• A primary application area is modeling, e.g. symbolic regression
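The popular tree representation can be sketched with nested tuples: an internal node is `(function, child, ...)` and a leaf is a variable name or a numeric constant. The function set, the protected division and log (a common GP convention so that every tree evaluates) and the example individual are assumptions for illustration.

```python
import math
import operator

# Function set: name -> (callable, arity). Division and log are "protected"
# so that any randomly generated tree can be evaluated without errors.
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "/": (lambda a, b: a / b if abs(b) > 1e-9 else 1.0, 2),
    "log": (lambda a: math.log(abs(a)) if abs(a) > 1e-9 else 0.0, 1),
}

def evaluate(tree, env):
    """Recursively evaluate an expression tree for the bindings in env."""
    if isinstance(tree, tuple):
        name, *children = tree
        fn, _arity = FUNCTIONS[name]
        return fn(*(evaluate(c, env) for c in children))
    if isinstance(tree, str):
        return env[tree]                 # terminal: variable
    return tree                          # terminal: constant

def to_infix(tree):
    """Render a tree as a readable symbolic expression."""
    if isinstance(tree, tuple):
        name, *children = tree
        if len(children) == 1:
            return f"{name}({to_infix(children[0])})"
        return f"({to_infix(children[0])} {name} {to_infix(children[1])})"
    return str(tree)

# Hypothetical individual built from network-parameter terminals
tree = ("+", ("*", "mlr", 2.5), ("log", "PI"))
print(to_infix(tree))                           # ((mlr * 2.5) + log(PI))
print(evaluate(tree, {"mlr": 4.0, "PI": 1.0}))  # 10.0
```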
18. A Simplified GP Breeding Cycle
1. Generate an initial population of random compositions
of the functions and terminals of the problem
(computer programs)
★ Functions: +, -, *, /, sin, cos, log, power etc
★ Terminals: variables (e.g. network traffic
parameters) and constants
2. Execute each program in the population and assign it a
fitness value
3. Copy the best existing programs (selection).
4. Create new computer programs by mutation and
crossover
✴ Repeat steps 2-4 until a satisfactory solution is found or a generation limit is reached
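The breeding cycle above can be sketched as a minimal symbolic-regression loop. To stay short it assumes a single variable `x`, a reduced function set (+, -, *), RMSE fitness, tournament selection with elitism, subtree crossover and subtree mutation, and a depth limit of 8. These are illustrative choices, not the configuration used in this work.

```python
import math
import random

FUNCS = ("+", "-", "*")

def random_tree(rng, depth):
    """Step 1: 'grow'-style random tree of at most the given depth."""
    if depth == 0 or (depth < 3 and rng.random() < 0.3):
        return "x" if rng.random() < 0.6 else round(rng.uniform(-2.0, 2.0), 2)
    return (rng.choice(FUNCS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, x), evaluate(right, x)
        return a + b if op == "+" else a - b if op == "-" else a * b
    return x if tree == "x" else tree

def fitness(tree, data):
    """Step 2: root-mean-square error on (x, y) samples; lower is better."""
    try:
        err = math.sqrt(sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data))
    except OverflowError:
        return float("inf")
    return err if math.isfinite(err) else float("inf")

def tree_depth(tree):
    return 1 + max(tree_depth(c) for c in tree[1:]) if isinstance(tree, tuple) else 0

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path indexes into nested tuples."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(rng, a):
    path, _ = rng.choice(list(subtrees(a)))
    return replace(a, path, random_tree(rng, 2))

def tournament(rng, scored, k=3):
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]

def run_gp(data, pop_size=200, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [random_tree(rng, 4) for _ in range(pop_size)]               # step 1
    for _ in range(generations):
        scored = sorted(((fitness(t, data), t) for t in pop), key=lambda s: s[0])  # step 2
        if scored[0][0] < 1e-9:
            break                                                      # solution found
        nxt = [t for _, t in scored[:5]]                               # step 3: elites
        while len(nxt) < pop_size:                                     # step 4
            child = crossover(rng, tournament(rng, scored), tournament(rng, scored))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            nxt.append(child if tree_depth(child) <= 8 else tournament(rng, scored))
        pop = nxt
    best = min(pop, key=lambda t: fitness(t, data))
    return best, fitness(best, data)

# Hypothetical target y = x^2 + x sampled on [-1, 1]
data = [(x / 10, (x / 10) ** 2 + x / 10) for x in range(-10, 11)]
best, err = run_gp(data)
print(best, err)
```

Because the five best programs are copied unchanged into every new generation, the best fitness can never get worse from one generation to the next.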
19. A Simplified GP Breeding Cycle: A Symbolic Representation
20. Real-time, Non-intrusive Evaluation of VoIP
• Derivation of non-intrusive parametric models for
speech quality estimation
No  Parameter Name                Abbreviation
1   bit-rate (kbps)               br
2   mean loss rate                mlr
3   mean burst length             mbl
4   packetization interval (ms)   PI
5   frame duration (ms)           fd
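The bit-rate, packetization interval and frame duration together fix the packet-level arithmetic: frames per packet = PI / fd, payload bytes = br × PI / 8, and the IP-level bit-rate adds 40 bytes of IPv4 (20) + UDP (8) + RTP (12) headers to every packet. The container below is a hypothetical helper, not part of the original models. Treating mbl as a count of packets and the G.729-like example values are assumptions.

```python
from dataclasses import dataclass

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP fixed headers

@dataclass
class VoipParams:
    br: float    # codec bit-rate (kbps)
    mlr: float   # mean loss rate (%)
    mbl: float   # mean burst length (packets, assumed unit)
    PI: float    # packetization interval (ms)
    fd: float    # codec frame duration (ms)

    def frames_per_packet(self) -> int:
        if self.PI % self.fd:
            raise ValueError("PI must be a multiple of the frame duration")
        return int(self.PI // self.fd)

    def payload_bytes(self) -> float:
        return self.br * self.PI / 8.0     # kbit/s * ms = bit; / 8 = bytes

    def ip_bitrate_kbps(self) -> float:
        packets_per_s = 1000.0 / self.PI
        return (self.payload_bytes() + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_s / 1000.0

p = VoipParams(br=8.0, mlr=2.0, mbl=1.5, PI=20.0, fd=10.0)   # G.729-like example
print(p.frames_per_packet(), p.payload_bytes(), p.ip_bitrate_kbps())  # 2 20.0 24.0
```

With small payloads the header overhead dominates: an 8 kbps codec at PI = 20 ms needs 24 kbps at the IP layer.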
25. A Methodology for Deriving VoIP Equipment
Impairment Factors for a Mixed NB/WB Context
• VoIP is rapidly evolving towards wideband (WB)
transmission
• WB offers more natural-sounding speech
• However, NB and WB codecs are expected to coexist
for some time
• The research focused on evolving impairment factors for
ITU-T G.107, the E-model, in a mixed NB/WB context
26. ITU-T G.107 The E-Model
Equipment Impairment Factors for a Mixed Context
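• A parametric model
• Based on the impairment factor principle
• Impairments are assumed to be additive on the R scale
• R scale: $R = R_0 - I_s - I_d - I_{e,eff} + A$, where R0 is the basic
signal-to-noise ratio, Is the simultaneous impairment, Id the delay impairment,
Ie,eff the effective equipment impairment and A the advantage factor
• Initially designed for NB telephony
• Extension to WB (or mixed NB/WB)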
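For the narrowband case, G.107 combines the codec impairment Ie with packet loss as Ie,eff = Ie + (95 - Ie) × Ppl / (Ppl/BurstR + Bpl), sums the impairments into R, and maps R to an estimated MOS. The sketch below implements these narrowband formulas. The example values (Ie = 10, Bpl = 20, 2 % random loss, and R0 - Is - Id = 93.2) are hypothetical. The WB extension (G.107.1) uses an extended R scale up to 129 and different constants, which are not reproduced here.

```python
def effective_equipment_impairment(ie, ppl, bpl, burst_r=1.0):
    """G.107 (narrowband): Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl/BurstR + Bpl),
    Ppl = packet-loss probability in %, Bpl = codec loss robustness factor,
    BurstR = burst ratio (1 for random loss)."""
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)

def r_factor(r0, i_s, i_d, ie_eff, a=0.0):
    """Transmission rating factor R = R0 - Is - Id - Ie,eff + A."""
    return r0 - i_s - i_d - ie_eff + a

def r_to_mos(r):
    """G.107 narrowband mapping from R to estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# Hypothetical narrowband call
ie_eff = effective_equipment_impairment(ie=10.0, ppl=2.0, bpl=20.0)
r = r_factor(r0=93.2, i_s=0.0, i_d=0.0, ie_eff=ie_eff)
print(round(ie_eff, 2), round(r, 2), round(r_to_mos(r), 2))
```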
27. Equipment Impairment Factors for a Mixed Context
• Effective equipment impairment (Ie,WB,eff) depends on:
★ Codec-related impairment
★ Mean loss rate (mlr)
★ Burstiness (mean burst length, mbl)
★ Payload size (packetization interval, PI)
[Figure: Ie,WB,eff versus mean loss rate (mlr, 0 to 40 %) for G.729 (8 kbps), G.723.1 (6.3 kbps), AMR-NB (7.4 kbps), G.722.2 (19.85 kbps) and G.722.1 (32 kbps)]
28. The Models
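$$I_{e,WB,eff} = \left\{11 - \mathit{mbl} + \ln(\mathit{grad}) + \mathit{grad} \times \mathit{mlr} + I_{e,WB} - 2\log_2(\mathit{PI})\right\} \times 0.8619 + 9 \qquad (1)$$

$$I_{e,WB,eff} = \left\{\ln\frac{9\,(I_{e,WB} + \mathit{mlr} \times \mathit{grad}^2)}{\mathit{mbl}^5 - \mathit{mlr}} + \mathit{mlr} + I_{e,WB} + \mathit{grad} \times \mathit{mlr}\right\} \times 0.8303 + 8.9977 \qquad (2)$$

$$I_{e,WB,eff} = \log_{10}\!\left(\log_{10}\!\left(\log_2(I_{e,WB} - 2\,\mathit{mbl}) + \mathit{mlr}\right)\right) \times 321.7017 + 95.3708 \qquad (3)$$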
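The three evolved models are plain closed-form expressions, so they can be transcribed directly. In the sketch below, each docstring states the formula being implemented. Model (2) is read as ln(9(Ie,WB + mlr·grad²)/(mbl⁵ - mlr)) inside the braces, which is how the slide's fraction layout is interpreted here. `grad` is not defined in this section and is simply passed in as an input. Every log requires a positive argument, so inputs outside a model's domain raise `ValueError`.

```python
import math

def ie_wb_eff_model1(ie_wb, mlr, mbl, pi_ms, grad):
    """(1): {11 - mbl + ln(grad) + grad*mlr + Ie,WB - 2*log2(PI)} * 0.8619 + 9"""
    return (11 - mbl + math.log(grad) + grad * mlr + ie_wb
            - 2 * math.log2(pi_ms)) * 0.8619 + 9

def ie_wb_eff_model2(ie_wb, mlr, mbl, grad):
    """(2): {ln(9*(Ie,WB + mlr*grad^2) / (mbl^5 - mlr)) + mlr + Ie,WB + grad*mlr}
    * 0.8303 + 8.9977"""
    inner = 9 * (ie_wb + mlr * grad ** 2) / (mbl ** 5 - mlr)
    return (math.log(inner) + mlr + ie_wb + grad * mlr) * 0.8303 + 8.9977

def ie_wb_eff_model3(ie_wb, mlr, mbl):
    """(3): log10(log10(log2(Ie,WB - 2*mbl) + mlr)) * 321.7017 + 95.3708"""
    return math.log10(math.log10(math.log2(ie_wb - 2 * mbl) + mlr)) * 321.7017 + 95.3708

# Hypothetical inputs, chosen only to exercise the formulas
print(ie_wb_eff_model1(ie_wb=10.0, mlr=0.0, mbl=1.0, pi_ms=16.0, grad=1.0))
print(ie_wb_eff_model3(ie_wb=1026.0, mlr=0.0, mbl=1.0))
```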
29. Test Results
[Figure: two scatter plots comparing Ie,WB,eff-WB-PESQ and Ie,WB,eff-GP, both axes 0 to 140]
Model   Training             Testing
        RMSE      σ          RMSE      σ
1       8.3941    0.9236     8.5057    0.9240
2       8.3552    0.9243     8.4605    0.9248
3       9.1745    0.908      9.3145    0.9080
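Agreement between model outputs and reference values in this deck is reported with error and correlation statistics: RMSE in the table above and Pearson's R for PESQ. A minimal sketch of both follows. The example values are hypothetical, and nothing here asserts how the σ column in the table was computed.

```python
import math

def rmse(pred, target):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical model predictions versus reference impairment values
reference = [20.0, 35.0, 50.0, 80.0, 110.0]
predicted = [24.0, 31.0, 55.0, 74.0, 112.0]
print(round(rmse(predicted, reference), 3), round(pearson_r(predicted, reference), 4))
```

RMSE measures absolute error in the units of the target, while R only measures how well a linear relation fits, so the two are usually reported together.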
30. The Proposed Model
A Signal-Based Model
• ITU-T P.563 has been chosen for feature extraction
• Reasons:
i. P.563 is the current state-of-the-art algorithm for
non-intrusive speech quality estimation
ii. It computes a large and varied set of features
relevant to speech quality
• A new mapping from these features to speech quality is derived using
GP-based symbolic regression
31. GP Experiments
• Three GP experiments were performed with various
configurations
• The numeric constants at the leaves of the GP trees were tuned with a
genetic algorithm (GA), giving a hybrid GP/GA optimization
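The hybrid step can be sketched as follows: the structure of an evolved tree is frozen, and only its numeric leaf constants are evolved by a real-coded GA. The operators (tournament selection, arithmetic crossover, Gaussian mutation, two elites), the population settings, the tree c0·x + c1 and the data are all assumptions for illustration. The slides do not specify the GA configuration.

```python
import random

def tune_constants(model, n_consts, data, pop_size=40, generations=100,
                   sigma=0.3, seed=7):
    """Real-coded GA over the constant vector c of a fixed expression
    model(x, c). Returns (best constants, best mean squared error)."""
    rng = random.Random(seed)

    def mse(c):
        return sum((model(x, c) - y) ** 2 for x, y in data) / len(data)

    pop = [[rng.uniform(-5, 5) for _ in range(n_consts)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=mse)
        nxt = scored[:2]                                     # elitism
        while len(nxt) < pop_size:
            p1 = min(rng.sample(scored, 3), key=mse)         # tournament selection
            p2 = min(rng.sample(scored, 3), key=mse)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # arithmetic crossover
            child = [g + rng.gauss(0, sigma) if rng.random() < 0.2 else g
                     for g in child]                          # Gaussian mutation
            nxt.append(child)
        pop = nxt
    best = min(pop, key=mse)
    return best, mse(best)

# Hypothetical GP tree with fixed structure c0 * x + c1; only c0, c1 are tuned
def linear_tree(x, c):
    return c[0] * x + c[1]

data = [(x, 2.5 * x - 1.0) for x in range(-5, 6)]
consts, err = tune_constants(linear_tree, 2, data)
print(consts, err)
```

GP is good at finding structure but poor at fine-tuning real-valued constants; delegating the constants to a GA is one way to combine the strengths of both.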
32. Distortion Conditions
• Signal-correlated noise
• Frame erasures
• Bit errors
• Transcoding
• Front-end clipping
• Low bitrate coding
• Speech level variation
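Frame-erasure conditions with controlled burstiness are commonly generated with a two-state Gilbert model. With p = P(received → lost) and q = P(lost → received), the stationary loss rate is p/(p + q) and the mean burst length is 1/q, which maps directly onto mlr and mbl. This is a standard technique offered for illustration. The slides do not state how the erasure conditions were generated, and the values of p, q and the seed are hypothetical.

```python
import random

def gilbert_loss_trace(n_packets, p, q, seed=0):
    """Two-state Gilbert model. p = P(received -> lost), q = P(lost -> received).
    Stationary loss rate is p / (p + q); mean burst length is 1 / q.
    Returns one boolean per packet/frame (True = received)."""
    rng = random.Random(seed)
    lost = False
    trace = []
    for _ in range(n_packets):
        if lost:
            lost = rng.random() >= q        # stay lost with probability 1 - q
        else:
            lost = rng.random() < p         # start a burst with probability p
        trace.append(not lost)
    return trace

# Hypothetical target: about 3.85 % loss with a mean burst length of 2 frames
trace = gilbert_loss_trace(200_000, p=0.02, q=0.5, seed=1)
print(100.0 * (1 - sum(trace) / len(trace)))
```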
33. Results - Comparison With ITU-T P.563
            ITU-T P.563   GP-based model   Percentage Enhancement
Training    0.3937        0.3415           9.89
Testing     0.3674        0.3071           16.41
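The percentage enhancement can be read as the relative reduction of the P.563 figure achieved by the GP-based model. This definition is an assumption that reproduces the testing row; the slides do not state the formula.

```python
def percentage_enhancement(baseline, proposed):
    """Relative reduction of an error figure, in percent (assumed definition)."""
    return 100.0 * (baseline - proposed) / baseline

print(round(percentage_enhancement(0.3674, 0.3071), 2))  # 16.41 (testing row)
```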
34. Conclusions
• Research goal: derivation of real-time non-intrusive
models for speech quality estimation
• GP has been employed to achieve this
• Distinct parametric and signal-based models have
been derived
• The models outperform the corresponding ITU-T standard recommendations