NIQA – Non-Intrusive voice Quality Analyzer

Modern standard methods for evaluating quality of transmitted speech

Voice quality is one of the main characteristics of a speech transmission system. When analyzing voice quality, one
must consider not only the audio signal degradation caused by transmission over telecom channels, but also the
specifics of the speaker's voice, the condition of the listener's hearing, and the variation of these parameters over time.

The best-known methods for evaluating the quality of voice transmission systems were developed by the Telecommunication
Standardization Sector of the International Telecommunication Union (ITU-T) in the mid-1990s. The results of this work
are presented in Recommendation P.800, "Methods for subjective determination of transmission quality", and the
related Recommendation P.830 [1, 2]. These documents describe the conditions for voice quality testing, the audio
content, the scoring, and the methods for evaluating results. Typically they are used to obtain a mean subjective
quality score on a five-point scale (Mean Opinion Score, MOS).
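
As a minimal illustration of how a MOS value arises, the sketch below averages listener ratings given on the 1 to 5 opinion scale. The function name and the sample panel are hypothetical; P.800 itself also prescribes listening conditions and test material that are not modeled here.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings given on the 1-5 opinion scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical panel of eight listeners rating one recording.
panel = [4, 5, 3, 4, 4, 5, 3, 4]
print(mean_opinion_score(panel))  # prints 4
```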

Unfortunately, tests performed according to P.800 may lead to ambiguous results. The Recommendation itself warns against
comparing MOS scores obtained under different conditions and considers such comparisons incorrect. In addition,
performing tests according to P.800 takes a lot of time and requires many testers.

To move from subjective (MOS) scores to objective ones and to automate quality measurement, ITU-T developed
Recommendation P.861, which is based on low-level quantitative measurements [3]. Recommendation P.861 builds on the
PSQM (Perceptual Speech Quality Measurement) method developed by KPN Research and is devoted to objective analysis
of the performance of speech codecs with a low level of degradation.

However, PSQM cannot be used to evaluate a real communication system, because the method does not consider all the
important factors that influence human perception. These factors include delay, jitter, packet loss, and signal
level clipping.

In February 2001 ITU-T issued another recommendation, P.862 [4], which describes a more advanced algorithm for voice
quality testing: PESQ (Perceptual Evaluation of Speech Quality). The algorithm includes level and time alignment,
perceptual modeling, and cognitive modeling. Thanks to these additional operations, the approach accounts for signal
amplification and attenuation in the communication system, time delays and jitter, and the spectral bands that are
most significant for human perception. Based on cognitive modeling, PESQ also maps the objective quality score
to MOS values.
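
The idea of level and time alignment can be sketched as follows. This is not the PESQ procedure from P.862, which is considerably more elaborate; it is only a simplified illustration that matches RMS levels and estimates a constant delay by cross-correlation. All function names are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def align_level(reference, degraded):
    """Scale the degraded signal so that its RMS matches the reference."""
    gain = rms(reference) / rms(degraded)
    return [v * gain for v in degraded]


def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        corr = sum(
            r * degraded[i + lag]
            for i, r in enumerate(reference)
            if i + lag < len(degraded)
        )
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Hypothetical example: the channel delays the signal by 3 samples
# and attenuates it by a factor of two.
reference = [0.0, 1.0, 3.0, -2.0, 0.5, 0.0]
degraded = [0.0, 0.0, 0.0] + [0.5 * v for v in reference]
print(estimate_delay(reference, degraded, max_lag=5))  # prints 3
```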

A disadvantage of PESQ and similar algorithms is that they are based on comparing two signals: the original signal
and the signal transmitted through the communication system. This creates a number of difficulties in setting up and
performing voice quality tests. One has to arrange signal recording on both sides of the telecommunication system and
transfer the recordings to the test system. Real-time quality monitoring is also quite difficult with this approach.

To address these issues, ITU-T developed a new recommendation, P.563 [5], introduced in May 2004. This recommendation
defines an algorithm for evaluating speech quality by listening to communication sessions, without access to the
original signal. The algorithm takes into account single-sided distortions, vocal tract parameters, noise, and
speech naturalness. The developers of P.563 point out that it does not provide an overall quality estimate of
speech transmission: distortions caused by delay, echo, loss of loudness, and anything else related to two-way
interaction cannot be taken into account by this method.

It is widely thought that P.563 provides a high correlation between automated and expert quality scores.
However, simple tests based on the ITU-T coded-speech database [6] raise some doubts about whether the algorithm
implementation provided with the Recommendation is consistent with its description.

Table 1. Comparison between results of P.563 and expert estimations

    MOS range     Average score (MOS)     Average score (P.563)     Average error
      4–5                4,25                     2,45                   1,79
      3–4                3,42                     1,70                   1,69
      2–3                2,56                     1,71                   0,97
      1–2                1,68                     1,49                   0,55

The problem found in the distributed implementation of the P.563 algorithm called for an alternative solution.
One possible solution, implemented in Sevana NIQA (Non-Intrusive Quality Analyzer), is described below.

General Structure of Sevana NIQA
NIQA's approach is based on a database of trained reference patterns called associations. Each association
corresponds to a group of files that have similar expert estimations of sound quality and a common set of causes of
sound quality degradation. For each association, NIQA calculates and stores the distribution of parameter values.
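
One way to picture an association is as a record holding the group's expert score, its shared degradation causes, and per-parameter statistics. The sketch below is an assumption made for illustration only: the source does not say how NIQA represents a distribution, so a mean and standard deviation per parameter stand in for it, and every name (`Association`, `train_association`, `snr`) is hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Association:
    name: str
    expert_score: float  # average expert MOS of the file group
    reasons: tuple       # shared causes of quality degradation
    stats: dict          # parameter name -> (mean, standard deviation)


def train_association(name, reasons, files):
    """Build an association from (expert_mos, {parameter: value}) pairs."""
    scores = [mos for mos, _ in files]
    stats = {}
    for param in files[0][1]:
        values = [params[param] for _, params in files]
        stats[param] = (mean(values), pstdev(values))
    return Association(name, mean(scores), tuple(reasons), stats)


# Hypothetical group of two noisy recordings described by one parameter.
noisy = train_association(
    "noisy", ["background noise"],
    [(2.0, {"snr": 10.0}), (3.0, {"snr": 14.0})],
)
print(noisy.expert_score, noisy.stats["snr"])
```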

The basic algorithm NIQA uses to obtain sound quality scores is shown in the diagram below.

[Diagram: NIQA processing steps, in order]

  1. Loading sound data. Excluding low-level pauses. Audio signal energy normalization.
  2. Detecting signal energy level threshold. VAD algorithm initialization.
  3. Separating signal into active and passive components.
  4. Calculating signal parameters in time domain.
  5. Calculating signal spectrum.
  6. Detecting DTMF.
  7. Psy-filtering. First level of psycho-acoustic model.
  8. Splitting spectrum into tone/noise components.
  9. Level normalization. Second level of psycho-acoustic model.
 10. Transforming levels into quantitative range of loudness. Third level of psycho-acoustic model.
 11. Calculating signal spectrum parameters.
 12. Search and selection from operational associations database.
 13. Score calculation.
 14. Output of quality score and list of matched associations.

The diagram also shows two side elements: "Signal parameters" and "Associations database".

When loading the sound signal, the system excludes all fragments whose energy level is below a threshold. The
excluded fragments correspond to "absolute silence" and are considered irrelevant for obtaining the sound quality score.
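
The silence exclusion step can be sketched as a frame-wise energy gate. The frame length and threshold below are hypothetical values chosen for the example; the source does not state how NIQA picks them.

```python
def drop_silence(samples, frame_len=160, threshold=1e-4):
    """Keep only frames whose mean energy reaches the threshold."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept


# Hypothetical signal: silence, one frame of constant level, silence.
signal = [0.0] * 160 + [0.5] * 160 + [0.0] * 160
print(len(drop_silence(signal)))  # prints 160
```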

In the next phase the signal is split into frames for the voice activity detection (VAD) algorithm. The system
calculates energy values for each frame, which increases the accuracy of VAD. Using the VAD algorithm, the signal is
divided into active and inactive components, which are processed separately. The system builds level histograms for
both the active and inactive signal components.
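
A simple energy-based VAD with level histograms might look like the sketch below. It is only an assumption about how such a stage could work: the 15 dB margin below the loudest frame, the 5 dB histogram bins, and all function names are hypothetical, and practical VAD algorithms use more robust decision rules.

```python
import math


def frame_energies_db(samples, frame_len=160):
    """Mean energy of each full frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def split_by_activity(energies_db, margin_db=15.0):
    """Mark a frame active if it is within margin_db of the loudest frame."""
    threshold = max(energies_db) - margin_db
    active = [e for e in energies_db if e >= threshold]
    inactive = [e for e in energies_db if e < threshold]
    return active, inactive


def level_histogram(levels_db, bin_width_db=5.0):
    """Count frames per level bin; keys are the lower bin edges in dB."""
    hist = {}
    for level in levels_db:
        edge = math.floor(level / bin_width_db) * bin_width_db
        hist[edge] = hist.get(edge, 0) + 1
    return hist


# Hypothetical signal: two loud frames followed by two quiet frames.
signal = [0.5] * 320 + [0.001] * 320
active, inactive = split_by_activity(frame_energies_db(signal))
print(level_histogram(active), level_histogram(inactive))
```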

Using the discrete cosine transform (DCT), the system obtains the signal spectrum, checks the frames of the active
component for the presence of DTMF, and excludes frames that resemble DTMF from further processing.
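
The following sketch shows one way a DCT spectrum could be checked for DTMF. It assumes a DCT-II computed directly from its definition, 8 kHz sampling, and a hypothetical decision rule: a frame is treated as DTMF-like when one standard low-group frequency and one high-group frequency together carry most of the energy. The tolerance and share thresholds are illustrative values, not NIQA's.

```python
import math

DTMF_LOW = (697, 770, 852, 941)       # standard DTMF row frequencies, Hz
DTMF_HIGH = (1209, 1336, 1477, 1633)  # standard DTMF column frequencies, Hz


def dct2(frame):
    """Unnormalized DCT-II computed directly from its definition."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def band_energy(spectrum, freq_hz, sample_rate, tol_hz):
    """Energy of the DCT bins within tol_hz of freq_hz."""
    bin_hz = sample_rate / (2.0 * len(spectrum))  # bin k sits at k * fs / (2N)
    return sum(
        c * c for k, c in enumerate(spectrum)
        if abs(k * bin_hz - freq_hz) <= tol_hz
    )


def looks_like_dtmf(frame, sample_rate=8000, tol_hz=60.0,
                    min_share=0.6, min_group_share=0.2):
    """True if one low and one high DTMF tone dominate the frame."""
    spectrum = dct2(frame)
    total = sum(c * c for c in spectrum)
    if total == 0.0:
        return False
    low = max(band_energy(spectrum, f, sample_rate, tol_hz) for f in DTMF_LOW)
    high = max(band_energy(spectrum, f, sample_rate, tol_hz) for f in DTMF_HIGH)
    both_present = min(low, high) / total >= min_group_share
    return both_present and (low + high) / total >= min_share
```

Frames for which `looks_like_dtmf` returns true would be dropped before the psycho-acoustic stages. A production detector would more likely use a Goertzel filter per DTMF frequency; the DCT is used here only because the text names it.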

The next stage applies the first level of the psycho-acoustic model to the signal spectrum. This model accounts for
different types of masking (including pre-masking and post-masking). Based on distinct peaks of spectral energy, the
system splits the signal into tone and noise components.
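
Splitting a spectrum into tone and noise components by peak picking can be sketched as below. The rule used here (a bin is tonal if it is a local maximum and exceeds the mean of its neighbours by a fixed factor) and the parameter values are assumptions for illustration; masking is not modeled.

```python
def split_tone_noise(energy, prominence=4.0, reach=2):
    """Split per-bin energies into tonal and noise parts of equal length."""
    tone = [0.0] * len(energy)
    noise = list(energy)
    for k in range(reach, len(energy) - reach):
        neighbours = energy[k - reach:k] + energy[k + 1:k + reach + 1]
        floor = sum(neighbours) / len(neighbours)
        is_peak = energy[k] == max(energy[k - reach:k + reach + 1])
        if is_peak and energy[k] > prominence * floor:
            tone[k] = energy[k] - floor  # energy above the local noise floor
            noise[k] = floor
    return tone, noise


# Hypothetical spectrum: flat noise floor with one clear peak in bin 3.
tone, noise = split_tone_noise([1.0, 1.0, 1.0, 20.0, 1.0, 1.0, 1.0])
print(tone)   # prints [0.0, 0.0, 0.0, 19.0, 0.0, 0.0, 0.0]
print(noise)  # prints [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```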

The second level of the psycho-acoustic model performs energy normalization of the signal: energy levels are
transformed into loudness levels at 1 kHz. The third level of the psycho-acoustic model transforms loudness levels
into several distinguishable grades of loudness, which makes it possible to ignore signal changes that the human ear
does not perceive.
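
The effect of the third level can be illustrated with a small sketch. The phon-to-sone relation used here (loudness doubles for every 10 phon above 40 phon) is the common psychoacoustic approximation and is not taken from the source; the 3-phon grade width is a purely hypothetical value.

```python
import math


def phon_to_sone(phon):
    """Approximate loudness in sones for loudness levels of 40 phon and above."""
    return 2.0 ** ((phon - 40.0) / 10.0)


def loudness_grade(phon, step_phon=3.0):
    """Quantize a loudness level so that changes below one step are ignored."""
    return int(math.floor(phon / step_phon))


print(phon_to_sone(50.0))                          # prints 2.0
print(loudness_grade(60.0), loudness_grade(61.5))  # prints 20 20
print(loudness_grade(63.0))                        # prints 21
```

Two levels that fall into the same grade are treated as equally loud, which is how small, inaudible level changes drop out of the later parameter calculation.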

The next step is to split the signal spectrum into bands that are critical for human hearing and to calculate
parameters both inside and outside these bands. Based on the computed signal parameters, the system selects the most
similar associations from the database and performs matching. For the selected associations, the system determines
how much each of them influences the overall quality and then generates the final voice quality score as a
combination of the scores of the selected associations, weighted accordingly.
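
The matching and weighting step could be sketched as follows. The similarity measure (a Gaussian-style distance in units of each parameter's standard deviation), the use of similarity as the weight, and the choice of the two best matches are all assumptions; the source does not describe NIQA's actual formulas.

```python
import math


def similarity(params, stats):
    """Closer to 1 when params lie near the stored means (in std units)."""
    dist2 = 0.0
    for name, (mu, sigma) in stats.items():
        z = (params[name] - mu) / (sigma if sigma > 0 else 1.0)
        dist2 += z * z
    return math.exp(-0.5 * dist2 / len(stats))


def quality_score(params, associations, top_k=2):
    """Weighted mean of the expert scores of the best-matching associations.

    associations: list of (name, expert_score, stats) tuples.
    Returns (score, names of the matched associations).
    """
    ranked = sorted(
        ((similarity(params, stats), name, mos) for name, mos, stats in associations),
        reverse=True,
    )[:top_k]
    total = sum(weight for weight, _, _ in ranked)
    if total == 0.0:
        raise ValueError("no association matches the signal parameters")
    score = sum(weight * mos for weight, _, mos in ranked) / total
    return score, [name for _, name, _ in ranked]


# Hypothetical database with a single parameter per association.
database = [
    ("clean", 4.5, {"snr": (30.0, 5.0)}),
    ("noisy", 2.0, {"snr": (10.0, 5.0)}),
]
print(quality_score({"snr": 20.0}, database))
```

A signal halfway between the two associations receives the midpoint of their scores, while a signal that matches one association closely is scored almost entirely by it.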

Sevana NIQA Testing and Evaluation
Sevana NIQA was tested on the same ITU-T speech database that is used for conformance testing of the P.563
algorithm. The tests used a total of 376 English-language recordings. All recordings were sorted into 4 groups
according to their MOS scores (given in the documentation supplied with the sound database). For each group of
recordings we determined the average expert score and the average NIQA score (Table 2). To allow comparison with
P.563, we also calculated the average errors of the P.563 and NIQA scores for the same tests.
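
The evaluation procedure described above can be sketched as grouping recordings by expert MOS and averaging within each group. The sketch assumes that "average error" means the mean absolute difference between the expert and predicted scores, and that a score on a boundary (for example exactly 4) goes to the higher range; neither detail is stated in the source, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

RANGES = ((4, 5), (3, 4), (2, 3), (1, 2))


def mos_range(mos):
    """Return the MOS range a score falls into (boundaries go to the higher range)."""
    for low, high in RANGES:
        if low <= mos <= high:
            return (low, high)
    raise ValueError("MOS must lie between 1 and 5")


def summarize(pairs):
    """Map each MOS range to (mean expert score, mean predicted score, mean abs error)."""
    groups = defaultdict(list)
    for expert, predicted in pairs:
        groups[mos_range(expert)].append((expert, predicted))
    table = {}
    for key, rows in groups.items():
        table[key] = (
            mean([e for e, _ in rows]),
            mean([p for _, p in rows]),
            mean([abs(e - p) for e, p in rows]),
        )
    return table


# Hypothetical (expert MOS, predicted score) pairs for three recordings.
print(summarize([(4.5, 3.5), (4.0, 3.0), (1.5, 2.5)]))
```

Because the error is averaged per recording, it can be much larger than the difference between the two average scores of a group.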

Table 2. Comparison of NIQA scores against expert estimations

    MOS range     Average score (MOS)     Average score (NIQA)     Average error (NIQA)     Average error (P.563)
      4–5                4,25                     3,44                     0,83                     1,79
      3–4                3,42                     3,06                     0,51                     1,69
      2–3                2,56                     2,61                     0,43                     0,97
      1–2                1,68                     2,36                     0,68                     0,55

The results show that NIQA scores agree with expert estimations much more closely than P.563 scores do. NIQA is less
accurate than P.563 only for recordings with very low MOS scores (in the range from 1 to 2). In all other ranges the
average error of NIQA is roughly 2 to 3 times lower than that of P.563.
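
The comparison can be checked directly from the average errors in Table 2 by dividing the P.563 error by the NIQA error for each MOS range. The values are copied from the table; only the helper name is made up.

```python
# Average errors from Table 2: MOS range -> (NIQA, P.563).
TABLE_2 = {
    "4-5": (0.83, 1.79),
    "3-4": (0.51, 1.69),
    "2-3": (0.43, 0.97),
    "1-2": (0.68, 0.55),
}


def error_ratios(table):
    """Ratio of the P.563 error to the NIQA error; above 1 favours NIQA."""
    return {mos_range: p563 / niqa for mos_range, (niqa, p563) in table.items()}


for mos_range, ratio in error_ratios(TABLE_2).items():
    print(f"{mos_range}: {ratio:.2f}")
# prints:
# 4-5: 2.16
# 3-4: 3.31
# 2-3: 2.26
# 1-2: 0.81
```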


References
1. Methods for subjective determination of transmission quality // ITU-T Recommendation P.800 /
http://www.itu.int/rec/T-REC-P.800/en
2. Subjective performance assessment of telephone-band and wideband digital codecs // ITU-T Recommendation
P.830 / http://www.itu.int/rec/T-REC-P.830/en
3. Objective quality measurement of telephone-band (300-3400 Hz) speech codecs // ITU-T Recommendation P.861 /
http://www.itu.int/rec/T-REC-P.861/en
4. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of
narrow-band telephone networks and speech codecs // ITU-T Recommendation P.862 /
http://www.itu.int/rec/T-REC-P.862/en
5. Single-ended method for objective speech quality assessment in narrow-band telephony applications // ITU-T
Recommendation P.563 / http://www.itu.int/rec/T-REC-P.563-200405-I/en
6. ITU-T coded-speech database // Supplement 23 to ITU-T P-series Recommendations /
http://www.itu.int/rec/T-REC-P.Sup23-199802-I/en

NIQA - non-intrusive voice quality testing software (alternative for P.563)

  • 1. NIQA – Non-Intrusive voice Quality Analyzer Modern standard methods for evaluating quality of transmitted speech Voice quality is one of the main characteristics of speech transmission systems. When analyzing voice quality one must not only consider audio signal degradation caused by transmission over telecom channels, but also specifics of speaker's voice, conditions of listener's hearing and variation of these parameters in time. The most known methods for quality evaluation of voice transmission systems were developed by Telecommunication Standardization Sector of International Telecommunications Union (ITU-T) in the middle of 90-s. Results of this work are presented in Recommendation P.800 (P.830) «Methods for subjective determination of transmission quality» [1, 2]. This document describes conditions for voice quality testing, audio contents, scoring and methods to evaluate results. Typically “Methods for subjective determination of transmission quality” are used to obtain mean subjective quality score according to five-digit scale (Mean Opinion Score - MOS). Unfortunately P.800 recommendation tests may lead to ambiguous results. Recommendation is warning about comparing MOS scores received under different conditions and consider such approach incorrect. Besides that preforming tests according to P.800 takes a lot of time and requires a lot of testers involved in the process. In order to move from subjective (MOS) scores to objective ones and to automate the quality measurement, ITU-T has developed the P.861 recommendation, which is based on low level quantitative measurements [3]. Recommendation P.861 is a follow-up of PSQM method (Perceptual Speech Quality Measurement), developed by KPN Research and devoted to objective analysis of speech codecs performance with a low level of degradation. 
However, it is impossible to utilize PSQM for evaluation of work of a real communication system because the method does not consider all the important factors influencing human perception. Among these factors are delay, jitter, packet loss as well as signal level clipping. In February 2001 ITU-T has issued another recommendation ITU-T P.862 [4], which describes a more advanced algorithm for voice quality testing – PESQ (Perceptual Evaluation of Speech Quality). The algorithm includes level and time aligning, human perception and cognitive modeling. Due to these additional operations the approach considers signal amplification/ attenuation in a communication system, time delays and jitter as well as spectrum bands, which are the most significant for human perception. Based on cognitive modeling PESQ also recalculates objective quality score into MOS values. A disadvantage of PESQ as well as other similar algorithm is the fact that they are based on comparing of two signals: original and transmitted through a communication system. This approach may create a range of difficulties connected with setting and preforming voice quality testing. One requires to arrange signal recording on both sides of the telecommunication system as well as records transmission to the test system. Besides this real time quality monitoring in such approach appears quite difficult as well. In order to solve the challenging issues mentioned above ITU-T has developed a new recommendation P.563 [5] introduced in May 2004. This recommendation determines algorithm for evaluating speech quality by listening to communication sessions. The algorithm takes into account single-side distortions, speech trunk parameters, noise and speech naturalness. Developers of P.563 call attention that P.563 does not provide overall quality estimation of speech transmission. 
Distortions driven by delays, echo, loss of loudness and everything related to two-sided interaction cannot be taken into consideration by this method. It's widely thought that P.563 provides a high level of correlation between automated and expert quality scores. However, simple tests based on ITU-T sound database for codec testing [6] may raise some doubts about the consistence of the algorithm provided together with its description.
  • 2. Table.1. Comparison between results of P.563 and expert estimations MOS Range Ava rage Score Average error MOS P.563 4–5 4,25 2,45 1,79 3–4 3,42 1,70 1,69 2–3 2,56 1,71 0,97 1–2 1,68 1,49 0,55 The problem discovered in the distributed P.563 algorithm implementation required development of an alternative solution. Further down one can find one of possible solutions that is implemented in Sevana NIQA (Non-Intrusive Quality Analyzer). General Structure of Sevana NIQA NIQA's (Non-Intrusive Quality Analyzer) approach is based on a database of trained etalons called associations. Each association corresponds to a group of files that have close expert estimations of sound quality and common set of reasons for sound quality degradation. For each association NIQA calculates and stores a distribution of parameters' values. Basic algorithm showing how NIQA obtains sound quality scores is represented on the picture below. Loading sound data. Excluding low level pauses. Audio signal energy normalization. Detecting signal energy level threshold. VAD algorithm initialization. Separating signal into active and passive components. Calculating signal parameters in time domain. Calculating signal spectrum. Detecting DTMF Psy-filtering. First level of psycho-acoustic model. Signal parameters Splitting spectrum into tone/noise components. Level normalization. Second level of psycho-acoustic model. Transforming levels into quantitative range of loudness. Third level of psycho-acoustic model. Calculating signal spectrum parameters. Search and selection from operational associations database. Associations database Score calculation. Output of quality score and list of matched sociations.«сработавших» ассоциаций.
When loading the sound signal, the system excludes all fragments whose energy level is below a threshold. The excluded fragments correspond to “absolute silence” and are considered irrelevant for obtaining a sound quality score.

At the next phase the signal is split into frames for the voice activity detection (VAD) algorithm. The system calculates an energy value for each frame, which increases the accuracy of VAD. Using the VAD algorithm, the signal is divided into active and inactive components that are processed separately. The system builds level histograms for both the active and the inactive component.

Using the discrete cosine transform (DCT), the system obtains the signal spectrum, checks the frames of the active component for the presence of DTMF, and excludes frames that resemble DTMF from further processing.

The next stage applies the first level of the psycho-acoustic model to the signal spectrum. This model accounts for different types of masking (including pre-masking and post-masking). Based on clear peaks in the spectrum energy, the system splits the signal into tone and noise components. The second level of the psycho-acoustic model performs energy normalization of the signal: energy levels are transformed into loudness levels at 1 kHz. The third level of the psycho-acoustic model transforms loudness levels into several detectable grades of loudness, which makes it possible to ignore signal changes that the human ear does not perceive.

The next step is to split the signal spectrum into the bands that are critical to human auditory perception and to calculate parameters both inside and outside these bands.

Based on the computed signal parameters, the system selects the most similar associations from the database and performs matching. For the selected associations the system determines how much each of them influences the overall quality, and then generates the final voice quality score as a combination of the scores of the selected associations, weighted accordingly.
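The final matching step can be sketched as follows. The text does not specify how similarity or weights are computed, so everything here is an assumption made for illustration: the `Association` record, the normalized Euclidean distance, the choice of the `k` nearest associations, and the inverse-distance weights. The association names and numbers in the usage example are invented.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Association:
    """Hypothetical record for one trained association."""
    name: str
    score: float        # expert quality score of the file group
    means: List[float]  # per-parameter mean of the stored distribution
    stds: List[float]   # per-parameter standard deviation


def distance(params, assoc):
    """Euclidean distance after scaling each parameter by its std (assumed metric)."""
    return math.sqrt(sum(((p - m) / s) ** 2
                         for p, m, s in zip(params, assoc.means, assoc.stds)))


def estimate_score(params, database, k=2, eps=1e-6):
    """Return (score, matched_names) from the k most similar associations."""
    # Select the k associations closest to the measured parameters.
    ranked = sorted(database, key=lambda a: distance(params, a))[:k]
    # Closer associations get larger weights (inverse distance, assumed).
    weights = [1.0 / (distance(params, a) + eps) for a in ranked]
    total = sum(weights)
    # Final score: weighted average of the matched associations' scores.
    score = sum(w * a.score for w, a in zip(weights, ranked)) / total
    return score, [a.name for a in ranked]


if __name__ == "__main__":
    # Invented two-parameter database for demonstration only.
    database = [
        Association("clean", 4.5, [0.0, 0.0], [1.0, 1.0]),
        Association("noisy", 3.0, [2.0, 0.0], [1.0, 1.0]),
        Association("lossy", 1.5, [0.0, 4.0], [1.0, 1.0]),
    ]
    score, matched = estimate_score([1.0, 0.0], database)
    print(round(score, 2), matched)  # prints "3.75 ['clean', 'noisy']"
```

The measured point lies halfway between the "clean" and "noisy" associations, so both receive equal weight and the result is the midpoint of their scores. The returned list of names corresponds to the list of matched associations that NIQA outputs alongside the score.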
Sevana NIQA Testing and Evaluation

Sevana NIQA has been tested using the same ITU-T speech database that is used for conformance testing of the P.563 algorithm. The tests used a total of 376 English-language recordings. All recordings were sorted into 4 groups according to their MOS scores (given in the documentation supplied with the sound database). For each group we determined the average expert score and the average NIQA score (Table 2). For comparison with P.563, we also calculated the average errors of the P.563 and NIQA scores on the same tests.

Table 2. Comparison of NIQA scores against expert estimations

MOS range   Average score (MOS)   Average score (NIQA)   Average error (NIQA)   Average error (P.563)
4–5         4,25                  3,44                   0,83                   1,79
3–4         3,42                  3,06                   0,51                   1,69
2–3         2,56                  2,61                   0,43                   0,97
1–2         1,68                  2,36                   0,68                   0,55

The results show that NIQA scores agree with expert estimations considerably better than P.563 scores do. NIQA is less accurate than P.563 only for recordings with very low MOS scores (in the range from 1 to 2). In all other ranges the average error of NIQA is roughly 2-3 times smaller than that of P.563.

References
1. Methods for subjective determination of transmission quality // ITU-T Recommendation P.800 / http://www.itu.int/rec/T-REC-P.800/en
2. Subjective performance assessment of telephone-band and wideband digital codecs // ITU-T Recommendation P.830 / http://www.itu.int/rec/T-REC-P.830/en
3. Objective quality measurement of telephone-band (300-3400 Hz) speech codecs // ITU-T Recommendation P.861 / http://www.itu.int/rec/T-REC-P.861/en
4. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs // ITU-T Recommendation P.862 / http://www.itu.int/rec/T-REC-P.862/en
5. Single-ended method for objective speech quality assessment in narrow-band telephony applications // ITU-T Recommendation P.563 / http://www.itu.int/rec/T-REC-P.563-200405-I/en
6. ITU-T coded-speech database // Supplement 23 to ITU-T P-series Recommendations / http://www.itu.int/rec/T-REC-P.Sup23-199802-I/en