Improving Speech Recognition Using Limited Accent Diverse
British English Training Data
With Acoustic Model and Data Selection
By Maryam Najafian
Supervisor: Prof. Martin Russell
University of Birmingham, UK
4th October 2016
Email: m.najafian@utdallas.edu
Motivation
1/12
Regional accents can be a problem for Speech Technology!
Overview
• Problems: (1) the multi-conditional data problem, (2) recognition of 14
regional accents of British English, (3) defining an approach to measure
accent difficulty
• Low-dimensional visualisation of the AID feature space reveals expected
relationships between regional accents.
• One approach to accent-robust ASR is adaptation to the speaker's accent,
using an online AID to select an accent-specific acoustic model [1, 2, 3]
(sketched after this slide)
• Another approach to accent-robust ASR is to use AID to analyse the training
data and apply data selection when training a DNN-based system [1, 2]
[1] M. Najafian, "Acoustic model selection for recognition of regional accented speech," Ph.D. dissertation, University of Birmingham, UK, 2016.
[2] M. Najafian et al., "Identification of British English regional accents using fusion of i-vector and multi-accent phonotactic systems," in Proc. Odyssey, 2016, pp. 132-139.
[3] M. Najafian et al., "Unsupervised model selection for recognition of regional accented speech," in Proc. Interspeech, 2014.
[4] M. Najafian et al., "Improving speech recognition using limited accent diverse British English training data with deep neural networks," in Proc. MLSP, 2016.
2/12
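As a concrete illustration of the first approach above (AID-driven acoustic model selection), here is a minimal Python sketch. Everything in it is a stand-in: the accent list is abbreviated, the AID is a toy nearest-template rule, and the "decoding" step just returns the name of the selected model; the real system uses the ACCDIST/i-vector classifiers and accent-adapted HMM sets described on the following slides.

```python
# Minimal sketch of approach 1: run an online accent-ID (AID) classifier on the
# incoming utterance and decode with the acoustic model matched to the predicted
# accent. All names and values below are illustrative placeholders.
import numpy as np

ACCENTS = [
    "glasgow", "liverpool", "birmingham", "newcastle",
    # ... the full ABI set covers 14 accent regions
]

def aid_classify(features: np.ndarray) -> str:
    """Toy AID: pick the accent whose (random) template is closest to the features."""
    rng = np.random.default_rng(0)
    templates = {a: rng.normal(size=features.shape) for a in ACCENTS}
    return min(ACCENTS, key=lambda a: float(np.linalg.norm(features - templates[a])))

def recognise(features: np.ndarray) -> str:
    accent = aid_classify(features)                      # unsupervised, online AID
    acoustic_model = f"wsjcam0-am-adapted-to-{accent}"   # accent-specific model (placeholder name)
    return acoustic_model                                # a real system would decode with this model

print(recognise(np.zeros(400)))
```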
Objectives
• This research is concerned with automatic speech recognition (ASR) for accented
speech, using a range of AID systems for GMM-HMM and DNN-HMM based
acoustic model selection
• Trained on the SI training set (92 speakers, 7,861 utterances) of the WSJCAM0
corpus of read British English speech
• Tested/adapted on the ABI corpus: 14 different accents, 285 speakers
3/12
Baseline AID System Design
• Phonotactic AID: 80.65% accuracy
• i-vector AID: 76.76% accuracy
• ACCDIST-SVM AID: 95% accuracy
4/12
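The sketch below shows, in outline only, the kind of SVM classification and score-level fusion these AID systems rely on. The features are random stand-ins for ACCDIST distance tables and i-vectors, and the fusion weight is an arbitrary assumption, so the output is meaningless; the point is the mechanics of combining posteriors from two accent classifiers.

```python
# Outline-only sketch of an SVM accent classifier plus score-level fusion of two
# AID systems (e.g. ACCDIST-SVM and i-vector). Features are synthetic stand-ins.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_accents, per_accent = 14, 20
y = np.repeat(np.arange(n_accents), per_accent)              # accent labels
X_accdist = rng.normal(size=(len(y), 300))                   # stand-in ACCDIST features
X_ivec = rng.normal(size=(len(y), 400))                      # stand-in i-vectors

svm_accdist = SVC(kernel="linear", probability=True).fit(X_accdist, y)
svm_ivec = SVC(kernel="linear", probability=True).fit(X_ivec, y)

def fused_accent(accdist_feat, ivec, w=0.7):
    """Weighted sum of the two systems' class posteriors (weight w is assumed)."""
    p = (w * svm_accdist.predict_proba(accdist_feat[None, :])
         + (1 - w) * svm_ivec.predict_proba(ivec[None, :]))
    return int(np.argmax(p))

print(fused_accent(X_accdist[0], X_ivec[0]))                 # predicted accent index
```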
ACCDIST Accent ID feature space
5/12
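The original slide shows a 2-D plot of the ACCDIST feature space. A minimal sketch of how such a visualisation can be produced is given below, assuming synthetic speaker-level features and only a handful of accent labels; the real plot uses the ACCDIST features for all 14 ABI accents.

```python
# Minimal sketch of the feature-space visualisation: project speaker-level AID
# features to 2-D with PCA and plot them coloured by accent. Data are synthetic.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
accents = ["Glasgow", "Liverpool", "Birmingham", "SSE"]       # illustrative subset
X = np.vstack([rng.normal(loc=i, size=(20, 300)) for i in range(len(accents))])
labels = np.repeat(np.arange(len(accents)), 20)

proj = PCA(n_components=2).fit_transform(X)                   # 2-D projection
for i, name in enumerate(accents):
    pts = proj[labels == i]
    plt.scatter(pts[:, 0], pts[:, 1], s=12, label=name)
plt.legend()
plt.title("AID feature space, 2-D PCA (synthetic data)")
plt.show()
```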
GMM-HMM: Unsupervised Adaptation
6/12
GMM-HMM: Speaker versus Accent Adaptation
Supervised speaker versus accent adaptation
Unsupervised speaker versus accent adaptation
7/12
DNN-HMM versus GMM-HMM
???
8/12
Accent properties of WSJCAM0
using an i-vector based AID
9/12
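One way to read this slide: the trained i-vector AID is run over the WSJCAM0 training speakers to estimate the accent make-up of the nominally accent-neutral training set. The sketch below shows only that tabulation step, with a stub in place of the real i-vector AID from slide 4; speaker names, labels, and the toy rule are assumptions.

```python
# Hypothetical sketch: classify each WSJCAM0 training speaker with an i-vector
# AID stub and tabulate the predicted accents of the training set.
from collections import Counter
import numpy as np

def ivector_aid(ivec: np.ndarray) -> str:
    """Stand-in for the trained i-vector AID classifier."""
    return ["sse", "northern", "scottish"][int(abs(ivec.sum())) % 3]   # arbitrary toy rule

rng = np.random.default_rng(1)
speaker_ivectors = {f"spk{i:03d}": rng.normal(size=400) for i in range(92)}   # 92 SI speakers
accent_counts = Counter(ivector_aid(v) for v in speaker_ivectors.values())
print(accent_counts.most_common())
```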
DNN-based ASR vs. i-vector-based AID error rates
10/12
DNN-HMM: Extra Training Material (ETM) &
Extra Pre-Training Material (EPM)
The relationship between AID and ASR error rates motivated an analysis of the effect of
supplementing the WSJCAM0 training set with different types of accented speech
(a sketch of the two supplementation strategies follows this slide).
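Below is a schematic Python sketch of the two strategies, under the assumption (suggested by the names) that ETM adds the accented data to both pre-training and supervised training of the DNN, while EPM adds it to pre-training only. Paths and the zero-hour placeholder are illustrative, not the actual data lists; the 2.25 h and 8.96 h figures come from the summary slide.

```python
# Schematic sketch of ETM vs. EPM training-set construction (assumed semantics).
from dataclasses import dataclass

@dataclass
class DataBlock:
    path: str
    accent: str
    hours: float

wsjcam0 = DataBlock("wsjcam0/si_tr", "mixed British English", 0.0)   # hours left as a placeholder
glaswegian_extra = DataBlock("abi/glasgow", "Glaswegian", 2.25)      # most "difficult" accent
diverse_extra = DataBlock("abi/all_regions", "14-accent mix", 8.96)

def build_training_sets(strategy: str, extra: DataBlock):
    """Return (pre-training list, supervised training list) for a given strategy."""
    pretrain, train = [wsjcam0], [wsjcam0]
    if strategy == "EPM":            # Extra Pre-training Material only
        pretrain.append(extra)
    elif strategy == "ETM":          # Extra Training Material
        pretrain.append(extra)
        train.append(extra)
    return pretrain, train

print(build_training_sets("ETM", glaswegian_extra))
```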
Summary and publications
• To address the multi-accent learning problem in a deep learning acoustic
modelling framework with limited resources, this work introduced a
concept called accent difficulty to analyse the training set.
• A relative gain of 46.85% is achieved in recognising the Accents of the British
Isles (ABI) corpus by applying a baseline DNN model rather than a Gaussian
mixture model.
• Our results show that, across all accent regions, supplementing the
training set with a small amount of data from the most difficult accent
(2.25 hours of Glaswegian) leads to a similar gain in performance
as using a large amount of accent-diverse data (8.96 hours from 14
accent regions), even though this accent accounts for just 14% of the test
data.
12/12
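For clarity on how a "relative gain" figure like the one quoted above is computed: it is the error-rate reduction expressed as a fraction of the baseline error rate. The word error rates in the snippet below are invented purely to show the arithmetic, not the actual results.

```python
# Worked example of the relative-gain arithmetic with made-up WER values.
def relative_gain(wer_baseline: float, wer_new: float) -> float:
    return 100.0 * (wer_baseline - wer_new) / wer_baseline

print(f"{relative_gain(20.0, 10.63):.2f}%")   # -> 46.85% for these illustrative numbers
```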
Thank you for listening
Email: m.najafian@utdallas.edu
