SlideShare ist ein Scribd-Unternehmen logo
1 von 11
Downloaden Sie, um offline zu lesen
fit
speech fit
BUT Zero-Cost 2016 Speech Recognition
Miroslav Skácel, Martin Karafiát, Lucas Ondel, Albert Uchytil, Igor Szöke
BUT Speech@FIT
Brno University of Technology
Czech Republic
MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands
Overview
Our ideas for this task were:
• to use previous knowledge of low-resource languages
• to use existing systems from Babel1
• to adapt that on Vietnamese
• to follow zero-cost requirements
We ended up with:
• lots of data processing and cleaning
• 2x LVCSR sub-task systems
• 1x subword sub-task systems
• suggestions for further improvements
1https://www.iarpa.gov/index.php/research-programs/babel
1
Data preparation
• audio downsampled from 16kHz to 8kHz
• got Vietnamese alphabet from wordlist
• cleaned other characters (punctuation marks, brackets, etc.)
• numerals expanded to textual form
2
Data segmentation
• audio longer than 1 minute caused troubles
• alignment from first training stages used
• data split when:
• silence longer than 0.5s occurs
• segment would be longer than 15s
3
Language models
3 LMs were created:
1) cleaned transcriptions from train set
2) cleaned subtitles
3) cleaned text from websites
• 3 or more words
• we combined them together
4
LVCSR systems - GMM/DNN2
2Martin Karafiát et al. BUT Neural Network Features for Spontaneous Vietnamese in
BABEL. In Proceedings of ICASSP 2014, pages 5659–5663. IEEE Signal Processing Society,
2014.
5
LVCSR systems - BLSTM3
• 16kHz audio, original transcriptions
• Kaldi/TNet toolkits
3Martin Karafiát et al. Multilingual BLSTM and Speaker-Specific Vector Adaptation in
2016 BUT Babel System. Accepted at SLT 2016, 2016.
6
Subword system
• Acoustic Unit Discovery4
(AUD) model
• unlabeled data - no transcription needed
• like phone-loop model, fully Bayesian, Dirichlet distribution
• no fixed number of components like in HMM
• can learn complexity of model
4Lucas Ondel et al. Variational Inference for Acoustic Unit Discovery. In Procedia
Computer Science, volume 2016, pages 80–86. Elsevier Science, 2016.
7
Results
System
Devel [WER] Test [WER]
all (ELSA / Forvo / RhinoSpike) all (ELSA / Forvo / RhinoSpike / YouTube)
Kaldi Baseline 60.4 (45.3 / 99.8 / 73.5) 75.5 (46.7 / 98.1 / 76.2 / 97.1)
P-BUT - Babel Kaldi BLSTM 16kHz 17.9 (6.4 / 58.1 / 15.8) 48.0 (4.9 / 55.7 / 35.4 / 87.2)
L-BUT - Babel Kaldi BLSTM 16kHz - LM tune 17.6 (6.2 / 56.4 / 16.9) 46.3 (4.6 / 52.6 / 32.2 / 84.7)
L-BUT - Babel GMM/DNN 8kHz 36.1 (29.7 / 68.5 / 23.4) 55.7 (28.0 / 59.3 / 44.9 / 81.4)
System
Devel [NMI] Test [NMI]
all (ELSA / Forvo / RhinoSpike) all (ELSA / Forvo / RhinoSpike / YouTube)
Kaldi Baseline 5.48 (7.88 / 10.44 / 13.8) 4.29 (7.49 / 11.25 / 15.99 / 6.35)
P-BUT AUD phone-loop 5.08 (6.45 / 8.76 / 14.19) 4.56 (5.52 / 9.59 / 18.49 / 7.59)
• BLSTM overperformed GMM/DNN system
• exception is unseen data (YouTube test set)
8
Further work
Ideas for future experiments:
• find better way to clean original transcriptions
• methods to detect inappropriate audio/transcriptions
• adding noise to audio for more robust system
• audio reverberation using RIR
9
Thanks for your attention!
9

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (16)

Video Retrieval for Multimedia Verification of Breaking News on Social Networks
Video Retrieval for Multimedia Verification  of Breaking News on Social NetworksVideo Retrieval for Multimedia Verification  of Breaking News on Social Networks
Video Retrieval for Multimedia Verification of Breaking News on Social Networks
 
MediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons LearnedMediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons Learned
 
MediaEval 2015 - JRS at Synchronization of Multi-user Event Media Task
MediaEval 2015 - JRS at Synchronization of Multi-user Event Media TaskMediaEval 2015 - JRS at Synchronization of Multi-user Event Media Task
MediaEval 2015 - JRS at Synchronization of Multi-user Event Media Task
 
MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event ...
MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event ...MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event ...
MediaEval 2015 - CERTH at MediaEval 2015 Synchronization of Multi-User Event ...
 
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
 
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness TaskMediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
 
MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
MediaEval 2015 - Verifying Multimedia Use at MediaEval 2015
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
 
MediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task
MediaEval 2016: A Multimodal System for the Verifying Multimedia Use TaskMediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task
MediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task
 
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
 
MediaEval 2015 - Synchronization of Multi-User Event Media at MediaEval 2015:...
MediaEval 2015 - Synchronization of Multi-User Event Media at MediaEval 2015:...MediaEval 2015 - Synchronization of Multi-User Event Media at MediaEval 2015:...
MediaEval 2015 - Synchronization of Multi-User Event Media at MediaEval 2015:...
 
MediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience TaskMediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience Task
 
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
 
The InVID Plug-in: Web Video Verification on the Browser
The InVID Plug-in: Web Video Verification on the BrowserThe InVID Plug-in: Web Video Verification on the Browser
The InVID Plug-in: Web Video Verification on the Browser
 
MediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness TaskMediaEval 2016: LAPI at Predicting Media Interestingness Task
MediaEval 2016: LAPI at Predicting Media Interestingness Task
 
MediaEval 2016 - Verifying Multimedia Use Task Overview
MediaEval 2016 - Verifying Multimedia Use Task OverviewMediaEval 2016 - Verifying Multimedia Use Task Overview
MediaEval 2016 - Verifying Multimedia Use Task Overview
 

Ähnlich wie MediaEval 2016 - BUT Zero-Cost Speech Recognition

NMR Automation
NMR AutomationNMR Automation
NMR Automation
cknoxrun
 

Ähnlich wie MediaEval 2016 - BUT Zero-Cost Speech Recognition (20)

Mediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesMediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slides
 
Wreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognitionWreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognition
 
PPT-CCL: A Universal Phrase Tagset for Multilingual Treebanks
PPT-CCL: A Universal Phrase Tagset for Multilingual TreebanksPPT-CCL: A Universal Phrase Tagset for Multilingual Treebanks
PPT-CCL: A Universal Phrase Tagset for Multilingual Treebanks
 
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...Preliminary study on using vector quantization latent spaces for TTS/VC syste...
Preliminary study on using vector quantization latent spaces for TTS/VC syste...
 
Lenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesisLenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesis
 
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-...
 
Neural Network Language Models for Candidate Scoring in Multi-System Machine...
 Neural Network Language Models for Candidate Scoring in Multi-System Machine... Neural Network Language Models for Candidate Scoring in Multi-System Machine...
Neural Network Language Models for Candidate Scoring in Multi-System Machine...
 
Asr
AsrAsr
Asr
 
Corpus Linguistics :Analytical Tools
Corpus Linguistics :Analytical ToolsCorpus Linguistics :Analytical Tools
Corpus Linguistics :Analytical Tools
 
A Recorded Debating Dataset
A Recorded Debating DatasetA Recorded Debating Dataset
A Recorded Debating Dataset
 
SiddhantSancheti_MediumShortStory.pptx
SiddhantSancheti_MediumShortStory.pptxSiddhantSancheti_MediumShortStory.pptx
SiddhantSancheti_MediumShortStory.pptx
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)
 
Acceptance Testing Of A Spoken Language Translation System
Acceptance Testing Of A Spoken Language Translation SystemAcceptance Testing Of A Spoken Language Translation System
Acceptance Testing Of A Spoken Language Translation System
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
 
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
 
NMR Automation
NMR AutomationNMR Automation
NMR Automation
 
Searching for the Best Machine Translation Combination
Searching for the Best Machine Translation CombinationSearching for the Best Machine Translation Combination
Searching for the Best Machine Translation Combination
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speech
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 

Mehr von multimediaeval

Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
multimediaeval
 

Mehr von multimediaeval (20)

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimator
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Images
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-MatchingHCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
 
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...
 
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ ...
 
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Int...
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentation
 
A Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image DetectionA Temporal-Spatial Attention Model for Medical Image Detection
A Temporal-Spatial Attention Model for Medical Image Detection
 
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...
 
Fine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with AttentionFine-tuning for Polyp Segmentation with Attention
Fine-tuning for Polyp Segmentation with Attention
 
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...
 
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...
 
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ... Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...
 

Kürzlich hochgeladen

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 

Kürzlich hochgeladen (20)

❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 

MediaEval 2016 - BUT Zero-Cost Speech Recognition

  • 1. fit speech fit BUT Zero-Cost 2016 Speech Recognition Miroslav Skácel, Martin Karafiát, Lucas Ondel, Albert Uchytil, Igor Szöke BUT Speech@FIT Brno University of Technology Czech Republic MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands
  • 2. Overview Our ideas for this task were: • to use previous knowledge of low-resource languages • to use existing systems from Babel1 • to adapt that on Vietnamese • to follow zero-cost requirements We ended up with: • lots of data processing and cleaning • 2x LVCSR sub-task systems • 1x subword sub-task systems • suggestions for further improvements 1https://www.iarpa.gov/index.php/research-programs/babel 1
  • 3. Data preparation • audio downsampled from 16kHz to 8kHz • got Vietnamese alphabet from wordlist • cleaned other characters (punctuation marks, brackets, etc.) • numerals expanded to textual form 2
  • 4. Data segmentation • audio longer than 1 minute caused troubles • alignment from first training stages used • data split when: • silence longer than 0.5s occurs • segment would be longer than 15s 3
  • 5. Language models 3 LMs were created: 1) cleaned transcriptions from train set 2) cleaned subtitles 3) cleaned text from websites • 3 or more words • we combined them together 4
  • 6. LVCSR systems - GMM/DNN2 2Martin Karafiát et al. BUT Neural Network Features for Spontaneous Vietnamese in BABEL. In Proceedings of ICASSP 2014, pages 5659–5663. IEEE Signal Processing Society, 2014. 5
  • 7. LVCSR systems - BLSTM3 • 16kHz audio, original transcriptions • Kaldi/TNet toolkits 3Martin Karafiát et al. Multilingual BLSTM and Speaker-Specific Vector Adaptation in 2016 BUT Babel System. Accepted at SLT 2016, 2016. 6
  • 8. Subword system • Acoustic Unit Discovery4 (AUD) model • unlabeled data - no transcription needed • like phone-loop model, fully Bayesian, Dirichlet distribution • no fixed number of components like in HMM • can learn complexity of model 4Lucas Ondel et al. Variational Inference for Acoustic Unit Discovery. In Procedia Computer Science, volume 2016, pages 80–86. Elsevier Science, 2016. 7
  • 9. Results System Devel [WER] Test [WER] all (ELSA / Forvo / RhinoSpike) all (ELSA / Forvo / RhinoSpike / YouTube) Kaldi Baseline 60.4 (45.3 / 99.8 / 73.5) 75.5 (46.7 / 98.1 / 76.2 / 97.1) P-BUT - Babel Kaldi BLSTM 16kHz 17.9 (6.4 / 58.1 / 15.8) 48.0 (4.9 / 55.7 / 35.4 / 87.2) L-BUT - Babel Kaldi BLSTM 16kHz - LM tune 17.6 (6.2 / 56.4 / 16.9) 46.3 (4.6 / 52.6 / 32.2 / 84.7) L-BUT - Babel GMM/DNN 8kHz 36.1 (29.7 / 68.5 / 23.4) 55.7 (28.0 / 59.3 / 44.9 / 81.4) System Devel [NMI] Test [NMI] all (ELSA / Forvo / RhinoSpike) all (ELSA / Forvo / RhinoSpike / YouTube) Kaldi Baseline 5.48 (7.88 / 10.44 / 13.8) 4.29 (7.49 / 11.25 / 15.99 / 6.35) P-BUT AUD phone-loop 5.08 (6.45 / 8.76 / 14.19) 4.56 (5.52 / 9.59 / 18.49 / 7.59) • BLSTM overperformed GMM/DNN system • exception is unseen data (YouTube test set) 8
  • 10. Further work Ideas for future experiments: • find better way to clean original transcriptions • methods to detect inappropriate audio/transcriptions • adding noise to audio for more robust system • audio reverberation using RIR 9
  • 11. Thanks for your attention! 9