Data Augmentation Strategy for ASR via Semantic-Aware Weaving
1. WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-Aware Weaving
Kyusung Seo¹, Joonhyung Park¹, Jaeyun Song¹, Eunho Yang¹,²
¹ Korea Advanced Institute of Science and Technology (KAIST)
² AITRICS
ICASSP 2023
2. Background
• End-to-end deep models require an immense amount of audio data paired with the
corresponding transcripts
• Most existing data augmentations focus only on transforming the speech signal
“ICASSP is awesome.”
4. Challenges
• The length of speech segments is irregular
• Naïve data augmentations may generate grammatically and semantically incorrect
data
5. WeavSpeech
• Alignment extraction between speech signal and transcript
• Weaving transcripts
  • POS matching
  • Embedding similarity
• Weaving speech signals
  • Smooth padding
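The slides list the components without detail, so the following is only one possible reading of them, sketched under stated assumptions. It assumes word-level alignments (sample offsets per word) are already available, treats "POS matching" as requiring the same part-of-speech tag, treats "embedding similarity" as cosine similarity between word embeddings, and uses a linear fade at the segment edges as a stand-in for "smooth padding". The names `Word`, `pick_replacement`, `splice`, and `weave` are hypothetical and do not come from the authors.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Word:
    text: str
    pos: str         # part-of-speech tag (assumed to come from any tagger)
    start: int       # first sample index of the word in its waveform
    end: int         # one past the last sample index
    emb: np.ndarray  # word embedding (toy values in the usage below)


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))


def pick_replacement(target, candidates):
    """Pick the candidate with the same POS tag and the highest similarity."""
    same_pos = [c for c in candidates if c.pos == target.pos]
    if not same_pos:
        return None
    return max(same_pos, key=lambda c: cosine(target.emb, c.emb))


def splice(wave_a, target, wave_b, donor, fade=4):
    """Swap the target's samples for the donor's, fading the donor's edges.

    The linear fade is an assumption standing in for "smooth padding".
    """
    seg = wave_b[donor.start:donor.end].astype(float).copy()
    n = min(fade, len(seg) // 2)
    if n > 0:
        ramp = np.linspace(0.0, 1.0, n, endpoint=False)
        seg[:n] *= ramp
        seg[-n:] *= ramp[::-1]
    return np.concatenate([wave_a[:target.start], seg, wave_a[target.end:]])


def weave(words_a, wave_a, words_b, wave_b, index):
    """Replace word `index` of utterance A with a matching word from B."""
    target = words_a[index]
    donor = pick_replacement(target, words_b)
    text = [w.text for w in words_a]
    if donor is None:
        return text, wave_a  # no POS match: leave the utterance unchanged
    text[index] = donor.text
    return text, splice(wave_a, target, wave_b, donor)
```

Because the donor segment generally differs in length from the segment it replaces, the woven waveform changes length, which is one way the irregular segment lengths noted under Challenges show up in practice.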
6. Experiment
• LibriSpeech
  • Audio data from 1,000 hours of audiobooks
  • LibriSpeech 100h and LibriSpeech 960h for the low-scale and large-scale settings, respectively
• WSJ
  • Audio data from 81 hours of news articles
  • Dev93 comprises LDC94S13B (WSJ1)
  • Eval92 comprises LDC93S6B (WSJ0)
7. Main results
• WeavSpeech outperforms the baseline in all settings
• It consistently improves performance on the more challenging ‘other’ sets of
LibriSpeech
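The slides do not name the evaluation metric. ASR results on LibriSpeech and WSJ are commonly reported as word error rate (WER), so the sketch below assumes that is what "performance" refers to here. It computes WER as the word-level edit distance between reference and hypothesis, divided by the number of reference words; the function name `wer` is illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many insertions.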
8. Data Deficient Condition
• WeavSpeech still performs well under data-deficient conditions
9. Ablation study
• The performance degrades when any module is eliminated
• The combination of all components effectively improves speech recognition
performance
11. Conclusion
• WeavSpeech is a mixup-type data augmentation for automatic speech recognition
• WeavSpeech can be applied to any language without requiring language-specific
knowledge
• WeavSpeech can be seamlessly integrated with other verified augmentations
• Experimental results show the superiority of WeavSpeech, especially in the
data-deficient condition