Data Augmentation Strategy for ASR via Semantic-Aware Weaving

MLILAB

WeavSpeech: Data Augmentation Strategy for Automatic Speech Recognition via Semantic-Aware Weaving

Background
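• End-to-end deep ASR models require an immense amount of audio data with corresponding transcripts
• Most existing data augmentations transform only the speech signal and leave the transcript untouched
(Figure: example utterance with the transcript “ICASSP is awesome.”)

Mixup, CutOut, and CutMix
(Figure slide showing these existing augmentation methods)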

Challenges
• Speech segments have irregular lengths
• Naïve data augmentations may generate grammatically and semantically incorrect data
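Both challenges show up when image-style augmentation is applied naively to speech. The minimal sketch below illustrates this. Waveforms are plain lists of floats. `naive_waveform_mixup` and `naive_transcript_splice` are hypothetical illustrations and are not baselines from the talk. Mixing two waveforms of unequal length forces zero-padding. Splicing two transcripts at the same relative position ignores syntax.

```python
from typing import List


def naive_waveform_mixup(wave_a: List[float], wave_b: List[float], lam: float) -> List[float]:
    """Image-style Mixup applied directly to two waveforms (illustrative only).

    Utterances rarely have equal length, so the shorter one is zero-padded
    first; the tail of the output then contains only the longer utterance.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    n = max(len(wave_a), len(wave_b))
    a = wave_a + [0.0] * (n - len(wave_a))
    b = wave_b + [0.0] * (n - len(wave_b))
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def naive_transcript_splice(words_a: List[str], words_b: List[str], ratio: float) -> List[str]:
    """CutMix-style splice of two transcripts at the same relative position.

    Nothing checks syntax or meaning, so the result is often ungrammatical.
    """
    cut_a = int(len(words_a) * ratio)
    cut_b = int(len(words_b) * ratio)
    return words_a[:cut_a] + words_b[cut_b:]


if __name__ == "__main__":
    mixed = naive_waveform_mixup([0.2, 0.4, 0.6], [1.0] * 5, lam=0.5)
    print(len(mixed))  # 5
    spliced = naive_transcript_splice("ICASSP is awesome".split(),
                                      "the cat sat on the mat".split(), 0.5)
    print(" ".join(spliced))  # ICASSP on the mat
```

The splice yields "ICASSP on the mat", which is exactly the kind of grammatically broken training target the slide warns about.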

WeavSpeech
• Alignment Extraction between Speech Signal and Transcript
• Weaving Transcripts
  ◦ POS Matching
  ◦ Embedding Similarity
• Weaving Speech Signals
  ◦ Smooth Padding
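The pipeline in the bullets above can be sketched roughly as follows. This is one plausible reading of the slide and not the authors' implementation. It makes several assumptions:

- Word-level alignments (sample spans), POS tags, and a word-embedding lookup arrive as inputs, for example from some forced aligner and tagger.
- The cut rule is an assumption: find a word in A and a word in B with the same POS tag and the highest cosine similarity, then join A's prefix before that word with B's suffix from its match onward.
- A linear fade plus a short silence stands in for "smooth padding."

`AlignedWord`, `choose_cut`, `smooth_join`, and `weave` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class AlignedWord:
    """A transcript word with its (assumed) forced-alignment span in samples."""
    text: str
    pos: str    # part-of-speech tag from any tagger (assumption)
    start: int  # first sample of the word
    end: int    # one past the last sample of the word


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_cut(words_a: List[AlignedWord], words_b: List[AlignedWord],
               emb: Dict[str, List[float]]) -> Optional[Tuple[int, int]]:
    """Return (i, j) such that A[i] and B[j] share a POS tag and are most similar.

    i starts at 1 so the woven utterance keeps at least one word of A.
    """
    best, best_score = None, -math.inf
    for i in range(1, len(words_a)):
        for j in range(len(words_b)):
            wa, wb = words_a[i], words_b[j]
            if wa.pos != wb.pos or wa.text not in emb or wb.text not in emb:
                continue  # POS matching: only pair words with the same tag
            score = cosine(emb[wa.text], emb[wb.text])  # embedding similarity
            if score > best_score:
                best, best_score = (i, j), score
    return best


def smooth_join(left: List[float], right: List[float], fade: int, gap: int) -> List[float]:
    """Fade out `left`, insert `gap` zero samples, fade in `right` (assumed smoothing)."""
    left, right = list(left), list(right)
    f = min(fade, len(left), len(right))
    for k in range(f):
        left[len(left) - f + k] *= (f - k) / (f + 1)  # gain ramps down toward 0
        right[k] *= (k + 1) / (f + 1)                 # gain ramps up toward 1
    return left + [0.0] * gap + right


def weave(wave_a: List[float], words_a: List[AlignedWord],
          wave_b: List[float], words_b: List[AlignedWord],
          emb: Dict[str, List[float]], fade: int = 160, gap: int = 160
          ) -> Optional[Tuple[List[float], str]]:
    """Join A's words before i with B's words from j on, in audio and text alike."""
    cut = choose_cut(words_a, words_b, emb)
    if cut is None:
        return None  # no compatible pair; skip this utterance pair
    i, j = cut
    audio = smooth_join(wave_a[:words_a[i].start], wave_b[words_b[j].start:], fade, gap)
    text = " ".join([w.text for w in words_a[:i]] + [w.text for w in words_b[j:]])
    return audio, text


if __name__ == "__main__":
    a_words = [AlignedWord("ICASSP", "PROPN", 0, 4), AlignedWord("is", "AUX", 4, 6),
               AlignedWord("awesome", "ADJ", 6, 10)]
    b_words = [AlignedWord("the", "DET", 0, 2), AlignedWord("talk", "NOUN", 2, 5),
               AlignedWord("was", "AUX", 5, 7), AlignedWord("great", "ADJ", 7, 10)]
    toy_emb = {"is": [0.0, 1.0], "was": [0.5, 0.5],
               "awesome": [1.0, 0.1], "great": [0.9, 0.2]}
    audio, text = weave([0.1] * 10, a_words, [0.2] * 10, b_words, toy_emb, fade=2, gap=1)
    print(text, len(audio))  # ICASSP is great 10
```

In this toy run the adjective "awesome" is swapped for the similar adjective "great." The woven transcript therefore stays grammatical, unlike the fixed-ratio splice. The defaults `fade=160` and `gap=160` correspond to 10 ms at an assumed 16 kHz sampling rate.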

Experiment
• LibriSpeech
  ◦ About 1,000 hours of audio from read audiobooks
  ◦ LibriSpeech 100h and LibriSpeech 960h for the low-resource and large-scale settings, respectively
• WSJ
  ◦ About 81 hours of audio of read news articles
  ◦ Dev93 comprises LDC94S13B (WSJ1)
  ◦ Eval92 comprises LDC93S6B (WSJ0)

Main results
• Outperforms the baseline in all settings
• Consistently improves performance on the more challenging ‘other’ subsets of LibriSpeech

Data-Deficient Condition
• WeavSpeech maintains good performance even under data-deficient conditions

Ablation study
• Performance degrades when any single module is removed
• Combining all components gives the largest improvement in speech recognition performance

Qualitative Analysis
• WeavSpeech generates grammatically plausible sentences

Conclusion
• WeavSpeech is a mixup-type data augmentation for automatic speech recognition
• WeavSpeech can be applied to any language without requiring language-specific knowledge
• WeavSpeech can be seamlessly combined with other established augmentations
• Experimental results show the superiority of WeavSpeech, especially under data-deficient conditions

Authors: Kyusung Seo¹, Joonhyung Park¹, Jaeyun Song¹, Eunho Yang¹ ² (¹Korea Advanced Institute of Science and Technology (KAIST), ²AITRICS). Presented at ICASSP 2023.
