On-the-fly Aligned Data Augmentation For Sequence-to-sequence ASR
2021 · Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, et al.
Abstract
We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and their speech representations in an aligned manner to generate previously unseen training pairs. The speech representations are sampled from an audio dictionary extracted from the training corpus, which injects speaker variation into the training examples. The replacement tokens are either predicted by a language model, so that the augmented pairs stay semantically close to the original data, or sampled at random. Both strategies yield training pairs that improve robustness in ASR training. Our experiments on a Seq-to-Seq architecture show that ADA can be applied on top of SpecAugment and achieves about 9-23% and 4-15% relative WER improvements over SpecAugment alone on the LibriSpeech 100h and LibriSpeech 960h test datasets, respectively.
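To make "replacing tokens and speech representations in an aligned manner" concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a forced alignment gives each transcript token a `[start, end)` frame span, that features are plain lists of float vectors, and that the audio dictionary simply stores every aligned segment per token. The names `AlignedUtterance`, `build_audio_dictionary`, `aligned_augment`, `random_proposer`, and the per-token `replace_prob` are hypothetical. Only the random-sampling strategy is shown; a language-model strategy could be plugged in as another `propose_token` callable.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]      # one feature vector (e.g. a filterbank frame)
Segment = List[Frame]    # consecutive frames aligned to one token


@dataclass
class AlignedUtterance:
    tokens: List[str]               # transcript tokens
    spans: List[Tuple[int, int]]    # assumed [start, end) frame span per token
    frames: List[Frame]             # features for the whole utterance


def build_audio_dictionary(corpus: Sequence[AlignedUtterance]) -> Dict[str, List[Segment]]:
    """Collect every aligned speech segment for each token in the training corpus."""
    dictionary: Dict[str, List[Segment]] = {}
    for utt in corpus:
        for tok, (s, e) in zip(utt.tokens, utt.spans):
            dictionary.setdefault(tok, []).append(utt.frames[s:e])
    return dictionary


def random_proposer(vocab: Sequence[str], rng: random.Random) -> Callable[[List[str], int], str]:
    """Random-sampling strategy: ignore context and draw any vocabulary token."""
    choices = list(vocab)
    return lambda tokens, i: rng.choice(choices)


def aligned_augment(
    utt: AlignedUtterance,
    audio_dict: Dict[str, List[Segment]],
    replace_prob: float,
    propose_token: Callable[[List[str], int], str],
    rng: random.Random,
) -> AlignedUtterance:
    """Swap selected tokens together with their aligned frames to form a new pair."""
    new_tokens: List[str] = []
    new_spans: List[Tuple[int, int]] = []
    new_frames: List[Frame] = []
    prev_end = 0
    for i, (tok, (s, e)) in enumerate(zip(utt.tokens, utt.spans)):
        new_frames.extend(utt.frames[prev_end:s])   # keep unaligned frames (e.g. silence)
        segment = utt.frames[s:e]
        if rng.random() < replace_prob:
            candidate = propose_token(utt.tokens, i)
            if candidate in audio_dict:             # only tokens with stored audio can be used
                tok = candidate
                segment = rng.choice(audio_dict[candidate])  # audio, possibly another speaker
        start = len(new_frames)
        new_frames.extend(segment)
        new_tokens.append(tok)
        new_spans.append((start, len(new_frames)))
        prev_end = e
    new_frames.extend(utt.frames[prev_end:])        # trailing unaligned frames
    return AlignedUtterance(new_tokens, new_spans, new_frames)
```

Because the new transcript token and the new frames come from the same dictionary entry, the augmented pair stays consistent. Recomputing the spans lets the output be fed back in or combined with other augmentations, since the substituted segment usually has a different length than the original.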
Related papers
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019) · 12.68
- Improving Sequence-to-sequence Speech Recognition Training With On-the-fly Data Augmentation (2019) · 0.00
- Data Augmentation For End-to-end Code-switching Speech Recognition (2020) · 9.92
- Back-translation-style Data Augmentation For End-to-end ASR (2018) · 13.11
- Training Data Augmentation For Dysarthric Automatic Speech Recognition By Text-to-dysarthric-speech Synthesis (2024) · 10.48
- Multi-modal Data Augmentation For End-to-end ASR (2018) · 11.67
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022) · 11.49
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022) · 6.77