Data Augmentation Methods For End-to-end Speech Recognition On Distant-talk Scenarios
2021 · Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, et al.
Abstract
Although end-to-end automatic speech recognition (E2E ASR) has achieved strong performance on tasks with abundant paired data, making E2E ASR robust against noisy and low-resource conditions remains challenging. In this study, we investigated data augmentation methods for E2E ASR in distant-talk scenarios. E2E ASR models are trained on the series of CHiME challenge datasets, which are suitable tasks for studying robustness against noisy and spontaneous speech. We propose to use three augmentation methods and their combinations: 1) data augmentation using text-to-speech (TTS) data, 2) cycle-consistent generative adversarial network (Cycle-GAN) augmentation trained to map between two audio characteristics, those of clean speech and those of noisy recordings, to match the testing condition, and 3) pseudo-label augmentation provided by the pretrained ASR module for smoothing label distributions. Experimental results using the CHiME-6/CHiME-4 datasets show that each augmentation meth
Related papers
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)
- Data Augmentation For End-to-end Code-switching Speech Recognition (2020)
- Improving Code-switching And Named Entity Recognition In ASR With Speech Editing Based Data Augmentation (2023)
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022)
- HMM-based Data Augmentation For E2E Systems For Building Conversational Speech Synthesis Systems (2022)
- MixSpeech: Data Augmentation For Low-resource Automatic Speech Recognition (2021)
- Improving Sequence-to-sequence Speech Recognition Training With On-the-fly Data Augmentation (2019)
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025)