You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation
2020 · Aleksandr Laptev, Roman Korostik, Aleksey Svischev, et al.
Abstract
Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition model. We argue that, when the amount of training data is relatively low, this approach can allow an end-to-end model to reach hybrid systems' quality. For an artificial low-to-medium-resource setup, we compare the proposed augmentation with the semi-supervised learning technique. We also investigate the influence of vocoder usage on final ASR performance by comparing the Griffin-Lim algorithm with our modified LPCNet. When applied with an external language model, our approach outperforms a semi-supervised setup on LibriSpeech test-clean and is only 33% worse than a comparable supervised setup. Our syste
Related papers
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019)
- Improving Accented Speech Recognition Using Data Augmentation Based On Unsupervised Text-to-speech Synthesis (2024)
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020)
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022)
- Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS (2020)
- TTS-by-TTS: TTS-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020)
- Data Augmentation Methods For End-to-end Speech Recognition On Distant-talk Scenarios (2021)
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025)