Speech Recognition With Augmented Synthesized Speech
2019 · Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, et al.
Abstract
The recent success of the Tacotron speech synthesis architecture and its variants in producing natural-sounding multi-speaker synthesized speech has raised the exciting possibility of replacing the expensive, manually transcribed, domain-specific human speech that is used to train speech recognizers. The multi-speaker speech synthesis architecture can learn latent embedding spaces of prosody, speaker, and style variations derived from input acoustic representations, thereby allowing for manipulation of the synthesized speech. In this paper, we evaluate the feasibility of enhancing speech recognition performance with speech synthesis, using two corpora from different domains. We explore algorithms to provide the acoustic and lexical diversity needed for robust speech recognition. Finally, we demonstrate the feasibility of this approach as a data augmentation strategy for domain transfer. We find that improvements to speech recognition performance are achievable by augmenting training
Related papers
- Synth2Aug: Cross-domain Speaker Recognition With TTS Synthesized Speech (2020)
- Spoken Language Corpora Augmentation With Domain-specific Voice-cloned Speech (2024)
- Speech Synthesis As Augmentation For Low-resource ASR (2020)
- Improving Accented Speech Recognition Using Data Augmentation Based On Unsupervised Text-to-speech Synthesis (2024)
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019)
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)
- TTS-by-TTS: TTS-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020)
- Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS (2020)