Speech Synthesis As Augmentation For Low-resource ASR
2020 · Deblin Bagchi, Shannon Wotherspoon, Zhuolin Jiang, et al.
Abstract
Speech synthesis might hold the key to low-resource speech recognition. Data augmentation techniques have become an essential part of modern speech recognition training. Yet, they are simple, naive, and rarely reflect real-world conditions. Meanwhile, speech synthesis techniques have been rapidly getting closer to the goal of achieving human-like speech. In this paper, we investigate the possibility of using synthesized speech as a form of data augmentation to lower the resources necessary to build a speech recognizer. We experiment with three different kinds of synthesizers: statistical parametric, neural, and adversarial. Our findings are interesting and point to new research directions for the future.
Related papers
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025) 0.00
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022) 6.77
- Speech Recognition With Augmented Synthesized Speech (2019) 13.97
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020) 11.49
- Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS (2020) 7.50
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019) 12.68
- Reduce, Reuse, Recycle: Is Perturbed Data Better Than Other Language Augmentation For Low Resource Self-supervised Speech Models (2023) 0.00
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020) 11.29