ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion
2022 · Edresson Casanova, Christopher Shulby, Alexander Korolev, et al.
Abstract
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied to data augmentation for automatic speech recognition (ASR) systems in low/medium-resource scenarios. Through extensive experiments, we show that our approach permits the application of speech synthesis and voice conversion to improve ASR systems using only one target-language speaker during model training. We also managed to close the gap between ASR models trained with synthesized versus human speech compared to other works that use many speakers. Finally, we show that it is possible to obtain promising ASR training results with our data augmentation method using only a single real speaker in a target language.
Related papers
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025)
- Learning Cross-lingual Mappings For Data Augmentation To Improve Low-resource Speech Recognition (2023)
- Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS (2020)
- Voice Conversion Can Improve ASR In Very Low-resource Settings (2021)
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)
- Speech Synthesis As Augmentation For Low-resource ASR (2020)
- SkinAugment: Auto-encoding Speaker Conversions For Automatic Speech Translation (2020)
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020)