Voice Conversion Can Improve ASR In Very Low-resource Settings
2021 · Matthew Baas, Herman Kamper
Abstract
Voice conversion (VC) could be used to improve speech recognition systems in low-resource languages by using it to augment limited training data. However, VC has not been widely used for this purpose because of practical issues such as compute speed and limitations when converting to and from unseen speakers. Moreover, it is still unclear whether a VC model trained on one well-resourced language can be applied to speech from another low-resource language for the aim of data augmentation. In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition. We combine several recent techniques to design and train a practical VC system in English, and then use this system to augment data for training speech recognition models in several low-resource languages. When using a sensible amount of VC augmented data, speech recognition performance is improved in all four low-resource languages considered. We also show that VC-based augmentation is su
Related papers
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022) 6.77
- Improving Child Speech Recognition With Augmented Child-like Speech (2024) 5.24
- Cross-speaker Emotion Transfer For Low-resource Text-to-speech Using Non-parallel Voice Conversion With Pitch-shift Data Augmentation (2022) 8.09
- Transfer Learning From Monolingual ASR To Transcription-free Cross-lingual Voice Conversion (2020) 0.00
- Measuring The Effectiveness Of Voice Conversion On Speaker Identification And Automatic Speech Recognition Systems (2019) 0.00
- Voice Filter: Few-shot Text-to-speech Speaker Adaptation Using Voice Conversion As A Post-processing Module (2022) 8.35
- How Far Are We From Robust Voice Conversion: A Survey (2020) 9.41
- Exploring Voice Conversion Based Data Augmentation In Text-dependent Speaker Verification (2020) 0.00