Custom Data Augmentation For Low Resource ASR Using Bark And Retrieval-based Voice Conversion
2023 · Anand Kamble, Aniket Tathe, Suyash Kumbharkar, et al.
Abstract
This paper proposes two methodologies for constructing customized Common Voice datasets for low-resource languages such as Hindi. The first leverages Bark, a transformer-based text-to-audio model developed by Suno, and incorporates Meta's EnCodec and a pre-trained HuBERT model to enhance Bark's performance. The second employs Retrieval-based Voice Conversion (RVC) and uses the Ozen toolkit for data preparation. Both methodologies contribute to the advancement of ASR technology and offer insights into the challenges of constructing customized Common Voice datasets for under-resourced languages. They also provide a pathway to high-quality, personalized voice generation for a range of applications.
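As a rough illustration of what a "customized Common Voice dataset" might look like on disk, the sketch below writes one audio clip per sentence plus a tab-separated manifest. It is not the authors' pipeline. The names `build_dataset` and `silent_synthesizer` are hypothetical, the placeholder synthesizer emits silence where a real setup would call Bark or an RVC model, the clips are WAV rather than the MP3 files Common Voice distributes, and only a subset of the Common Voice TSV columns is written.

```python
import csv
import wave
from pathlib import Path
from typing import Callable, Iterable

SAMPLE_RATE = 24_000  # Bark's output sample rate; change for another synthesizer

# Assumption: a subset of the columns used in Common Voice TSV files.
FIELDS = ["client_id", "path", "sentence", "locale"]


def silent_synthesizer(text: str) -> bytes:
    """Hypothetical stand-in for a TTS or voice-conversion model.

    Returns 16-bit mono PCM silence, 0.1 s per character of input text.
    """
    n_samples = (SAMPLE_RATE // 10) * max(len(text), 1)
    return b"\x00\x00" * n_samples


def build_dataset(
    sentences: Iterable[str],
    out_dir: str,
    synthesize: Callable[[str], bytes] = silent_synthesizer,
    speaker_id: str = "synthetic_speaker_0",  # hypothetical speaker label
    locale: str = "hi",
) -> Path:
    """Write one WAV clip per sentence and a Common Voice-style manifest."""
    out = Path(out_dir)
    clips = out / "clips"
    clips.mkdir(parents=True, exist_ok=True)
    manifest = out / "validated.tsv"

    with manifest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for i, sentence in enumerate(sentences):
            name = f"synthetic_{i:06d}.wav"
            # Store the raw PCM returned by the synthesizer as a mono WAV file.
            with wave.open(str(clips / name), "wb") as wav:
                wav.setnchannels(1)
                wav.setsampwidth(2)  # 16-bit samples
                wav.setframerate(SAMPLE_RATE)
                wav.writeframes(synthesize(sentence))
            writer.writerow(
                {
                    "client_id": speaker_id,
                    "path": name,
                    "sentence": sentence,
                    "locale": locale,
                }
            )
    return manifest
```

Passing a different `synthesize` callable, for example a wrapper around a text-to-audio model that returns 16-bit PCM bytes, would leave the rest of the layout unchanged.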
Related papers
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025) 0.00
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022) 6.77
- Voice Conversion Can Improve ASR In Very Low-resource Settings (2021) 7.50
- Dialect Adaptation And Data Augmentation For Low-resource ASR: Taltech Systems For The MADASR 2023 Challenge (2023) 6.34
- Exploring Voice Conversion Based Data Augmentation In Text-dependent Speaker Verification (2020) 0.00
- Enhancing Out-of-vocabulary Performance Of Indian TTS Systems For Practical Applications Through Low-effort Data Strategies (2024) 0.00
- End To End Hindi To English Speech Conversion Using Bark, Mbart And A Finetuned XLSR Wav2vec2 (2024) 0.00
- Low-data? No Problem: Low-resource, Language-agnostic Conversational Text-to-speech Via F0-conditioned Data Augmentation (2022) 0.00