Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning
2023 · Raviraj Joshi, Nikesh Garera
Abstract
Text-to-speech (TTS) systems are being built using end-to-end deep learning approaches. However, these systems require huge amounts of training data. We present our approach to build production-quality TTS and perform speaker adaptation in extremely low-resource settings. We propose a transfer learning approach using high-resource language data and synthetically generated data. We transfer the learnings from the out-of-domain, high-resource English language. Further, we make use of an out-of-the-box single-speaker TTS in the target language to generate in-domain synthetic data. We employ a three-step approach to train a high-quality single-speaker TTS system in a low-resource Indian language, Hindi. We use a Tacotron2-like setup with a spectrogram prediction network and a WaveGlow vocoder. The Tacotron2 acoustic model is trained on English data, followed by synthetic Hindi data from the existing TTS system. Finally, the decoder of this model is fine-tuned on only 3 hours of target Hindi speake
Related papers
- Generic Indic Text-to-speech Synthesisers With Rapid Adaptation In An End-to-end Framework (2020)
- Using IPA-based Tacotron For Data Efficient Cross-lingual Speaker Adaptation And Pronunciation Enhancement (2020)
- End-to-end Text-to-speech For Low-resource Languages By Cross-lingual Transfer Learning (2019)
- Towards Building Text-to-speech Systems For The Next Billion Users (2022)
- Adapting TTS Models For New Speakers Using Transfer Learning (2021)
- Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis (2025)
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)
- Cross-lingual Low Resource Speaker Adaptation Using Phonological Features (2021)