Comparative Analysis Of Transfer Learning In Deep Learning Text-to-speech Models On A Few-shot, Low-resource, Customized Dataset
2023 · Ze Liu
Abstract
Text-to-Speech (TTS) synthesis using deep learning relies on voice quality. Modern TTS models are advanced, but they need a large amount of data. Given the growing computational complexity of these models and the scarcity of large, high-quality datasets, this research focuses on transfer learning, especially on few-shot, low-resource, and customized datasets. In this research, "low-resource" specifically refers to situations where there are limited amounts of training data, such as a small number of audio recordings and corresponding transcriptions for a particular language or dialect. This thesis is rooted in the pressing need to find TTS models that require less training time and fewer data samples, yet yield high-quality voice output. The research evaluates the transfer learning capabilities of state-of-the-art TTS models through a thorough technical analysis. It then conducts a hands-on experimental analysis to compare the models' performance on a constrained dataset. This study investigates the e
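To make the idea of transfer learning on a few-shot dataset concrete, here is a toy sketch. It is not the thesis's method and not a TTS model. It shows one common transfer-learning strategy: keep a "pretrained" feature extractor frozen and fine-tune only the small output head on a handful of samples. The encoder, head values, and samples are hypothetical stand-ins chosen for illustration.

```python
def features(x, encoder):
    # Frozen "pretrained" feature extractor: a fixed linear map
    # (hypothetical stand-in for an encoder trained on a large corpus).
    return [sum(w * xi for w, xi in zip(row, x)) for row in encoder]


def fine_tune_head(samples, encoder, head, lr=0.05, epochs=500):
    """Fine-tune only the head weights with SGD; the encoder is never updated."""
    head = list(head)  # copy so the pretrained head is left untouched
    for _ in range(epochs):
        for x, y in samples:
            f = features(x, encoder)
            pred = sum(h * fi for h, fi in zip(head, f))
            err = pred - y
            # Gradient of 0.5 * err**2 with respect to head weight i is err * f_i.
            head = [h - lr * err * fi for h, fi in zip(head, f)]
    return head


# Hypothetical "pretrained" components (learned elsewhere, on plentiful data).
ENCODER = [[1.0, 1.0], [1.0, -1.0]]
PRETRAINED_HEAD = [0.4, 1.3]

# A tiny "few-shot" target dataset following y = 2*x0 - x1.
SAMPLES = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 1.0], 1.0),
    ([0.5, -0.5], 1.5),
    ([-1.0, 0.5], -2.5),
]

head = fine_tune_head(SAMPLES, ENCODER, PRETRAINED_HEAD)
print([round(h, 3) for h in head])
```

Because the encoder is frozen, only two parameters have to be estimated from five samples, and the search starts from a nearby pretrained solution. That is the intuition behind adapting a pretrained model instead of training from scratch on a low-resource dataset. Real TTS fine-tuning involves far larger networks and choices about which layers to freeze.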
Related papers
- Exploring Transfer Learning For Low Resource Emotional TTS (2019)
- Adapting TTS Models For New Speakers Using Transfer Learning (2021)
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- End-to-end Text-to-speech For Low-resource Languages By Cross-lingual Transfer Learning (2019)
- Towards Transfer Learning For End-to-end Speech Synthesis From Deep Pre-trained Language Models (2019)
- Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis (2025)
- Voice Filter: Few-shot Text-to-speech Speaker Adaptation Using Voice Conversion As A Post-processing Module (2022)
- Transfer Learning Framework For Low-resource Text-to-speech Using A Large-scale Unlabeled Speech Corpus (2022)