Personalized Lightweight Text-to-speech: Voice Cloning With Adaptive Structured Pruning
2023 · Sung-Feng Huang, Chia-Ping Chen, Zhi-Sheng Chen, et al.
Abstract
Personalized TTS is an exciting and highly desired application that allows users to train their TTS voice using only a few recordings. However, TTS training typically requires many hours of recording and a large model, making it unsuitable for deployment on mobile devices. To overcome this limitation, related works typically require fine-tuning a pre-trained TTS model to preserve its ability to generate high-quality audio samples while adapting to the target speaker's voice. This process is commonly referred to as "voice cloning." Although related works have achieved significant success in changing the TTS model's voice, they are still required to fine-tune from a large pre-trained model, resulting in a significant size for the voice-cloned model. In this paper, we propose applying trainable structured pruning to voice cloning. By training the structured pruning masks with voice-cloning data, we can produce a unique pruned model for each target speaker. Our experiments demonstrate th
Related papers
- Data Efficient Voice Cloning For Neural Singing Synthesis (2019)
- Adapting TTS Models For New Speakers Using Transfer Learning (2021)
- Mobilespeech: A Fast And High-fidelity Framework For Mobile Zero-shot Text-to-speech (2024)
- Voice Cloning: A Multi-speaker Text-to-speech Synthesis Approach Based On Transfer Learning (2021)
- Data Efficient Voice Cloning From Noisy Samples With Domain Adversarial Training (2020)
- On The Interplay Between Sparsity, Naturalness, Intelligibility, And Prosody In Speech Synthesis (2021)
- EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-to-speech Models (2022)
- Empowering Global Voices: A Data-efficient, Phoneme-tone Adaptive Approach To High-fidelity Speech Synthesis (2025)