Adapter-based Extension Of Multi-speaker Text-to-speech Model For New Speakers
2022 · Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg
Abstract
Fine-tuning is a popular method for adapting text-to-speech (TTS) models to new speakers. However, this approach has some challenges. Fine-tuning usually requires several hours of high-quality speech per speaker. There is also a risk that fine-tuning will negatively affect the quality of speech synthesis for previously learnt speakers. In this paper we propose an alternative approach to TTS adaptation based on parameter-efficient adapter modules. In the proposed approach, a few small adapter modules are added to the original network. The original weights are frozen, and only the adapters are fine-tuned on speech from the new speaker. This parameter-efficient fine-tuning produces a new model that shares a high proportion of its parameters with the original model. Our experiments on the LibriTTS, HiFi-TTS and VCTK datasets validate the effectiveness of the adapter-based method through objective and subjective metrics.
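The freeze-and-adapt idea in the abstract can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not specify the adapter architecture, so the sketch uses a common residual bottleneck design (down-projection, ReLU, up-projection, skip connection), and the names `FrozenLayer`, `Adapter`, `forward`, and the sizes `DIM` and `BOTTLENECK` are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class FrozenLayer:
    """Stand-in for one pretrained layer of the base TTS model (hypothetical)."""

    def __init__(self, dim, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.trainable = False  # original weights are frozen

    def __call__(self, x):
        return matvec(self.W, x)

    def num_params(self):
        return len(self.W) * len(self.W[0])


class Adapter:
    """Residual bottleneck adapter: x + up(relu(down(x))). Assumed design."""

    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity,
        # so inserting it leaves the base model's output unchanged.
        self.up = [[0.0] * bottleneck for _ in range(dim)]
        self.trainable = True  # only adapters are updated for the new speaker

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]
        return [xi + ui for xi, ui in zip(x, matvec(self.up, h))]

    def num_params(self):
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))


def forward(modules, x):
    """Apply the modules in sequence."""
    for m in modules:
        x = m(x)
    return x


DIM, BOTTLENECK = 64, 4  # illustrative sizes only
rng = random.Random(0)
base = [FrozenLayer(DIM, rng), FrozenLayer(DIM, rng)]
# Insert one small adapter after each frozen layer.
adapted = [base[0], Adapter(DIM, BOTTLENECK, rng),
           base[1], Adapter(DIM, BOTTLENECK, rng)]

trainable = sum(m.num_params() for m in adapted if m.trainable)
total = sum(m.num_params() for m in adapted)
print(f"trainable parameters: {trainable} of {total}")
```

In this toy setup the frozen layers are shared by every speaker and only the adapter weights would need to be stored per new speaker, which is the parameter sharing the abstract refers to. The training loop itself is omitted.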
Related papers
- ADAPTERMIX: Exploring The Efficacy Of Mixture Of Adapters For Low-resource TTS Adaptation (2023)
- HyperTTS: Parameter Efficient Adaptation In Text To Speech Using Hypernetworks (2024)
- VoiceTailor: Lightweight Plug-in Adapter For Diffusion-based Personalized Text-to-speech (2024)
- Leveraging Parameter-efficient Transfer Learning For Multi-lingual Text-to-speech Adaptation (2024)
- Efficient Adapter Tuning Of Pre-trained Speech Models For Automatic Speaker Verification (2024)
- Sample Efficient Adaptive Text-to-speech (2018)
- Adapting TTS Models For New Speakers Using Transfer Learning (2021)
- NanoVoice: Efficient Speaker-adaptive Text-to-speech For Multiple Speakers (2024)