HyperTTS: Parameter Efficient Adaptation In Text To Speech Using Hypernetworks
2024 · Yingting Li, Rishabh Bhardwaj, Ambuj Mehrish, et al.
Abstract
Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While TTS architectures that train and test on the same set of speakers have improved significantly, performance on out-of-domain speakers remains severely limited. Domain adaptation to a new set of speakers can be achieved by fine-tuning the whole model for each new domain, which is parameter-inefficient. Adapters address this problem by providing a parameter-efficient alternative for domain adaptation. Although Adapters are popular in NLP, speech synthesis has not seen much improvement from them. In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations in two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art perfor
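To make the central idea of the abstract concrete, the following is a minimal sketch of a hypernetwork that generates the weights of a bottleneck Adapter from a speaker embedding. It is not the authors' implementation: the class name `HyperAdapter`, the single tanh hidden layer, the ReLU bottleneck, the absence of biases and layer normalization, and all dimensions are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def reshape(flat, rows, cols):
    """Reshape a flat list into a (rows x cols) nested list."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


class HyperAdapter:
    """Hypothetical adapter whose weights are produced by a hypernetwork."""

    def __init__(self, d_model, bottleneck, d_speaker, d_hidden, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.bottleneck = bottleneck
        # Hypernetwork weights: the only learnable parameters in this sketch.
        self.w_hidden = random_matrix(d_hidden, d_speaker, rng)
        self.w_down_head = random_matrix(bottleneck * d_model, d_hidden, rng)
        self.w_up_head = random_matrix(d_model * bottleneck, d_hidden, rng)

    def generate(self, speaker_embedding):
        """Map a speaker embedding to adapter down/up projection matrices."""
        hidden = [math.tanh(h) for h in matvec(self.w_hidden, speaker_embedding)]
        down = reshape(matvec(self.w_down_head, hidden), self.bottleneck, self.d_model)
        up = reshape(matvec(self.w_up_head, hidden), self.d_model, self.bottleneck)
        return down, up

    def forward(self, x, speaker_embedding):
        """Apply the speaker-conditioned adapter with a residual connection."""
        down, up = self.generate(speaker_embedding)
        z = [max(0.0, v) for v in matvec(down, x)]  # ReLU in the bottleneck
        delta = matvec(up, z)
        return [xi + di for xi, di in zip(x, delta)]


# Usage: the same hidden state passes through speaker-specific adapter weights.
layer = HyperAdapter(d_model=8, bottleneck=2, d_speaker=4, d_hidden=3)
x = [0.5] * 8
out_a = layer.forward(x, [1.0, 0.0, 0.0, 0.0])  # hypothetical speaker A
out_b = layer.forward(x, [0.0, 1.0, 0.0, 0.0])  # hypothetical speaker B
```

In this sketch the adapter weights are never stored per speaker; they are recomputed from the embedding on each call, which is what makes the adapter "dynamic" in the sense the abstract describes. Because there are no biases, an all-zero speaker embedding yields all-zero adapter weights and the layer reduces to the identity.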
Related papers
- Adapter-based Extension Of Multi-speaker Text-to-speech Model For New Speakers (2022) 6.77
- Leveraging Parameter-efficient Transfer Learning For Multi-lingual Text-to-speech Adaptation (2024) 0.00
- ADAPTERMIX: Exploring The Efficacy Of Mixture Of Adapters For Low-resource TTS Adaptation (2023) 6.34
- Sample Efficient Adaptive Text-to-speech (2018) 0.00
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023) 0.00
- High Quality, Lightweight And Adaptable TTS Using LPCNet (2019) 10.97
- Speaker-adaptive Neural Vocoders For Parametric Speech Synthesis Systems (2018) 2.26
- Linear Networks Based Speaker Adaptation For Speech Synthesis (2018) 6.34