Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation
2025 · Qi Wang, Shituo Ma, Guoxin Yu, et al.
Abstract
Voice cloning for Text-to-Speech (TTS) aims to generate expressive and personalized speech from text using limited data from a target speaker. Federated Learning (FL) offers a collaborative and privacy-preserving framework for this task, but existing approaches suffer from high communication costs and tend to suppress stylistic heterogeneity, resulting in insufficient personalization. To address these issues, we propose Fed-PISA, which stands for Federated Personalized Identity-Style Adaptation. To minimize communication costs, Fed-PISA introduces a disentangled Low-Rank Adaptation (LoRA) mechanism: the speaker's timbre is retained locally through a private ID-LoRA, while only a lightweight style-LoRA is transmitted to the server, thereby minimizing parameter exchange. To harness heterogeneity, our aggregation method, inspired by collaborative filtering, is introduced to create custom models for each client by learning from stylistically similar peers. Experiments show that Fed-PISA im
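The aggregation idea described in the abstract, where each client receives a custom model built from stylistically similar peers, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: style-LoRA parameters are treated as flat vectors, similarity is cosine similarity, weights come from a temperature-scaled softmax, and the names `cosine` and `personalized_aggregate` are hypothetical. The private ID-LoRA never appears because, per the abstract, it stays on the client.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def personalized_aggregate(style_updates, temperature=0.1):
    """Return one aggregated style vector per client.

    style_updates maps a client id to its flattened style-LoRA parameters.
    Assumption: peers are weighted by a softmax over cosine similarities;
    the abstract does not state the actual weighting rule.
    """
    result = {}
    for cid, own in style_updates.items():
        # Similarity of this client to every client, itself included.
        sims = {pid: cosine(own, vec) for pid, vec in style_updates.items()}
        top = max(sims.values())  # subtract the max for numerical stability
        exps = {pid: math.exp((s - top) / temperature) for pid, s in sims.items()}
        total = sum(exps.values())
        weights = {pid: e / total for pid, e in exps.items()}
        # Weighted average of all uploaded style vectors.
        result[cid] = [
            sum(weights[pid] * style_updates[pid][k] for pid in style_updates)
            for k in range(len(own))
        ]
    return result


# Toy example: clients "a" and "b" share a style, "c" is the opposite.
updates = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [-1.0, 0.0],
}
personalized = personalized_aggregate(updates)
```

In this toy setup, the model returned to client "a" is dominated by "a" and "b", while client "c" effectively keeps its own update. A lower `temperature` makes each client lean more heavily on its closest peers; a higher one approaches plain averaging.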
Related papers
- Communication-efficient Personalized Federated Learning For Speech-to-text Tasks (2024) 7.81
- FedSpeech: Federated Text-to-speech With Continual Learning (2021) 9.23
- Personalized Lightweight Text-to-speech: Voice Cloning With Adaptive Structured Pruning (2023) 6.34
- VoiceTailor: Lightweight Plug-in Adapter For Diffusion-based Personalized Text-to-speech (2024) 3.58
- Seeing Your Speech Style: A Novel Zero-shot Identity-disentanglement Face-based Voice Conversion (2024) 4.52
- Efficient Emotion And Speaker Adaptation In LLM-based TTS Via Characteristic-specific Partial Fine-tuning (2025) 0.00
- VoiceShop: A Unified Speech-to-speech Framework For Identity-preserving Zero-shot Voice Editing (2024) 0.00
- FaceSpeak: Expressive And High-quality Speech Synthesis From Human Portraits Of Different Styles (2025) 0.00