ZSVC: Zero-shot Style Voice Conversion With Disentangled Latent Diffusion Models And Adversarial Training
2025 · Xinfa Zhu, Lei He, Yujia Xiao, et al.
Abstract
Style voice conversion aims to transform the speaking style of source speech into a desired style while preserving the original speaker's identity. However, previous style voice conversion approaches focus primarily on well-defined domains such as emotion, which limits their practical application. In this study, we present ZSVC, a novel Zero-shot Style Voice Conversion approach that uses a speech codec and a latent diffusion model with a speech prompting mechanism to facilitate in-context learning for speaking style conversion. To disentangle speaking style from speaker timbre, we introduce an information bottleneck that filters out the speaking style in the source speech, and we employ Uncertainty Modeling Adaptive Instance Normalization (UMAdaIN) to perturb the speaker timbre in the style prompt. Moreover, we propose a novel adversarial training strategy to enhance in-context learning and improve style similarity. Experiments conducted on 44,000 hours of speech data demonstrate the superior perfo
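The abstract names UMAdaIN but does not define it. As a rough illustration of the general idea only, the sketch below shows plain adaptive instance normalization (normalize each channel, then rescale to target statistics) and a hypothetical variant that jitters a feature map's own per-channel mean and standard deviation with Gaussian noise. The function names, the `noise_scale` parameter, and the fixed-scale noise model are assumptions made for this example; they are not taken from the paper and may differ from the authors' formulation.

```python
import math
import random


def channel_stats(feats, eps=1e-5):
    """Return per-channel (means, stds); feats is a list of channels, each a list of frames."""
    means = [sum(ch) / len(ch) for ch in feats]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in ch) / len(ch) + eps)
        for ch, m in zip(feats, means)
    ]
    return means, stds


def adain(feats, target_means, target_stds):
    """Plain AdaIN: normalize each channel, then rescale to the target statistics."""
    means, stds = channel_stats(feats)
    return [
        [(x - m) / s * ts + tm for x in ch]
        for ch, m, s, tm, ts in zip(feats, means, stds, target_means, target_stds)
    ]


def perturbed_adain(feats, noise_scale=0.1, rng=None):
    """Hypothetical uncertainty-style variant (an assumption, not the paper's method):
    re-normalize the features to Gaussian-jittered copies of their own statistics."""
    rng = rng or random.Random()
    means, stds = channel_stats(feats)
    new_means = [m + rng.gauss(0.0, noise_scale) for m in means]
    # Clamp so a large negative draw cannot produce a non-positive scale.
    new_stds = [max(s + rng.gauss(0.0, noise_scale), 1e-3) for s in stds]
    return adain(feats, new_means, new_stds)


# Toy "style prompt" features: 2 channels x 4 frames.
prompt = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
jittered = perturbed_adain(prompt, noise_scale=0.1, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the perturbation reproducible. With `noise_scale=0.0` the function returns the input features unchanged up to floating-point error, which is a convenient sanity check.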
Related papers
- Improving Zero-shot Voice Style Transfer Via Disentangled Representation Learning (2021) 0.00
- Zero-shot Voice Conversion Via Self-supervised Prosody Representation Learning (2021) 6.34
- StableVC: Style Controllable Zero-shot Voice Conversion With Conditional Flow Matching (2024) 7.81
- One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation (2021) 8.09
- Robust Disentangled Variational Speech Representation Learning For Zero-shot Voice Conversion (2022) 10.97
- Seeing Your Speech Style: A Novel Zero-shot Identity-disentanglement Face-based Voice Conversion (2024) 4.52
- ACE-VC: Adaptive And Controllable Voice Conversion Using Explicitly Disentangled Self-supervised Speech Representations (2023) 0.00
- PromptVC: Flexible Stylistic Voice Conversion In Latent Space Driven By Natural Language Prompts (2023) 9.41