Stepback: Enhanced Disentanglement For Voice Conversion Via Multi-task Learning
2025 · Qian Yang, Calbert Graham
Abstract
Voice conversion (VC) modifies voice characteristics while preserving linguistic content. This paper presents the Stepback network, a novel model for converting speaker identity using non-parallel data. Unlike traditional VC methods that rely on parallel data, our approach leverages deep learning techniques to enhance disentanglement completion and linguistic content preservation. The Stepback network incorporates a dual flow of different domain data inputs and uses constraints with self-destructive amendments to optimize the content encoder. Extensive experiments show that our model significantly improves VC performance, reducing training costs while achieving high-quality voice conversion. The Stepback network's design offers a promising solution for advanced voice conversion tasks.
Related papers
- Beyond Voice Identity Conversion: Manipulating Voice Attributes By Adversarial Learning Of Structured Disentangled Representations (2021) 0.00
- Discrete Unit Based Masking For Improving Disentanglement In Voice Conversion (2024) 0.00
- Preserving Background Sound In Noise-robust Voice Conversion Via Multi-task Learning (2022) 0.00
- FastVC: Fast Voice Conversion With Non-parallel Data (2020) 5.24
- Many-to-many Voice Conversion Based Feature Disentanglement Using Variational Autoencoder (2021) 7.81
- MAIN-VC: Lightweight Speech Representation Disentanglement For One-shot Voice Conversion (2024) 3.58
- Emotional Voice Conversion Using Multitask Learning With Text-to-speech (2019) 0.00
- Learning Disentangled Speech Representations With Contrastive Learning And Time-invariant Retrieval (2024) 5.84