VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
2024 · Philip Anastassiou, Zhenyu Tang, Kainan Peng, et al.
Abstract
We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous work has been limited to specialized models that edit these attributes individually and that suffer from one or more of the following pitfalls: a weak conversion effect, no zero-shot capability for out-of-distribution speakers, or undesirable timbre leakage in the synthesized outputs. We address each of these issues with a simple modular framework built on a conditional diffusion backbone model with optional normalizing flow-based and sequence-to-sequence speaker attribute-editing modules, whose components can be combined or removed at inference time to support a wide array of tasks without additional model finetuning. Audio samples are available at https://voiceshopai.github.io.
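The abstract's central design claim is that the editing modules are optional and can be combined or removed at inference time without finetuning. The Python below is a minimal sketch of that composition pattern only, not the authors' implementation. Everything in it is a hypothetical illustration: `AffineFlow` is a toy elementwise invertible map standing in for a trained normalizing flow, `flow_attribute_edit` assumes an attribute can be edited by shifting one latent axis, `EditPipeline` and its split into speaker-embedding editors and a content editor are assumptions, and the concatenating "backbone" is a placeholder for the conditional diffusion model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = List[float]


@dataclass
class AffineFlow:
    """Toy invertible elementwise map z = scale * x + shift.

    Hypothetical stand-in for a trained normalizing flow; every scale
    entry must be nonzero so the map can be inverted exactly.
    """
    scale: Vector
    shift: Vector

    def forward(self, x: Vector) -> Vector:
        return [s * xi + b for xi, s, b in zip(x, self.scale, self.shift)]

    def inverse(self, z: Vector) -> Vector:
        return [(zi - b) / s for zi, s, b in zip(z, self.scale, self.shift)]


def flow_attribute_edit(flow: AffineFlow, embedding: Vector,
                        attr_index: int, delta: float) -> Vector:
    """Map to latent space, shift one (assumed) attribute axis, map back."""
    z = flow.forward(embedding)  # new list, safe to modify in place
    z[attr_index] += delta
    return flow.inverse(z)


@dataclass
class EditPipeline:
    """Hypothetical inference-time composition of optional editing modules."""
    # Placeholder for the conditional backbone: (content, speaker) -> output.
    backbone: Callable[[Vector, Vector], Vector]
    # Optional edits applied to the speaker embedding, in order.
    speaker_editors: Sequence[Callable[[Vector], Vector]] = ()
    # Optional edit applied to the content features (assumed split).
    content_editor: Optional[Callable[[Vector], Vector]] = None

    def run(self, content: Vector, speaker: Vector) -> Vector:
        for edit in self.speaker_editors:
            speaker = edit(speaker)
        if self.content_editor is not None:
            content = self.content_editor(content)
        # Modules left out are simply skipped; nothing is retrained.
        return self.backbone(content, speaker)


def toy_backbone(content: Vector, speaker: Vector) -> Vector:
    """Concatenation placeholder for the diffusion backbone."""
    return content + speaker


flow = AffineFlow(scale=[2.0, 1.0], shift=[0.0, 1.0])


def shift_axis0(embedding: Vector) -> Vector:
    """Hypothetical attribute edit: move latent axis 0 by +2."""
    return flow_attribute_edit(flow, embedding, attr_index=0, delta=2.0)


edited = EditPipeline(toy_backbone, speaker_editors=[shift_axis0]).run([0.5], [1.0, 3.0])
unedited = EditPipeline(toy_backbone).run([0.5], [1.0, 3.0])
print(edited)    # [0.5, 2.0, 3.0]
print(unedited)  # [0.5, 1.0, 3.0]
```

Keeping each editor as an optional callable means adding or dropping a module is a configuration change rather than a retraining step, which mirrors the "combined or removed during inference" property described in the abstract. The invertibility of the flow is what lets an edit in latent space be mapped back to an embedding the backbone can consume.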
Related papers
- Everyone-can-sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025)
- ACE-VC: Adaptive And Controllable Voice Conversion Using Explicitly Disentangled Self-supervised Speech Representations (2023)
- Controlspeech: Towards Simultaneous And Independent Zero-shot Speaker Cloning And Zero-shot Language Style Control (2024)
- Seeing Your Speech Style: A Novel Zero-shot Identity-disentanglement Face-based Voice Conversion (2024)
- Editspeech: A Text Based Speech Editing System Using Partial Inference And Bidirectional Fusion (2021)
- Residual Speaker Representation For One-shot Voice Conversion (2023)
- Beyond Voice Identity Conversion: Manipulating Voice Attributes By Adversarial Learning Of Structured Disentangled Representations (2021)
- Vevo: Controllable Zero-shot Voice Imitation With Self-supervised Disentanglement (2025)