StarGAN-ZSVC: Towards Zero-shot Voice Conversion In Low-resource Contexts
2021 · Matthew Baas, Herman Kamper
Abstract
Voice conversion is the task of converting a spoken utterance from a source speaker so that it appears to be said by a different target speaker while retaining the linguistic content of the utterance. Recent advances have led to major improvements in the quality of voice conversion systems. However, to be useful in a wider range of contexts, voice conversion systems would need to (i) be trainable without access to parallel data, (ii) work in a zero-shot setting where both the source and target speakers are unseen during training, and (iii) run in real time or faster. Recent techniques fulfil one or two of these requirements, but not all three. This paper extends recent voice conversion models based on generative adversarial networks (GANs) to satisfy all three of these conditions. We specifically extend the recent StarGAN-VC model by conditioning it on a speaker embedding (from a potentially unseen speaker). This allows the model to be used in a zero-shot setting, and we therefore cal
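As an illustration of what conditioning on a speaker embedding can look like, here is a minimal sketch in plain Python. It assumes one common wiring, in which the source and target embeddings are repeated across time and concatenated to every spectrogram frame before the frames reach the conversion network. The abstract does not say how StarGAN-ZSVC wires in its conditioning, so the function name `condition_on_speakers`, the toy dimensions, and the concatenation scheme are all hypothetical.

```python
from typing import List

Frame = List[float]


def condition_on_speakers(
    mel: List[Frame], src_emb: List[float], tgt_emb: List[float]
) -> List[Frame]:
    """Append source and target speaker embeddings to every frame.

    Assumption for illustration only: conditioning by concatenation.
    The embeddings are continuous vectors, not one-hot speaker IDs,
    so nothing here depends on the speaker having been seen in training.
    """
    cond = list(src_emb) + list(tgt_emb)
    # Build new frames so the input spectrogram is left unmodified.
    return [list(frame) + cond for frame in mel]


# Toy input: 2 frames with 3 mel bins each, and 2-dimensional embeddings.
mel = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
src = [1.0, 0.0]  # hypothetical embedding of the source speaker
tgt = [0.0, 1.0]  # hypothetical embedding of an unseen target speaker

conditioned = condition_on_speakers(mel, src, tgt)
print(len(conditioned), len(conditioned[0]))  # 2 7
```

Because the speaker identity enters as a vector rather than as an index into a fixed speaker table, a new speaker only requires an embedding computed from their speech, which is what makes a zero-shot setting possible in principle.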
Related papers
- GAZEV: GAN-based Zero-shot Voice Conversion Over Non-parallel Speech Corpus (2020) 8.60
- Towards Low-resource StarGAN Voice Conversion Using Weight Adaptive Instance Normalization (2020) 7.81
- StarGAN-VC2: Rethinking Conditional Methods For StarGAN-based Voice Conversion (2019) 0.00
- StarGAN-VC: Non-parallel Many-to-many Voice Conversion With Star Generative Adversarial Networks (2018) 18.09
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion (2021) 13.70
- SLMGAN: Exploiting Speech Language Model Representations For Unsupervised Zero-shot Voice Conversion In GANs (2023) 0.00
- StarGAN-VC+ASR: StarGAN-based Non-parallel Voice Conversion Regularized By Automatic Speech Recognition (2021) 5.24
- Improvement Speaker Similarity For Zero-shot Any-to-any Voice Conversion Of Whispered And Regular Speech (2024) 4.52