GAZEV: GAN-based Zero-shot Voice Conversion Over Non-parallel Speech Corpus
2020 · Zining Zhang, Bingsheng He, Zhenjie Zhang
Abstract
Non-parallel many-to-many voice conversion has recently attracted considerable research effort in the speech processing community. A voice conversion system transforms an utterance of a source speaker into an utterance of a target speaker by keeping the content of the original utterance and replacing its vocal features with those of the target speaker. Existing solutions, e.g., StarGAN-VC2, present promising results only when a speech corpus of the engaged speakers is available during model training. AUTOVC is able to perform voice conversion on unseen speakers, but it needs an external pretrained speaker verification model. In this paper, we present our new GAN-based zero-shot voice conversion solution, called GAZEV, which aims to support unseen speakers on both source and target utterances. Our key technical contribution is the adoption of a speaker embedding loss on top of the GAN framework, as well as an adaptive instance normalization strategy, in order to address the limitations of speaker id
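For readers unfamiliar with adaptive instance normalization, the following is a minimal sketch of the general technique, not the GAZEV implementation. It normalizes each feature channel over time and then rescales it with per-channel parameters. The names `adaptive_instance_norm`, `gamma`, and `beta` are illustrative assumptions; in a voice conversion setting the scale and shift would typically be predicted from a target speaker embedding, which is not modeled here.

```python
import math


def adaptive_instance_norm(features, gamma, beta, eps=1e-5):
    """Apply adaptive instance normalization to a (channels x frames) feature map.

    features: list of channels, each a list of floats over time frames.
    gamma, beta: per-channel scale and shift. Assumption for illustration:
    these would come from a speaker embedding via some small network.
    """
    out = []
    for channel, g, b in zip(features, gamma, beta):
        # Per-channel statistics over the time axis (instance normalization).
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Remove the channel's own statistics, then impose the supplied ones.
        out.append([g * (v - mean) / std + b for v in channel])
    return out


# Toy content features: 2 channels, 4 frames (made-up numbers).
content = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.5, 9.5, 10.0]]
# Hypothetical speaker-dependent scale and shift for each channel.
gamma = [2.0, 0.5]
beta = [-1.0, 3.0]

converted = adaptive_instance_norm(content, gamma, beta)
```

After the call, each output channel has a mean close to its `beta` and a standard deviation close to its `gamma`, regardless of the statistics of the input channel. This is the sense in which the normalization replaces source-side statistics with target-side ones.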
Related papers
- StarGAN-ZSVC: Towards Zero-shot Voice Conversion In Low-resource Contexts (2021)
- AUTOVC: Zero-shot Voice Style Transfer With Only Autoencoder Loss (2019)
- StarGAN-VC: Non-parallel Many-to-many Voice Conversion With Star Generative Adversarial Networks (2018)
- Subband-based Generative Adversarial Network For Non-parallel Many-to-many Voice Conversion (2022)
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion (2021)
- Improvement Speaker Similarity For Zero-shot Any-to-any Voice Conversion Of Whispered And Regular Speech (2024)
- SLMGAN: Exploiting Speech Language Model Representations For Unsupervised Zero-shot Voice Conversion In GANs (2023)
- Robust Disentangled Variational Speech Representation Learning For Zero-shot Voice Conversion (2022)