StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion
2021 · Yinghao Aaron Li, Ali Zare, Nima Mesgarani
Abstract
We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of adversarial source classifier loss and perceptual loss, our model significantly outperforms previous VC models. Although our model is trained only with 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion. Using a style encoder, our framework can also convert plain reading speech into stylistic speech, such as emotional and falsetto speech. Subjective and objective evaluation experiments on a non-parallel many-to-many voice conversion task revealed that our model produces natural sounding voices, close to the sound quality of state-of-the-art text-to-speech (TTS) based voice conversion methods without the need for text labels. Moreover, our model is completely convolutional and with a faster-than-real-time vocoder such as Parallel WaveGAN
Related papers
- StarGAN-VC: Non-parallel Many-to-many Voice Conversion With Star Generative Adversarial Networks (2018) 18.09
- Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks (2020) 9.92
- StarGAN-VC+ASR: StarGAN-based Non-parallel Voice Conversion Regularized By Automatic Speech Recognition (2021) 5.24
- StarGAN-VC2: Rethinking Conditional Methods For StarGAN-based Voice Conversion (2019) 0.00
- Subband-based Generative Adversarial Network For Non-parallel Many-to-many Voice Conversion (2022) 0.00
- StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings (2023) 2.26
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion (2019) 17.45
- High-quality Nonparallel Voice Conversion Based On Cycle-consistent Adversarial Network (2018) 0.00