Many-to-many Voice Conversion With Out-of-dataset Speaker Support
2019 · Gokce Keskin, Tyler Lee, Cory Stephenson, et al.
Abstract
We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers who are not in the training set. This property is enabled by speaker embeddings generated by a neural network that is trained jointly with the Cycle-GAN. In contrast to prior work in this domain, our method enables conversion between an out-of-dataset speaker and a target speaker in either direction and does not require re-training. Out-of-dataset conversion quality is evaluated with an independently trained speaker identification model, and the results show good style conversion characteristics for previously unheard speakers. Subjective tests with human listeners show that style conversion quality for in-dataset speakers is comparable to that of the state-of-the-art baseline model.
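The cycle-consistency idea behind an embedding-conditioned converter can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the additive `toy_generator`, the projection matrix `proj`, and the array shapes are hypothetical stand-ins, not the architecture or loss used in the paper. The sketch only shows that converting from speaker A to speaker B and back to A should reproduce the input, and that the round-trip error can serve as a training signal.

```python
import numpy as np

def toy_generator(frames, src_emb, tgt_emb, proj):
    """Hypothetical stand-in for a conversion network (not the paper's model).

    frames:  (T, D) array of acoustic feature frames
    src_emb: (E,) embedding of the source speaker
    tgt_emb: (E,) embedding of the target speaker
    proj:    (E, D) projection from embedding space to feature space
    Removes the projected source identity and adds the projected target identity.
    """
    return frames - src_emb @ proj + tgt_emb @ proj

def cycle_loss(frames, emb_a, emb_b, proj):
    """Mean absolute error between the input and its A -> B -> A round trip."""
    converted = toy_generator(frames, emb_a, emb_b, proj)
    recovered = toy_generator(converted, emb_b, emb_a, proj)
    return float(np.mean(np.abs(frames - recovered)))

# Toy data: 5 frames of 4 features, 3-dimensional speaker embeddings.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))
emb_a = rng.normal(size=3)
emb_b = rng.normal(size=3)
proj = rng.normal(size=(3, 4))

loss = cycle_loss(frames, emb_a, emb_b, proj)
```

Because this toy generator is exactly invertible, the round-trip loss is zero up to floating-point error. A learned generator would not be, and the loss would penalize it for discarding content during conversion. Conditioning on embeddings rather than fixed speaker IDs is what would, in principle, let an unseen speaker be used without re-training.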
Related papers
- Many-to-many Voice Conversion Using Conditional Cycle-consistent Adversarial Networks (2020) 10.85
- Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations (2018) 13.60
- Subband-based Generative Adversarial Network For Non-parallel Many-to-many Voice Conversion (2022) 0.00
- Towards Low-resource StarGAN Voice Conversion Using Weight Adaptive Instance Normalization (2020) 7.81
- StarGAN-VC: Non-parallel Many-to-many Voice Conversion With Star Generative Adversarial Networks (2018) 18.09
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion (2021) 13.70
- High-quality Nonparallel Voice Conversion Based On Cycle-consistent Adversarial Network (2018) 0.00
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017) 0.00