Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations
2018 · Ju-Chieh Chou, Cheng-Chieh Yeh, Hung-Yi Lee, et al.
Abstract
Recently, cycle-consistent adversarial networks (Cycle-GAN) have been successfully applied to converting a voice to a different speaker without parallel data, although those approaches need an individual model for each target speaker. In this paper, we propose an adversarial learning framework for voice conversion in which a single model can be trained to convert a voice to many different speakers, all without parallel data, by separating the speaker characteristics from the linguistic content in speech signals. An autoencoder is first trained to extract speaker-independent latent representations and speaker embeddings separately, using an auxiliary speaker classifier to regularize the latent representation. The decoder then takes the speaker-independent latent representation and the target speaker embedding as input to generate the voice of the target speaker with the linguistic content of the source utterance. The quality of decoder output is further improved by patch
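The data flow described in the abstract can be illustrated with a toy forward pass. This is only a minimal sketch, not the authors' implementation: the layer sizes, the linear layers with random weights, the function names (`encode`, `decode`, `classifier_loss`, `convert`), and the exact form of the loss are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Toy sizes, all hypothetical (not taken from the paper).
N_FEATS, N_LATENT, N_SPEAKERS, N_EMB = 8, 4, 3, 2


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


W_enc = rand_matrix(N_LATENT, N_FEATS)          # encoder: acoustic frame -> latent
W_dec = rand_matrix(N_FEATS, N_LATENT + N_EMB)  # decoder: [latent; speaker embedding] -> frame
W_cls = rand_matrix(N_SPEAKERS, N_LATENT)       # auxiliary speaker classifier on the latent
speaker_emb = rand_matrix(N_SPEAKERS, N_EMB)    # one embedding vector per speaker


def encode(frame):
    """Map an acoustic frame to a latent code meant to be speaker-independent."""
    return [math.tanh(v) for v in matvec(W_enc, frame)]


def decode(latent, speaker_id):
    """Generate a frame from the latent code plus the chosen speaker's embedding."""
    return matvec(W_dec, latent + speaker_emb[speaker_id])


def classifier_loss(latent, speaker_id):
    """Cross-entropy of the classifier predicting the speaker from the latent."""
    logits = matvec(W_cls, latent)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[speaker_id]


def convert(frame, target_speaker):
    """Voice conversion: keep the content code, swap in the target speaker embedding."""
    return decode(encode(frame), target_speaker)


source_frame = [random.uniform(-1, 1) for _ in range(N_FEATS)]
converted = convert(source_frame, target_speaker=2)

# Assumed training signals: the classifier minimizes classifier_loss, while the
# autoencoder minimizes reconstruction error and tries to make the classifier fail.
recon = decode(encode(source_frame), 0)
recon_error = sum((a - b) ** 2 for a, b in zip(recon, source_frame))
```

In this sketch, conversion to another speaker only changes the `speaker_id` passed to `decode`, which is why one model could in principle serve many target speakers.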
Related papers
- Many-to-many Voice Conversion Using Conditional Cycle-consistent Adversarial Networks (2020)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- Adversarially Trained Autoencoders For Parallel-data-free Voice Conversion (2019)
- High-quality Nonparallel Voice Conversion Based On Cycle-consistent Adversarial Network (2018)
- Many-to-many Voice Conversion With Out-of-dataset Speaker Support (2019)
- CVC: Contrastive Learning For Non-parallel Voice Conversion (2020)
- Recognition-synthesis Based Non-parallel Voice Conversion With Adversarial Learning (2020)
- Subband-based Generative Adversarial Network For Non-parallel Many-to-many Voice Conversion (2022)