Adversarially Trained Autoencoders For Parallel-data-free Voice Conversion
2019 · Orhan Ocal, Oguz H. Elibol, Gokce Keskin, et al.
Abstract
We present a method for converting voices among a set of speakers. The method trains multiple autoencoder paths that share a single speaker-independent encoder and use multiple speaker-dependent decoders. The autoencoders are trained with an additional adversarial loss, provided by an auxiliary classifier, that guides the encoder output to be speaker independent. Training is unsupervised in the sense that it requires neither recordings of the same utterances from the speakers nor time alignment over phonemes. Because a single encoder is used, the method can generalize to converting the voices of out-of-training speakers to speakers in the training dataset. We present subjective tests corroborating the performance of our method.
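The data flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the layers are single random linear maps, the sizes (`FEAT`, `LATENT`, `SPEAKERS`) and the `adv_weight` parameter are hypothetical, and the adversarial term is written in one common form (the encoder is rewarded when the auxiliary classifier's cross-entropy is high). No training step is shown; the sketch only computes the two losses and routes a frame through a different speaker's decoder.

```python
import math
import random

random.seed(0)

FEAT, LATENT, SPEAKERS = 4, 2, 3  # hypothetical toy sizes


def rand_matrix(rows, cols):
    """Random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One shared encoder, one decoder per speaker, one auxiliary speaker classifier.
encoder = rand_matrix(LATENT, FEAT)
decoders = [rand_matrix(FEAT, LATENT) for _ in range(SPEAKERS)]
classifier = rand_matrix(SPEAKERS, LATENT)


def encode(frame):
    return linear(encoder, frame)


def decode(latent, speaker):
    return linear(decoders[speaker], latent)


def losses(frame, speaker, adv_weight=1.0):
    """Return (autoencoder_loss, classifier_loss) for one feature frame.

    The classifier tries to identify the speaker from the latent code.
    The autoencoder path minimizes reconstruction error minus the weighted
    classifier loss, which pushes the latent code toward speaker independence
    (assumed form of the adversarial term).
    """
    latent = encode(frame)
    recon = decode(latent, speaker)
    recon_loss = sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)
    probs = softmax(linear(classifier, latent))
    clf_loss = -math.log(probs[speaker])  # cross-entropy for the true speaker
    return recon_loss - adv_weight * clf_loss, clf_loss


def convert(frame, target_speaker):
    """Voice conversion: shared encoder, then the target speaker's decoder."""
    return decode(encode(frame), target_speaker)


frame = [0.2, -0.1, 0.4, 0.3]  # made-up feature frame from speaker 0
ae_loss, clf_loss = losses(frame, speaker=0)
converted = convert(frame, target_speaker=2)
```

Because `encode` never takes a speaker identity as input, `convert` can also be called on a frame from a speaker who has no decoder of their own, which mirrors the out-of-training generalization claim in the abstract.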
Related papers
- Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations (2018)
- Learning In Your Voice: Non-parallel Voice Conversion Based On Speaker Consistency Loss (2020)
- Recognition-synthesis Based Non-parallel Voice Conversion With Adversarial Learning (2020)
- Many-to-many Voice Conversion Based Feature Disentanglement Using Variational Autoencoder (2021)
- Non-parallel Sequence-to-sequence Voice Conversion With Disentangled Linguistic And Speaker Representations (2019)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- Semi-supervised Voice Conversion With Amortized Variational Inference (2019)
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018)