FastVC: Fast Voice Conversion with Non-parallel Data
2020 · Oriol Barbany Mayor, Milos Cernak
Abstract
This paper introduces FastVC, an end-to-end model for fast Voice Conversion (VC). The proposed model can convert speech of arbitrary length from multiple source speakers to multiple target speakers. FastVC is based on a conditional AutoEncoder (AE) trained on non-parallel data and requires no annotations. The model's latent representation is shown to be speaker-independent and similar to phonemes, a desirable property for VC systems. While current VC systems focus primarily on achieving the highest overall speech quality, this paper tries to balance quality against the resources needed to run the system. Despite its simple structure, the proposed model outperforms the VC Challenge 2020 baselines on the cross-lingual task in terms of naturalness.
Related papers
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018)
- AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion (2021)
- FastS2S-VC: Streaming Non-autoregressive Sequence-to-sequence Voice Conversion (2021)
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017)
- F0-consistent Many-to-many Non-parallel Voice Conversion Via Conditional Autoencoder (2020)
- Voicy: Zero-shot Non-parallel Voice Conversion In Noisy Reverberant Environments (2021)
- Towards General-purpose Text-instruction-guided Voice Conversion (2023)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)