Efficient Non-autoregressive GAN Voice Conversion Using VQWav2vec Features And Dynamic Convolution
2022 · Mingjie Chen, Yanghao Zhou, Heyan Huang, et al.
Abstract
It was shown recently that a combination of ASR and TTS models yields highly competitive performance on standard voice conversion tasks such as the Voice Conversion Challenge 2020 (VCC2020). To obtain good performance, both models require pretraining on large amounts of data, which results in large models that are potentially inefficient to use. In this work we present a model that is significantly smaller, and thereby faster at processing, while obtaining equivalent performance. To achieve this, the proposed model, Dynamic-GAN-VC (DYGAN-VC), uses a non-autoregressive structure and makes use of vector quantised embeddings obtained from a VQWav2vec model. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. Objective and subjective evaluations were performed on the VCC2020 task, yielding MOS scores of up to 3.86 and character error rates as low as 4.3%. This was achieved with approximately half the number of model p
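The abstract names three ingredients: discrete VQWav2vec codes as input, a non-autoregressive structure, and dynamic convolution. The following is a minimal pure-Python sketch of how these pieces could fit together, assuming the dynamic-convolution formulation of Wu et al. (2019, "Pay Less Attention with Lightweight and Dynamic Convolutions"): for every frame, a linear layer predicts a softmax-normalised kernel from that frame alone, and each kernel is shared across a group of channels. The function names `embed_codes` and `dynamic_conv1d`, the single codebook (vq-wav2vec itself uses multiple codebook groups), and all sizes are hypothetical illustrations; the abstract does not give the DYGAN-VC configuration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def embed_codes(codes, codebook):
    """Hypothetical lookup: map one discrete VQ index per frame to its codebook vector."""
    return [list(codebook[i]) for i in codes]


def dynamic_conv1d(x, weight, kernel_size, num_heads):
    """Dynamic depthwise 1-D convolution (sketch after Wu et al., 2019).

    x: T frames, each a list of d floats.
    weight: (num_heads * kernel_size) rows of d floats; predicts kernels from x[t].
    Returns T frames of d floats; every frame uses its own predicted kernel.
    """
    T, d = len(x), len(x[0])
    assert d % num_heads == 0 and kernel_size % 2 == 1
    group = d // num_heads  # channels that share one predicted kernel
    pad = kernel_size // 2  # centred ("same") zero padding
    out = [[0.0] * d for _ in range(T)]
    for t in range(T):
        # Kernel logits depend only on the current input frame.
        logits = [sum(w * v for w, v in zip(row, x[t])) for row in weight]
        for h in range(num_heads):
            kernel = softmax(logits[h * kernel_size:(h + 1) * kernel_size])
            for c in range(h * group, (h + 1) * group):
                acc = 0.0
                for j in range(kernel_size):
                    s = t + j - pad
                    if 0 <= s < T:  # positions outside the sequence count as zero
                        acc += kernel[j] * x[s][c]
                out[t][c] = acc
    return out


# Usage: random toy parameters, five frames of hypothetical VQ indices.
random.seed(0)
d, k, H, V = 4, 3, 2, 8
codebook = [[random.gauss(0, 1) for _ in range(d)] for _ in range(V)]
weight = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(H * k)]
codes = [3, 3, 1, 7, 0]
out = dynamic_conv1d(embed_codes(codes, codebook), weight, kernel_size=k, num_heads=H)
print(len(out), len(out[0]))  # 5 4
```

Every output frame is computed from input frames only, never from earlier outputs, so all frames can be produced in a single parallel pass; this is the non-autoregressive property. The kernel predictor in this sketch holds `num_heads * kernel_size * d` weights, whereas a standard 1-D convolution of the same width over `d` channels holds `d * d * kernel_size`, which is one plausible reading of why dynamic convolution can stay small.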
Related papers
- An Adaptive Learning Based Generative Adversarial Network For One-to-one Voice Conversion (2021)
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion (2021)
- Generative Adversarial Network Based Voice Conversion: Techniques, Challenges, And Recent Advancements (2025)
- Baseline System Of Voice Conversion Challenge 2020 With Cyclic Variational Autoencoder And Parallel WaveGAN (2020)
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017)
- VQVC+: One-shot Voice Conversion By Vector Quantization And U-net Architecture (2020)
- Vocoder-free Non-parallel Conversion Of Whispered Speech With Masked Cycle-consistent Generative Adversarial Networks (2023)
- VITS-based Singing Voice Conversion System With DSPGAN Post-processing For SVCC2023 (2023)