AdaVocoder: Adaptive Vocoder For Custom Voice
2022 · Xin Yuan, Yongbing Feng, Mingming Ye, et al.
Abstract
Custom voice builds a personal speech synthesis system by adapting a source speech synthesis model to a target speaker using only a few recordings from that speaker. The usual solution combines an adaptive acoustic model with a robust vocoder. However, training a robust vocoder usually requires a multi-speaker dataset that covers a range of age groups and timbres, so that the trained vocoder can be used for unseen speakers. Such a dataset is difficult to collect, and its distribution always mismatches that of the target speaker's data. This paper approaches these problems from a different perspective and proposes an adaptive vocoder for custom voice. The adaptive vocoder mainly uses a cross-domain consistency loss to address the overfitting that GAN-based neural vocoders encounter during transfer learning in few-shot settings. We construct two adaptive vocoders, AdaMelGAN and AdaHiFi-GAN. Fir
Related papers
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) · 12.54
- Avocodo: Generative Adversarial Network For Artifact-free Vocoder (2022) · 9.41
- An Adaptive Learning Based Generative Adversarial Network For One-to-one Voice Conversion (2021) · 10.61
- StyleMelGAN: An Efficient High-fidelity Adversarial Vocoder With Temporal Adaptive Normalization (2020) · 13.05
- VNet: A GAN-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024) · 2.26
- BigVGAN: A Universal Neural Vocoder With Large-scale Training (2022) · 6.17
- DSPGAN: A GAN-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022) · 9.03
- LA-VocE: Low-SNR Audio-visual Speech Enhancement Using Neural Vocoders (2022) · 0.00