VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network
2020 · Jinhyeok Yang, Junmo Lee, Youngik Kim, et al.
Abstract
We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time. However, it often produces a waveform that is insufficient in quality or inconsistent with acoustic characteristics of the input mel spectrogram. VocGAN is nearly as fast as MelGAN, but it significantly improves the quality and consistency of the output waveform. VocGAN applies a multi-scale waveform generator and a hierarchically-nested discriminator to learn multiple levels of acoustic properties in a balanced way. It also applies the joint conditional and unconditional objective, which has shown successful results in high-resolution image synthesis. In experiments, VocGAN synthesizes speech waveforms 416.7x faster on a GTX 1080Ti GPU and 3.24x faster on a CPU than real-time. Compared with MelGAN, it also exhibits significantly improved quality in multiple evaluation metrics including mean opinion score (MOS) with minimal add
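The speed figures in the abstract are ratios to real time, so they can be converted into wall-clock synthesis time and a real-time factor (RTF). The sketch below does only that arithmetic. The function name and the 10-second clip length are illustrative assumptions, not taken from the paper; the two speedup values are the ones quoted in the abstract.

```python
def synthesis_time_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time to synthesize `audio_seconds` of audio when the
    vocoder runs `speedup` times faster than real time.

    Hypothetical helper for illustration; not part of VocGAN itself.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Speedups over real time as quoted in the abstract.
GPU_SPEEDUP = 416.7  # GTX 1080Ti GPU
CPU_SPEEDUP = 3.24   # CPU

CLIP_SECONDS = 10.0  # assumed clip length, chosen only for the example

for device, speedup in (("GPU", GPU_SPEEDUP), ("CPU", CPU_SPEEDUP)):
    elapsed = synthesis_time_seconds(CLIP_SECONDS, speedup)
    rtf = 1.0 / speedup  # real-time factor: compute time per second of audio
    print(f"{device}: {CLIP_SECONDS:.0f} s of audio in {elapsed * 1000:.1f} ms "
          f"(RTF = {rtf:.4f})")
```

Under these figures, a 10-second clip would take roughly 24 ms on the GPU and about 3.1 s on the CPU.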
Related papers
- StyleMelGAN: An Efficient High-fidelity Adversarial Vocoder With Temporal Adaptive Normalization (2020) 13.05
- VNet: A GAN-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024) 2.26
- Universal MelGAN: A Robust Neural Vocoder For High-fidelity Waveform Generation In Multiple Domains (2020) 0.00
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020) 0.00
- BigVGAN: A Universal Neural Vocoder With Large-scale Training (2022) 6.17
- Framewise WaveGAN: High Speed Adversarial Vocoder In Time Domain With Very Low Computational Complexity (2022) 7.16
- DSPGAN: A GAN-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022) 9.03
- Vocos: Closing The Gap Between Time-domain And Fourier-based Neural Vocoders For High-quality Audio Synthesis (2023) 6.10