Vnet: A Gan-based Multi-tier Discriminator Network For Speech Synthesis Vocoders
2024 Β· Yubing Cao, Yongming Li, Liejun Wang, et al.
Abstract
Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-limited spectral information, which inevitably sacrifices high-frequency details. To address this, we adopt the full-band Mel spectrogram information as input, aiming to provide the vocoder with the most comprehensive information possible. However, previous studies have revealed that the use of full-band spectral information as input can result in the issue of over-smoothing, compromising the naturalness of the synthesized speech. To tackle this challenge, we propose VNet, a GAN-based neural vocoder network that incorporates full-band spectral information and introduces a Multi-Tier Discrimina
Authors
(none)
Tags
Stats
Related papers
- Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)12.54
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021)14.80
- A Multi-scale Time-frequency Spectrogram Discriminator For Gan-based Non-autoregressive TTS (2022)6.77
- Wave-u-net Discriminator: Fast And Lightweight Discriminator For Generative Adversarial Network-based Speech Synthesis (2023)6.34
- Avocodo: Generative Adversarial Network For Artifact-free Vocoder (2022)9.41
- Bigvgan: A Universal Neural Vocoder With Large-scale Training (2022)6.17
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)0.00
- DSPGAN: A Gan-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022)9.03