SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge And Periodic Inductive Bias
2023 · Sipan Li, Songxiang Liu, Luwen Zhang, et al.
Abstract
Generative adversarial network (GAN)-based neural vocoders have been widely used in audio synthesis tasks due to their high generation quality, efficient inference, and small computation footprint. However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces. In this work, we propose SnakeGAN, a GAN-based universal vocoder, which can synthesize high-fidelity audio in various OOD scenarios. SnakeGAN takes a coarse-grained signal generated by a differentiable digital signal processing (DDSP) model as prior knowledge, aiming at recovering high-fidelity waveform from a Mel-spectrogram. We introduce periodic nonlinearities through the Snake activation function and anti-aliased representation into the generator, which further brings the desired inductive bias for audio synthesis and significantly improves the extrapolation capacity for universal vo
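The Snake activation mentioned in the abstract is commonly written as snake(x) = x + sin^2(alpha * x) / alpha. The sketch below illustrates that general formula only; it is not taken from the SnakeGAN implementation. The function names `snake` and `snake_many` and the default `alpha = 1.0` are assumptions made for illustration, and in practice alpha is typically a learnable parameter rather than a fixed constant.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha.

    alpha (assumed positive here) sets the frequency of the periodic term.
    Its default value is illustrative, not a setting from the paper.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_many(xs, alpha: float = 1.0):
    """Apply the Snake activation element-wise to an iterable of floats."""
    return [snake(x, alpha) for x in xs]
```

The identity term keeps the function monotonic in overall trend, while the sin^2 term adds a component with period pi / alpha, which is the kind of periodic inductive bias the abstract refers to: shifting the input by pi / alpha shifts the output by exactly the same amount.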
Related papers
- BigVGAN: A Universal Neural Vocoder With Large-scale Training (2022) 6.17
- DSPGAN: A GAN-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022) 9.03
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) 12.54
- VNet: A GAN-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024) 2.26
- BemaGANv2: Discriminator Combination Strategies For GAN-based Vocoders In Long-term Audio Generation (2025) 2.68
- Avocodo: Generative Adversarial Network For Artifact-free Vocoder (2022) 9.41
- SingGAN: Generative Adversarial Network For High-fidelity Singing Voice Generation (2021) 10.61
- WGANSing: A Multi-voice Singing Voice Synthesizer Based On The Wasserstein-GAN (2019) 11.08