DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer And A Comprehensive Evaluation
2022 · Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, et al.
Abstract
A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectrograms into waveforms. Taking inspiration from Differentiable Digital Signal Processing (DDSP), we propose a new vocoder named SawSing for singing voices. SawSing synthesizes the harmonic part of singing voices by filtering a sawtooth source signal with a linear time-variant finite impulse response filter whose coefficients are estimated from the input mel-spectrogram by a neural network. As this approach enforces phase continuity, SawSing can generate singing voices without the phase-discontinuity glitch of many existing vocoders. Moreover, the source-filter assumption provides an inductive bias that allows SawSing to be trained on a small amount of data. Our experiments show that SawSing converges much faster and outperforms state-of-the-art generative adversarial network and diffusion-based vocoders in a resource-limited scenario with only 3 training recordings and a 3-hour training t
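The source-filter idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the hop size, the tap count, and the random filter coefficients (a stand-in for the neural network's output) are all assumptions made for illustration, and the sawtooth here is a naive, non-band-limited one that would alias in practice. The sketch only shows the two ingredients the abstract names: a sawtooth whose phase is accumulated continuously from the pitch track, and a frame-wise time-varying FIR filter applied by overlap-add.

```python
import numpy as np


def sawtooth_from_f0(f0_hz, sample_rate):
    """Naive sawtooth in [-1, 1) driven by a per-sample f0 track.

    The phase is the running sum of the instantaneous frequency, so it
    stays continuous even when f0 changes from sample to sample.
    """
    phase = np.cumsum(f0_hz / sample_rate)  # phase in cycles
    return 2.0 * (phase % 1.0) - 1.0


def ltv_fir_filter(source, frame_filters, hop):
    """Filter each hop-sized frame with its own FIR taps, then overlap-add."""
    n_frames, n_taps = frame_filters.shape
    assert len(source) == n_frames * hop, "one filter per hop-sized frame"
    out = np.zeros(n_frames * hop + n_taps - 1)
    for i in range(n_frames):
        frame = source[i * hop:(i + 1) * hop]
        y = np.convolve(frame, frame_filters[i])  # length hop + n_taps - 1
        out[i * hop:i * hop + len(y)] += y        # filter tails overlap
    return out[:len(source)]


# Hypothetical settings, chosen only for the demo.
sample_rate = 16000
hop = 160          # 10 ms frames
n_frames = 100
n_taps = 64

# Pitch glide from 220 Hz to 440 Hz over one second.
f0 = np.linspace(220.0, 440.0, n_frames * hop)
source = sawtooth_from_f0(f0, sample_rate)

# Stand-in for the per-frame coefficients a network would predict
# from the mel-spectrogram (assumption: random values, not trained).
rng = np.random.default_rng(0)
frame_filters = rng.normal(scale=0.1, size=(n_frames, n_taps))

harmonic = ltv_fir_filter(source, frame_filters, hop)
```

Because the filter only reshapes the spectrum of a source whose phase is already continuous, the output inherits that continuity regardless of what coefficients are predicted, which is the property the abstract credits for avoiding phase-discontinuity glitches.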
Related papers
- XiaoiceSing 2: A High-fidelity Singing Voice Synthesizer Based On Generative Adversarial Network (2022) 0.00
- DiffSinger: Singing Voice Synthesis Via Shallow Diffusion Mechanism (2021) 23.76
- InstructSing: High-fidelity Singing Voice Generation Via Instructing Yourself (2024) 0.00
- SingGAN: Generative Adversarial Network For High-fidelity Singing Voice Generation (2021) 10.61
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020) 7.81
- Mandarin Singing Voice Synthesis With Denoising Diffusion Probabilistic Wasserstein GAN (2022) 6.34
- ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-decoder Acoustic Models And WaveRNN Vocoders (2020) 11.49
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024) 4.52