SpecGrad: Diffusion Probabilistic Model Based Neural Vocoder with Adaptive Noise Spectral Shaping
2022 · Yuma Koizumi, Heiga Zen, Kohei Yatabe, et al.
Abstract
Neural vocoders based on denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to the given acoustic features. In this study, we propose SpecGrad, which adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves sound quality, especially in the high-frequency bands. The filtering is performed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveforms than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
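The idea of shaping noise by time-varying filtering in the time-frequency domain can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`stft`, `istft`, `shape_noise`), the frame parameters, and the hand-made `envelope` array are all assumptions for illustration. In particular, the envelope here is a synthetic low-pass gain with a sweeping cutoff, whereas the abstract describes an envelope derived from the conditioning log-mel spectrogram, and the filter here is a plain zero-phase magnitude gain.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a periodic Hann window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def shape_noise(noise, envelope, n_fft=256, hop=64):
    """Apply a per-frame, per-bin gain to noise in the time-frequency domain."""
    return istft(stft(noise, n_fft, hop) * envelope, n_fft, hop)

# White Gaussian noise whose length fits a whole number of frames.
n_fft, hop = 256, 64
rng = np.random.default_rng(0)
noise = rng.standard_normal(n_fft + hop * 60)

# Hypothetical time-varying envelope: a low-pass gain whose cutoff bin
# sweeps from 16 to 64 across the frames (an assumption, not from the paper).
n_frames = 1 + (len(noise) - n_fft) // hop
n_bins = n_fft // 2 + 1
cutoff = np.linspace(16, 64, n_frames)
bins = np.arange(n_bins)
envelope = np.where(bins[None, :] < cutoff[:, None], 1.0, 0.05)

shaped = shape_noise(noise, envelope, n_fft, hop)

# Compare average power in a band that is always passed (bins < 16)
# with a band that is always attenuated (bins >= 80).
power = np.abs(stft(shaped, n_fft, hop)) ** 2
low_band_power = power[:, :16].mean()
high_band_power = power[:, 80:].mean()
```

Because the gain is applied frame by frame, the cost is one forward and one inverse STFT plus an element-wise multiply, which is consistent with the abstract's point that the shaping adds little computation.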
Related papers
- PeriodGrad: Towards Pitch-controllable Neural Vocoder Based On A Diffusion Probabilistic Model (2024)
- PriorGrad: Improving Conditional Denoising Diffusion Models With Data-dependent Adaptive Prior (2021)
- ResGrad: Residual Denoising Diffusion Probabilistic Models For Text To Speech (2022)
- InferGrad: Improving Diffusion Models For Vocoder By Considering Inference In Training (2022)
- ProDiff: Progressive Fast Diffusion Model For High-quality Text-to-speech (2022)
- SpecDiff-GAN: A Spectrally-shaped Noise Diffusion GAN For Speech And Music Synthesis (2024)
- BDDM: Bilateral Denoising Diffusion Models For Fast And High-quality Speech Synthesis (2022)
- DiffAR: Denoising Diffusion Autoregressive Model For Raw Speech Waveform Generation (2023)