BDDM: Bilateral Denoising Diffusion Models For Fast And High-quality Speech Synthesis
2022 · Max W. Y. Lam, Jun Wang, Dan Su, et al.
Abstract
Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models, yet they confront challenges of efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel bilateral modeling objective. We show that the new surrogate objective can achieve a lower bound of the log marginal likelihood tighter than a conventional surrogate. We also find that BDDM allows inheriting pre-trained score network parameters from any DPM and consequently enables speedy and stable learning of the schedule network and optimization of a noise schedule for sampling. Our experiments demonstrate that BDDMs can generate high-fidelity audio samples with as few as three sampling steps. Moreover, compared to other state-of-the-art diffusion-based neural vocoders, BDDMs produce comparable or higher quality samples indistinguishable from h
Related papers
- FastDiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022) · 14.35
- Adversarial Training Of Denoising Diffusion Model Using Dual Discriminators For High-fidelity Multi-speaker TTS (2023) · 2.26
- ProDiff: Progressive Fast Diffusion Model For High-quality Text-to-speech (2022) · 0.00
- DiffGAN-TTS: High-fidelity And Efficient Text-to-speech With Denoising Diffusion GANs (2022) · 0.00
- Speaking In Wavelet Domain: A Simple And Efficient Approach To Speed Up Speech Diffusion Model (2024) · 5.24
- ResGrad: Residual Denoising Diffusion Probabilistic Models For Text To Speech (2022) · 0.00
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) · 10.21
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) · 23.51