InferGrad: Improving Diffusion Models For Vocoder By Considering Inference In Training
2022 · Zehua Chen, Xu Tan, Ke Wang, et al.
Abstract
Denoising diffusion probabilistic models (diffusion models for short) require a large number of inference iterations to reach a generation quality that matches or surpasses that of state-of-the-art generative models, which invariably results in slow inference. Previous approaches try to speed up inference by optimizing the choice of inference schedule over a few iterations. However, this reduces generation quality, mainly because the inference process is optimized separately rather than jointly with the training process. In this paper, we propose InferGrad, a diffusion model for vocoder that incorporates the inference process into training, to reduce the number of inference iterations while maintaining high generation quality. More specifically, during training, we generate data from random noise through a reverse process under inference schedules with a few iterations, and impose a loss to minimize the gap between the generated and ground-truth data samples. Then, unlike e
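The training-time step described in the abstract (run a short reverse process from random noise, then penalize the gap to the ground truth) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-step `betas` schedule, the mean-squared-error gap, the standard DDPM ancestral update, and the helper names `reverse_process`, `generation_loss`, and `make_oracle` are all hypothetical choices made for this example. The sketch only evaluates the loss value; a real training loop would use an autodiff framework so that gradients flow back through the reverse process into the network.

```python
import math
import random


def cumulative_alphas(betas):
    """Return the running products of (1 - beta_t), one per step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def reverse_process(predict_noise, betas, length, rng):
    """DDPM ancestral sampling from Gaussian noise under a short schedule."""
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = math.sqrt(1.0 - betas[t])
        mean = [(xi - coef * ei) / scale for xi, ei in zip(x, eps)]
        if t > 0:
            # Posterior variance of the standard DDPM reverse step.
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            sigma = math.sqrt(var)
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


def generation_loss(predict_noise, target, betas, rng):
    """Gap between a few-step generation and the ground truth (MSE is an assumption)."""
    generated = reverse_process(predict_noise, betas, len(target), rng)
    return sum((g - y) ** 2 for g, y in zip(generated, target)) / len(target)


def make_oracle(target, betas):
    """Sanity-check noise predictor that already knows the clean signal."""
    alpha_bars = cumulative_alphas(betas)

    def predict_noise(x, t):
        a = math.sqrt(alpha_bars[t])
        s = math.sqrt(1.0 - alpha_bars[t])
        return [(xi - a * yi) / s for xi, yi in zip(x, target)]

    return predict_noise


# Hypothetical 4-step inference schedule and a toy 1-D "waveform".
betas = [1e-4, 1e-2, 1e-1, 5e-1]
target = [math.sin(0.3 * n) for n in range(64)]

# Stand-in for an untrained network: always predicts zero noise.
untrained = lambda x, t: [0.0] * len(x)

loss_untrained = generation_loss(untrained, target, betas, random.Random(0))
loss_oracle = generation_loss(make_oracle(target, betas), target, betas, random.Random(0))
```

In this toy setup the oracle predictor drives the loss to numerically zero, while the zero predictor leaves a large gap. That gap is the quantity the generation loss would push a real network to close under the short schedule.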
Related papers
- PriorGrad: Improving Conditional Denoising Diffusion Models With Data-dependent Adaptive Prior (2021) 0.00
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) 10.21
- SpecGrad: Diffusion Probabilistic Model Based Neural Vocoder With Adaptive Noise Spectral Shaping (2022) 11.49
- ProDiff: Progressive Fast Diffusion Model For High-quality Text-to-speech (2022) 0.00
- ResGrad: Residual Denoising Diffusion Probabilistic Models For Text To Speech (2022) 0.00
- FastDiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022) 14.35
- PeriodGrad: Towards Pitch-controllable Neural Vocoder Based On A Diffusion Probabilistic Model (2024) 0.00
- UnDiff: Unsupervised Voice Restoration With Unconditional Diffusion Model (2023) 5.24