Single And Few-step Diffusion For Generative Speech Enhancement
2023 · Bunlong Lay, Jean-Marie Lemercier, Julius Richter, et al.
Abstract
Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score estimation is called multiple times to solve the iterative reverse process. This results in a slow inference process and causes discretization errors that accumulate over the sampling trajectory. In this paper, we address these limitations through a two-stage training approach. In the first stage, we train the diffusion model the usual way using the generative denoising score matching loss. In the second stage, we compute the enhanced signal by solving the reverse process and compare the resulting estimate to the clean speech target using a predictive loss. We show that using this second training stage enables achieving the same performance as the baseline model using only 5 function evaluations instead of 60 function evaluations. While the performance of
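The second training stage described in the abstract can be illustrated with a minimal NumPy sketch: run a few-step reverse solver from the noisy mixture, then score the final estimate against the clean target with a predictive loss. Everything specific here is an assumption for illustration and is not taken from the paper: the drift toward the mixture `y`, the exponential noise schedule, the Euler-Maruyama solver, the mean-squared-error loss, and all function and parameter names (`diffusion_coeff`, `reverse_sample`, `predictive_loss`, `gamma`, `sigma_min`, `sigma_max`). In real training the loss would be differentiated with respect to the score network's parameters in an autodiff framework; this sketch only evaluates it.

```python
import numpy as np

def diffusion_coeff(t, sigma_min=0.05, sigma_max=0.5):
    # Assumed exponential noise schedule g(t); values are illustrative only.
    ratio = sigma_max / sigma_min
    return sigma_min * ratio**t * np.sqrt(2.0 * np.log(ratio))

def reverse_sample(score_fn, y, n_steps, gamma=1.5, sigma_max=0.5, rng=None):
    """Euler-Maruyama solve of an assumed reverse SDE, from t=1 down to t=0.

    score_fn(x, y, t) stands in for the trained score network; it is called
    exactly once per step, so n_steps equals the number of function evaluations.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    # Assumed initialisation: the noisy mixture plus Gaussian noise.
    x = y + sigma_max * rng.standard_normal(y.shape)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_cur - t_next                 # positive step size
        g = diffusion_coeff(t_cur, sigma_max=sigma_max)
        drift = gamma * (y - x)             # assumed forward drift toward y
        z = rng.standard_normal(y.shape)
        # Reverse-time update: undo the drift, follow the score, add noise.
        x = x - (drift - g**2 * score_fn(x, y, t_cur)) * dt + g * np.sqrt(dt) * z
    return x

def predictive_loss(x_hat, x_clean):
    # Assumed predictive loss: mean squared error against the clean target.
    return float(np.mean((x_hat - x_clean) ** 2))

# Usage with a toy stand-in score that pulls samples toward the clean signal.
x_clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
y = x_clean + 0.3 * np.random.default_rng(42).standard_normal(16)
toy_score = lambda x, y_, t: -(x - x_clean) / 0.25
x_hat = reverse_sample(toy_score, y, n_steps=5)
loss = predictive_loss(x_hat, x_clean)
```

With `n_steps=5` the stand-in score function is evaluated five times, mirroring the five function evaluations quoted in the abstract; the toy score is an oracle that knows the clean signal and exists only to make the sketch runnable.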
Related papers
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) 23.51
- StoRM: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation (2022) 15.43
- Adversarial Training Of Denoising Diffusion Model Using Dual Discriminators For High-fidelity Multi-speaker TTS (2023) 2.26
- Diffusion-based Speech Enhancement With A Weighted Generative-supervised Learning Loss (2023) 0.00
- Extract And Diffuse: Latent Integration For Improved Diffusion-based Speech And Vocal Enhancement (2024) 0.00
- Investigating The Design Space Of Diffusion Models For Speech Enhancement (2023) 10.07
- GALD-SE: Guided Anisotropic Lightweight Diffusion For Efficient Speech Enhancement (2024) 3.58
- FastDiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022) 14.35