StoRM: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation
2022 · Jean-Marie Lemercier, Julius Richter, Simon Welker, et al.
Abstract
Diffusion models have proven effective at bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when evaluated on mismatched conditions. However, diffusion models carry a high computational burden, mainly because they require running a neural network at each reverse diffusion step, whereas predictive approaches require only one pass. As generative approaches, diffusion models may also produce vocalizing and breathing artifacts in adverse conditions. In such difficult scenarios, predictive models typically do not produce these artifacts but tend to distort the target speech instead, thereby degrading speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed app
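The regeneration idea described in the abstract (predict first, then diffuse from the prediction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `stochastic_regeneration`, the parameters `sigma_max`, `sigma_min` and `n_steps`, the geometric noise schedule, and the plain variance-exploding reverse update are all assumptions chosen for illustration, and `predictor` and `score_fn` are hypothetical callables standing in for trained networks.

```python
import numpy as np

def stochastic_regeneration(y, predictor, score_fn,
                            sigma_max=0.5, sigma_min=0.01, n_steps=10, rng=None):
    """Toy sketch: refine a predictive estimate with a few reverse diffusion steps.

    y         : corrupted signal (1-D array)
    predictor : hypothetical callable, y -> first estimate of the clean signal
    score_fn  : hypothetical callable, (x, sigma, y, x_pred) -> score estimate
    """
    rng = np.random.default_rng() if rng is None else rng
    x_pred = predictor(y)  # single pass of the predictive model
    # Assumed geometric noise schedule, from sigma_max down to sigma_min.
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps + 1)
    # Start the reverse process from the noised prediction, not from pure noise.
    x = x_pred + sigmas[0] * rng.standard_normal(x_pred.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        step = s_cur**2 - s_next**2  # variance removed in this step
        # Euler-Maruyama update of a variance-exploding reverse SDE.
        x = x + step * score_fn(x, s_cur, y, x_pred)
        x = x + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(np.linspace(0.0, 6.28, 256))
    noisy = clean + 0.3 * rng.standard_normal(256)
    # Stand-ins: a crude "predictor" and an oracle score that knows the clean signal.
    enhanced = stochastic_regeneration(
        noisy,
        predictor=lambda y: 0.8 * y,
        score_fn=lambda x, s, y, xp: (clean - x) / s**2,
        rng=rng,
    )
    print(float(np.mean(np.abs(enhanced - clean))))
```

Because the reverse process starts near the predictor's output, a short schedule with a small `sigma_max` may be enough, which is the intuition behind the computational saving the abstract alludes to. The oracle score in the usage example exists only to make the sketch runnable.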
Related papers
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) 23.51
- Investigating The Design Space Of Diffusion Models For Speech Enhancement (2023) 10.07
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) 10.21
- Analysing Diffusion-based Generative Approaches Versus Discriminative Approaches For Speech Restoration (2022) 11.39
- Thunder: Unified Regression-diffusion Speech Enhancement With A Single Reverse Step Using Brownian Bridge (2024) 6.34
- Cold Diffusion For Speech Enhancement (2022) 11.85
- Diffar: Denoising Diffusion Autoregressive Model For Raw Speech Waveform Generation (2023) 0.00
- Extract And Diffuse: Latent Integration For Improved Diffusion-based Speech And Vocal Enhancement (2024) 0.00