Diffusion-based Signal Refiner For Speech Enhancement And Separation
2023 · Masato Hirano, Ryosuke Sawata, Naoki Murata, et al.
Abstract
Although recent speech processing technologies have achieved significant improvements in objective metrics, a gap in human perceptual quality remains. This paper proposes Diffiner, a novel solution that uses the generative capability of diffusion models to address this issue. Diffiner learns a natural prior distribution of clean speech within the probabilistic generative framework of diffusion models and uses it to convert the outputs of existing speech processing systems into perceptually natural, high-quality audio. In contrast to conventional deterministic approaches, the method analyzes both the original degraded speech and the pre-processed speech to identify unnatural artifacts introduced during processing. Then, through the iterative sampling process of the diffusion model, these degraded portions are replaced with perceptually natural, high-quality speech. Experimental results indicate that D
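The refinement idea described in the abstract can be illustrated with a toy sketch. It is not the paper's sampler: it is a generic inpainting-style loop on a 1-D waveform, written only to show the two ingredients the abstract names (locating artifacts by comparing the degraded input with the pre-processed output, then regenerating those regions iteratively). The functions `artifact_mask`, `smooth`, and `refine`, the threshold heuristic, and the noise schedule are all hypothetical, and the moving-average `smooth` merely stands in for a trained clean-speech diffusion model.

```python
import numpy as np

def artifact_mask(noisy, enhanced, threshold):
    """Toy heuristic (assumption): flag samples the pre-processor changed a lot."""
    return np.abs(noisy - enhanced) > threshold

def smooth(x, width=5):
    """Hypothetical stand-in for a trained clean-speech denoiser: a moving average."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def refine(noisy, enhanced, denoise_fn=smooth, threshold=0.5,
           n_steps=10, sigma_max=0.3, seed=0):
    """Regenerate flagged samples iteratively; keep the rest of `enhanced` fixed."""
    rng = np.random.default_rng(seed)
    mask = artifact_mask(noisy, enhanced, threshold)
    # Assumed linear noise schedule that decreases to zero.
    sigmas = np.linspace(sigma_max, 0.0, n_steps + 1)
    x = enhanced.copy()
    # Start from a noised version of the flagged region only.
    x[mask] += sigmas[0] * rng.standard_normal(mask.sum())
    for sigma_next in sigmas[1:]:
        x0_hat = denoise_fn(x)                       # estimate of the clean signal
        x = x0_hat + sigma_next * rng.standard_normal(x.shape)  # re-noise at a lower level
        x[~mask] = enhanced[~mask]                   # trusted samples stay untouched
    return x, mask

# Usage: a sine wave stands in for clean speech; the "pre-processor"
# over-suppresses samples 80..99, which the loop then regenerates.
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.1 * np.random.default_rng(1).standard_normal(t.size)
enhanced = clean.copy()
enhanced[80:100] = 0.0
refined, mask = refine(noisy, enhanced)
```

Samples outside the mask are copied back from the pre-processed signal at every step, so only the flagged regions are regenerated. A real system would replace `smooth` with a learned diffusion prior and would typically work on time-frequency representations rather than raw samples.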
Related papers
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) · 23.51
- Investigating The Design Space Of Diffusion Models For Speech Enhancement (2023) · 10.07
- Extract And Diffuse: Latent Integration For Improved Diffusion-based Speech And Vocal Enhancement (2024) · 0.00
- Cold Diffusion For Speech Enhancement (2022) · 11.85
- GDiffuSE: Diffusion-based Speech Enhancement With Noise Model Guidance (2025) · 0.00
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) · 10.21
- StoRM: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation (2022) · 15.43
- GALD-SE: Guided Anisotropic Lightweight Diffusion For Efficient Speech Enhancement (2024) · 3.58