Conditional Latent Diffusion-based Speech Enhancement Via Dual Context Learning
2025 · Shengkui Zhao, Zexu Pan, Kun Zhou, et al.
Abstract
Recently, the application of diffusion probabilistic models has advanced speech enhancement through generative approaches. However, existing diffusion-based methods have focused on the generation process in high-dimensional waveform or spectral domains, leading to increased generation complexity and slower inference speeds. Additionally, these methods have primarily modelled clean speech distributions, with limited exploration of noise distributions, thereby constraining the discriminative capability of diffusion models for speech enhancement. To address these issues, we propose a novel approach that integrates a conditional latent diffusion model (cLDM) with dual-context learning (DCL). Our method utilizes a variational autoencoder (VAE) to compress mel-spectrograms into a low-dimensional latent space. We then apply cLDM to transform the latent representations of both clean speech and background noise into Gaussian noise by the DCL process, and a parameterized model is trained to reve
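The forward step the abstract describes, turning the latents of both clean speech and background noise into Gaussian noise, can be illustrated with a minimal sketch. It assumes a standard DDPM-style closed-form forward process with a linear noise schedule; the paper's actual schedule, latent dimensions, and VAE encoder are not given here, so `linear_beta_schedule`, `cumulative_alpha`, `forward_diffuse`, and the random stand-in latents are all hypothetical names and values chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not the paper's)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """Return alpha_bar, where alpha_bar[t] is the product of (1 - beta_s) for s <= t."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t from N(sqrt(alpha_bar[t]) * z0, (1 - alpha_bar[t]) * I)."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [signal * z + sigma * e for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
alpha_bar = cumulative_alpha(linear_beta_schedule(1000))

# Hypothetical flattened latents. In the described method these would come
# from the VAE encoder applied to mel-spectrograms of speech and of noise.
z_speech = [rng.gauss(0.0, 1.0) for _ in range(16)]
z_noise = [rng.gauss(0.0, 1.0) for _ in range(16)]

# Diffuse both contexts to the final step, where almost no signal remains.
t_last = len(alpha_bar) - 1
zt_speech, eps_speech = forward_diffuse(z_speech, t_last, alpha_bar, rng)
zt_noise, eps_noise = forward_diffuse(z_noise, t_last, alpha_bar, rng)

print(f"signal scale at final step: {math.sqrt(alpha_bar[t_last]):.4f}")
```

Under these assumptions a denoising network would be trained to predict `eps_speech` and `eps_noise` from the noised latents. Working on short latent vectors rather than full waveforms or spectrograms is what keeps each diffusion step cheap.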
Related papers
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) [23.51]
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) [10.21]
- GALD-SE: Guided Anisotropic Lightweight Diffusion For Efficient Speech Enhancement (2024) [3.58]
- Extract And Diffuse: Latent Integration For Improved Diffusion-based Speech And Vocal Enhancement (2024) [0.00]
- Investigating The Design Space Of Diffusion Models For Speech Enhancement (2023) [10.07]
- Noise-aware Speech Enhancement Using Diffusion Probabilistic Model (2023) [8.82]
- Minimally-supervised Speech Synthesis With Conditional Diffusion Model And Language Model: A Comparative Study Of Semantic Coding (2023) [8.82]
- Cold Diffusion For Speech Enhancement (2022) [11.85]