uSee: Unified Speech Enhancement And Editing With Conditional Diffusion Models
2023 · Muqiao Yang, Chunlei Zhang, Yong Xu, et al.
Abstract
Speech enhancement aims to improve speech signals in terms of quality and intelligibility, while speech editing modifies speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by providing multiple types of conditions, including self-supervised learning embeddings and proper text prompts, to the score-based diffusion model, we enable controllable generation so that the unified model performs the corresponding actions on the source speech. Our experiments show that the proposed uSee model achieves superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models, and can perform speech editing given desired environmental sound text description, signal-to-noise ratios (SNR), and ro
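The abstract names the signal-to-noise ratio (SNR) as one of the editing conditions. As a reminder of what a target SNR means for a mixture, here is a minimal sketch that scales a noise signal so that 10·log10(P_speech / P_noise) equals a requested value. It illustrates only the standard SNR definition, not the uSee model or the authors' data pipeline; the function names `mix_at_snr` and `measure_snr_db` and the synthetic sine-plus-Gaussian signals are assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def measure_snr_db(speech, noise):
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    return 10 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to hit `snr_db` relative to `speech`, then add it.

    Hypothetical helper for illustration; returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # Solve SNR_dB = 10 * log10(P_speech / P_noise) for the noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Synthetic stand-ins: a 220 Hz tone as "speech", Gaussian samples as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
```

Calling `measure_snr_db(speech, scaled_noise)` afterwards should return the requested value up to floating-point error, which is a quick way to check the scaling.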
Related papers
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) · 23.51
- Noise-aware Speech Enhancement Using Diffusion Probabilistic Model (2023) · 8.82
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) · 10.21
- Sense: Semantic-aware High-fidelity Universal Speech Enhancement (2025) · 3.85
- Gdiffuse: Diffusion-based Speech Enhancement With Noise Model Guidance (2025) · 0.00
- Investigating The Design Space Of Diffusion Models For Speech Enhancement (2023) · 10.07
- Extract And Diffuse: Latent Integration For Improved Diffusion-based Speech And Vocal Enhancement (2024) · 0.00
- Storm: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation (2022) · 15.43