Toward Universal Speech Enhancement For Diverse Input Conditions
2023 · Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, et al.
Abstract
The past decade has witnessed substantial growth of data-driven speech enhancement (SE) techniques thanks to deep learning. While existing approaches have shown impressive performance in some common datasets, most of them are designed only for a single condition (e.g., single-channel, multi-channel, or a fixed sampling frequency) or only consider a single task (e.g., denoising or dereverberation). Currently, there is no universal SE approach that can effectively handle diverse input conditions with a single model. In this paper, we make the first attempt to investigate this line of research. First, we devise a single SE model that is independent of microphone channels, signal lengths, and sampling frequencies. Second, we design a universal SE benchmark by combining existing public corpora with multiple conditions. Our experiments on a wide range of datasets show that the proposed single model can successfully handle diverse conditions with strong performance.
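The abstract does not say how independence from sampling frequency is achieved. One common approach, assumed here purely for illustration, is to fix the STFT window and hop by duration rather than by sample count, so that the spacing between frequency bins is the same at every sampling rate. The function name `stft_params` and the 32 ms / 16 ms values below are hypothetical and are not taken from the paper.

```python
def stft_params(sample_rate_hz, window_ms=32.0, hop_ms=16.0):
    """Return (window_len, hop_len, bin_spacing_hz) for a fixed-duration STFT.

    Assumption for illustration only: the window and hop are given in
    milliseconds and converted to samples for each sampling rate.
    """
    window_len = round(sample_rate_hz * window_ms / 1000.0)
    hop_len = round(sample_rate_hz * hop_ms / 1000.0)
    # Bin spacing is sample_rate / window_len, i.e. 1 / window duration,
    # so it does not depend on the sampling rate.
    bin_spacing_hz = sample_rate_hz / window_len
    return window_len, hop_len, bin_spacing_hz


if __name__ == "__main__":
    for sr in (8000, 16000, 48000):
        window_len, hop_len, spacing = stft_params(sr)
        print(sr, window_len, hop_len, spacing)
```

Under this assumption, a higher sampling rate only adds more frequency bins at the top of the spectrum, while each existing bin keeps the same center frequency, which is what would let a single model process inputs at different rates.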
Related papers
- Beyond Performance Plateaus: A Comprehensive Study On Scalability In Speech Enhancement (2024)
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021)
- A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction (2023)
- Sense: Semantic-aware High-fidelity Universal Speech Enhancement (2025)
- Exploring The Potential Of Data-driven Spatial Audio Enhancement Using A Single-channel Model (2024)
- Lisennet: Lightweight Sub-band And Dual-path Modeling For Real-time Speech Enhancement (2024)
- Time-domain Multi-modal Bone/air Conducted Speech Enhancement (2019)
- Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired By Dynamic Neural Network (2024)