A Comparative Evaluation Of Deep Learning Models For Speech Enhancement In Real-world Noisy Environments
2025 · Md Jahangir Alam Khondkar, Ajan Ahmed, Stephanie Schuckers, et al.
Abstract
Speech enhancement, particularly denoising, is vital for improving the intelligibility and quality of speech signals in real-world applications, especially in noisy environments. While prior research has introduced various deep learning models for this purpose, many struggle to balance noise suppression, perceptual quality, and speaker-specific feature preservation, leaving a critical research gap in their comparative performance evaluation. This study benchmarks three state-of-the-art models, Wave-U-Net, CMGAN, and U-Net, on diverse datasets such as the SpEAR, VPQAD, and Clarkson datasets. These models were chosen for their relevance in the literature and the accessibility of their code. The evaluation reveals that U-Net achieves high noise suppression, with SNR improvements of +71.96% on SpEAR, +64.83% on VPQAD, and +364.2% on the Clarkson dataset. CMGAN leads in perceptual quality, attaining the highest PESQ scores of 4.04 on SpEAR and 1.46 on VPQAD, making it well-suited for applications pri
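The abstract reports SNR gains as percentages but, as excerpted here, does not say how they are computed. The following is a minimal sketch of one plausible reading: SNR measured in dB against a clean reference, and the gain taken as the relative change between the noisy input and the enhanced output. The function names, the synthetic signals, and the relative-change formula are illustrative assumptions, not the authors' evaluation code.

```python
import math
import random


def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against the reference `clean`.

    Assumption: everything in `estimate` that differs from `clean`
    is treated as noise.
    """
    signal_power = sum(c * c for c in clean)
    noise_power = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


def relative_improvement_pct(before, after):
    """Relative change from `before` to `after`, in percent (assumed formula)."""
    return 100.0 * (after - before) / abs(before)


# Synthetic stand-in for a real recording: one second of a 440 Hz tone at 16 kHz.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 0.3) for _ in clean]
noisy = [c + n for c, n in zip(clean, noise)]

# Hypothetical "enhanced" signal: a denoiser that halves the noise amplitude.
enhanced = [c + 0.5 * n for c, n in zip(clean, noise)]

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, enhanced)
gain_pct = relative_improvement_pct(snr_before, snr_after)
print(f"SNR before: {snr_before:.2f} dB, after: {snr_after:.2f} dB, gain: {gain_pct:+.2f}%")
```

Under this reading, a percentage gain depends strongly on the starting SNR: the same absolute gain in dB yields a much larger percentage when the noisy input starts near 0 dB, which may be worth keeping in mind when comparing figures across datasets.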
Related papers
- Towards Speech Enhancement Using A Variational U-Net Architecture (2020) 7.81
- Perceptual Loss Based Speech Denoising With An Ensemble Of Audio Pattern Recognition And Self-supervised Models (2020) 10.21
- UNetGAN: A Robust Speech Enhancement Approach In Time Domain For Extremely Low Signal-to-noise Ratio Condition (2020) 11.49
- Improved Speech Enhancement With The Wave-U-Net (2018) 0.00
- Single-channel Speech Enhancement With Deep Complex U-networks And Probabilistic Latent Space Models (2023) 5.24
- MetricGAN-U: Unsupervised Speech Enhancement/Dereverberation Based Only On Noisy/Reverberated Speech (2021) 11.67
- Phase-aware Speech Enhancement With Deep Complex U-Net (2019) 0.00
- Effect Of Noise Suppression Losses On Speech Distortion And ASR Performance (2021) 10.74