FNSE-SBGAN: Far-field Speech Enhancement With Schrodinger Bridge And Generative Adversarial Networks
2025 · Tong Lei, Qinwen Hu, Ziyao Lin, et al.
Abstract
The prevailing method for neural speech enhancement predominantly utilizes fully-supervised deep learning with simulated pairs of far-field noisy-reverberant speech and clean speech. Nonetheless, these models frequently demonstrate restricted generalizability to mixtures recorded in real-world conditions. To address this issue, this study investigates training enhancement models directly on real mixtures. Specifically, we revisit the single-channel far-field to near-field speech enhancement (FNSE) task, focusing on real-world data characterized by low signal-to-noise ratio (SNR), high reverberation, and mid-to-high frequency attenuation. We propose FNSE-SBGAN, a framework that integrates a Schrodinger Bridge (SB)-based diffusion model with generative adversarial networks (GANs). Our approach achieves state-of-the-art performance across various metrics and subjective evaluations, significantly reducing the character error rate (CER) by up to 14.58% compared to far-field signals. Experim
Related papers
- Diffusion-based Speech Enhancement With Schrödinger Bridge And Symmetric Noise Schedule (2024) 0.00
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) 21.85
- Robust Speech Recognition With Schrödinger Bridge-based Speech Enhancement (2025) 2.26
- SEFGAN: Harvesting The Power Of Normalizing Flows And GANs For Efficient High-quality Speech Enhancement (2023) 5.84
- Investigating Training Objectives For Generative Speech Enhancement (2024) 9.76
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019) 10.35
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) 16.03
- Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild (2020) 0.00