Reducing The Gap Between Pretrained Speech Enhancement And Recognition Models Using A Real Speech-trained Bridging Module
2025 · Zhongjian Cui, Chenrui Cui, Tianrui Wang, et al.
Abstract
The information loss or distortion caused by single-channel speech enhancement (SE) harms the performance of automatic speech recognition (ASR). Observation addition (OA) is an effective post-processing method that improves ASR performance by balancing noisy and enhanced speech. Determining the OA coefficient is crucial. However, the current supervised OA coefficient module, called the bridging module, is trained only on simulated noisy speech, which has a severe mismatch with real noisy speech. In this paper, we propose training strategies to train the bridging module with real noisy speech. First, DNSMOS is selected to evaluate the perceptual quality of real noisy speech, so the bridging module can be trained without the corresponding clean label. Additional constraints are introduced during training to further enhance the robustness of the bridging module. Each utterance is evaluated by the ASR back-end using various OA coefficients to obtain the word error rates (WERs). The W
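To make the role of the OA coefficient concrete, here is a minimal sketch of observation addition and of sweeping several coefficients against an ASR error measure. It rests on assumptions that are not taken from the paper: the coefficient is treated as the weight on the noisy signal (the paper may use the opposite convention), and the names `observation_addition`, `best_coefficient`, and `wer_fn` are hypothetical. `wer_fn` stands in for decoding with an ASR back-end and scoring the result.

```python
from typing import Callable, List, Sequence


def observation_addition(
    noisy: Sequence[float], enhanced: Sequence[float], coeff: float
) -> List[float]:
    """Blend noisy and enhanced waveforms sample by sample.

    Assumed convention: coeff is the weight on the noisy observation,
    so coeff=0 returns the enhanced signal and coeff=1 the noisy one.
    """
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("coeff must lie in [0, 1]")
    if len(noisy) != len(enhanced):
        raise ValueError("signals must have the same length")
    return [coeff * n + (1.0 - coeff) * e for n, e in zip(noisy, enhanced)]


def best_coefficient(
    noisy: Sequence[float],
    enhanced: Sequence[float],
    wer_fn: Callable[[List[float]], float],
    candidates: Sequence[float],
) -> float:
    """Return the candidate coefficient whose blend gives the lowest WER.

    wer_fn is a hypothetical stand-in for an ASR back-end plus WER scoring.
    Ties go to the earliest candidate.
    """
    return min(
        candidates,
        key=lambda c: wer_fn(observation_addition(noisy, enhanced, c)),
    )


if __name__ == "__main__":
    noisy = [1.0, 0.0]
    enhanced = [0.0, 1.0]
    # Toy scoring function for illustration only; not a real WER.
    toy_wer = lambda signal: abs(signal[0] - 0.5)
    print(best_coefficient(noisy, enhanced, toy_wer, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

In practice the sweep would run once per utterance, since the abstract describes evaluating each utterance with various OA coefficients.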
Related papers
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024)
- Bridging The Gap Between Monaural Speech Enhancement And Recognition With Distortion-independent Acoustic Modeling (2019)
- Rethinking Processing Distortions: Disentangling The Impact Of Speech Enhancement Errors On Speech Recognition Performance (2024)
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022)
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018)
- How Does End-to-end Speech Recognition Training Impact Speech Enhancement Artifacts? (2023)
- How Bad Are Artifacts?: Analyzing The Impact Of Speech Enhancement Errors On ASR (2022)
- Effect Of Noise Suppression Losses On Speech Distortion And ASR Performance (2021)