Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition
2024 · Kuan-Chen Wang, You-Jin Li, Wei-Lun Chen, et al.
Abstract
Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the use of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study introduces a simple yet effective SE post-processing technique to address the gap between various pre-trained SE and ASR models. A bridge module, which is a lightweight NN, is proposed to evaluate the signal-level information of the speech signal. Subsequently, using the signal-level information, the observation addition technique is applied to effectively reduce the shortcomings of SE. The experimental results demonstrate the success of our method in integrating diverse pre-trained SE and ASR models, considerably boosting the ASR robustness. Crucially, no prior knowledge of the ASR or speech conte
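As a rough illustration of the observation addition step mentioned in the abstract, the sketch below blends the enhanced signal with the original noisy observation using a single interpolation weight. It assumes a time-domain linear interpolation; the function names, the use of an estimated SNR as the signal-level information, and the sigmoid mapping with its `midpoint_db` and `slope` parameters are hypothetical choices for illustration, not the bridge module described by the authors.

```python
import numpy as np


def observation_addition(noisy, enhanced, enhanced_weight):
    """Blend the enhanced signal with the noisy observation.

    enhanced_weight = 1.0 returns the enhanced signal unchanged;
    enhanced_weight = 0.0 returns the noisy observation unchanged.
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if noisy.shape != enhanced.shape:
        raise ValueError("noisy and enhanced signals must have the same shape")
    if not 0.0 <= enhanced_weight <= 1.0:
        raise ValueError("enhanced_weight must lie in [0, 1]")
    return enhanced_weight * enhanced + (1.0 - enhanced_weight) * noisy


def weight_from_snr_db(estimated_snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical mapping from an estimated SNR (dB) to a blend weight.

    Assumption: the noisier the input, the more the enhanced signal is
    trusted. A decreasing sigmoid is one simple way to express that.
    """
    return float(1.0 / (1.0 + np.exp(slope * (estimated_snr_db - midpoint_db))))


# Example with stand-in signals (random data in place of real audio).
rng = np.random.default_rng(0)
noisy_signal = rng.standard_normal(16000)
enhanced_signal = 0.5 * noisy_signal  # stand-in for an SE model's output
weight = weight_from_snr_db(estimated_snr_db=5.0)
asr_input = observation_addition(noisy_signal, enhanced_signal, weight)
```

In a setup like this, the blended `asr_input` would be passed to the pre-trained ASR model in place of the raw SE output, so that some of the original observation is retained alongside the enhanced signal.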
Related papers
- Reducing The Gap Between Pretrained Speech Enhancement And Recognition Models Using A Real Speech-trained Bridging Module (2025) 2.26
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) 0.00
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018) 11.08
- Learning To Enhance Or Not: Neural Network-based Switching Of Enhanced And Observed Signals For Overlapping Speech Recognition (2022) 10.21
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024) 4.52
- How Does End-to-end Speech Recognition Training Impact Speech Enhancement Artifacts? (2023) 7.50
- Robust Speech Recognition With Schrödinger Bridge-based Speech Enhancement (2025) 2.26
- SNRi Target Training For Joint Speech Enhancement And Recognition (2021) 8.82