Time-domain Speech Enhancement For Robust Automatic Speech Recognition
2022 · Yufeng Yang, Ashutosh Pandey, Deliang Wang
Abstract
It has been shown that the intelligibility of noisy speech can be improved by speech enhancement algorithms. However, speech enhancement has not been established as an effective frontend for robust automatic speech recognition (ASR) in noisy conditions compared to an ASR model trained on noisy speech directly. The divide between speech enhancement and ASR impedes the progress of robust ASR systems, especially as speech enhancement has made big strides in recent years. In this work, we focus on eliminating this divide with an ARN (attentive recurrent network) based time-domain enhancement model. The proposed system fully decouples speech enhancement and an acoustic model trained only on clean speech. Results on the CHiME-2 corpus show that ARN enhanced speech translates to improved ASR results. The proposed system achieves \(6.28\%\) average word error rate, outperforming the previous best by \(19.3\%\) relatively.
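To make the relative figure concrete, the short sketch below works through the arithmetic. It assumes the 19.3% is a relative word error rate reduction defined as (previous - new) / previous. The abstract does not state the previous best, so the value computed here is only what that definition implies, not a reported result. The function names are hypothetical.

```python
def relative_reduction(previous_wer: float, new_wer: float) -> float:
    """Relative WER reduction, assuming the (previous - new) / previous definition."""
    return (previous_wer - new_wer) / previous_wer


def implied_previous_wer(new_wer: float, rel_reduction: float) -> float:
    """Invert the definition above to recover the baseline WER it implies."""
    return new_wer / (1.0 - rel_reduction)


# Figures quoted in the abstract.
new_wer = 6.28        # average WER in percent
rel_reduction = 0.193  # 19.3% relative improvement

# Implied baseline: an inference under the assumed definition, not a reported number.
previous_wer = implied_previous_wer(new_wer, rel_reduction)
absolute_gain = previous_wer - new_wer

print(f"implied previous best WER: {previous_wer:.2f}%")
print(f"absolute WER reduction:    {absolute_gain:.2f} points")
```

Under this reading, the relative gain corresponds to roughly a 1.5 point absolute drop in WER, which illustrates why relative and absolute improvements should not be confused when comparing systems.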
Related papers
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024) 4.52
- Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network (2020) 13.88
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022) 7.50
- Investigation Of Monaural Front-end Processing For Robust ASR Without Retraining Or Joint-training (2018) 0.00
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024) 7.50
- Deep Learning Based Dereverberation Of Temporal Envelopes For Robust Speech Recognition (2020) 5.84
- On Monoaural Speech Enhancement For Automatic Recognition Of Real Noisy Speech Using Mixture Invariant Training (2022) 4.52
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018) 10.21