Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network
2020 · Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, et al.
Abstract
With the advent of deep learning, research on noise-robust automatic speech recognition (ASR) has progressed rapidly. However, the ASR performance of single-channel systems in noisy conditions remains unsatisfactory. Indeed, most single-channel speech enhancement (SE) methods (denoising) have brought only limited performance gains over a state-of-the-art ASR back-end trained on multi-condition training data. Recently, there has been much research on neural network-based SE methods working in the time domain, showing levels of performance never attained before. However, it has not been established whether the high enhancement performance achieved by such time-domain approaches can be translated into ASR improvements. In this paper, we show that a single-channel time-domain denoising approach can significantly improve ASR performance, providing more than 30% relative word error reduction over a strong ASR back-end on the real evaluation data of the single-channel track of the CHiME-4 dataset. These posit
Related papers
- Time-domain Speech Enhancement For Robust Automatic Speech Recognition (2022) 7.16
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022) 7.50
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024) 4.52
- Closing The Gap Between Time-domain Multi-channel Speech Enhancement On Real And Simulation Conditions (2021) 8.82
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024) 7.50
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018) 11.08
- Cross-domain Single-channel Speech Enhancement Model With Bi-projection Fusion Module For Noise-robust ASR (2021) 8.09
- Rethinking Processing Distortions: Disentangling The Impact Of Speech Enhancement Errors On Speech Recognition Performance (2024) 8.35