Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition
2018 · Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, et al.
Abstract
Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between the enhanced speech and the clean reference. The MSE-optimized model may not directly improve the performance of an automatic speech recognition (ASR) system. If the goal is to minimize the recognition error, the recognition results should be used to design the objective function for optimizing the SE model. However, an ASR system consists of multiple units, such as acoustic and language models, so its structure is usually complex and not differentiable. In this study, we propose adopting a reinforcement learning algorithm to optimize the SE model based on the recognition results. We evaluated the proposed SE system on the Mandarin Chinese broadcast news corpus (MATBN). Experimental results demonstrate that the proposed method can effectively improve the ASR results, with notable error rate reductions of 12.40% and 19.23% for signal-to-noise ratio at 0 d
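The central idea of the abstract, updating a model from a recognition score that cannot be backpropagated through, can be illustrated with a generic policy-gradient (REINFORCE) toy. This is a minimal sketch under stated assumptions, not the authors' system: the four candidate "enhancement actions", the fixed error rates returned by `black_box_error_rate`, and every function name and hyperparameter are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_error_rate(action):
    """Hypothetical stand-in for a recognizer.

    Returns a made-up error rate for each candidate enhancement action.
    The training loop only sees the returned number, never a gradient.
    """
    return [0.60, 0.45, 0.30, 0.50][action]


def train(num_steps=2000, lr=0.5, seed=0):
    """REINFORCE with a running-average baseline on a softmax policy."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0, 0.0]  # one logit per candidate action
    baseline = 0.0                 # running estimate of the mean reward
    for _ in range(num_steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(len(logits)), weights=probs)[0]
        # Reward is the negative error rate reported by the black box.
        reward = -black_box_error_rate(action)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In this toy setting the policy should shift probability toward the action with the lowest made-up error rate, even though the scoring function is only ever queried, never differentiated.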
Related papers
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024)
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022)
- Learning To Enhance Or Not: Neural Network-based Switching Of Enhanced And Observed Signals For Overlapping Speech Recognition (2022)
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022)
- SNRi Target Training For Joint Speech Enhancement And Recognition (2021)
- Monaural Speech Enhancement Using Deep Neural Networks By Maximizing A Short-time Objective Intelligibility Measure (2018)
- Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network (2020)
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021)