Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition
2022 · Catalin Zorila, Rama Doddipatla
Abstract
Improving the accuracy of single-channel automatic speech recognition (ASR) in noisy conditions is challenging. Strong speech enhancement front-ends are available; however, they typically require the ASR model to be retrained to cope with the processing artifacts. In this paper we explore a speaker reinforcement strategy for improving recognition performance without retraining the acoustic model (AM). This is achieved by remixing the enhanced signal with the unprocessed input to alleviate the processing artifacts. We evaluate the proposed approach using a DNN speaker-extraction-based speech denoiser trained with a perceptually motivated loss function. Results show that, without AM retraining, our method yields relative accuracy gains of about 23% and 25% over the unprocessed input on the monaural simulated and real CHiME-4 evaluation sets, respectively, and outperforms a state-of-the-art reference method.
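The remixing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mixing rule, so the additive form below, the function name `reinforce`, and the `gain_db` parameter are all assumptions made for illustration and should not be read as the authors' implementation.

```python
def reinforce(noisy, enhanced, gain_db=0.0):
    """Remix an enhanced signal with its unprocessed input.

    Assumed rule (not taken from the paper):
        out[n] = noisy[n] + g * enhanced[n],  g = 10 ** (gain_db / 20)
    `noisy` and `enhanced` are equal-length sequences of float samples.
    """
    if len(noisy) != len(enhanced):
        raise ValueError("signals must have the same length")
    # Convert the hypothetical reinforcement gain from dB to a linear factor.
    g = 10.0 ** (gain_db / 20.0)
    # Adding the scaled enhanced signal boosts the target speaker while the
    # unprocessed input stays in the mix to mask enhancement artifacts.
    return [x + g * s for x, s in zip(noisy, enhanced)]


# Toy example: two samples, enhanced estimate added at 0 dB.
mixed = reinforce([1.0, -1.0], [0.5, 0.5], gain_db=0.0)
```

In this sketch a larger `gain_db` leans the mix towards the enhanced signal and a smaller one towards the unprocessed input; any practical value would have to be tuned on development data.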
Related papers
- Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network (2020)
- Time-domain Speech Enhancement For Robust Automatic Speech Recognition (2022)
- On Monoaural Speech Enhancement For Automatic Recognition Of Real Noisy Speech Using Mixture Invariant Training (2022)
- Investigation Of Monaural Front-end Processing For Robust ASR Without Retraining Or Joint-training (2018)
- Analysis Of DNN Speech Signal Enhancement For Robust Speaker Recognition (2018)
- A Two-stage Speaker Extraction Algorithm Under Adverse Acoustic Conditions Using A Single-microphone (2023)
- On The Use Of DNN Autoencoder For Robust Speaker Recognition (2018)
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024)