On Monoaural Speech Enhancement For Automatic Recognition Of Real Noisy Speech Using Mixture Invariant Training
2022 · Jisi Zhang, Catalin Zorila, Rama Doddipatla, et al.
Abstract
In this paper, we explore an improved framework for training a monoaural neural enhancement model for robust speech recognition. The proposed training framework extends the existing mixture invariant training (MixIT) criterion to exploit both unpaired clean speech and real noisy data. We find that the unpaired clean speech is crucial to improving the quality of speech separated from real noisy recordings. The proposed method also remixes processed and unprocessed signals to alleviate processing artifacts. Experiments on the single-channel CHiME-3 real test sets show that the proposed method significantly improves speech recognition performance over enhancement systems trained either in a supervised fashion on mismatched simulated data or in an unsupervised fashion on matched real data. Compared with the unprocessed signal, the proposed system achieves relative WER reductions of 16% to 39% with both end-to-end and hybrid acoustic models, without retraining the acoustic models.
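For readers unfamiliar with MixIT, the sketch below shows the standard criterion (Wortsman et al., 2020) that the abstract says is being extended, plus a simple remix of processed and unprocessed signals. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the extended loss, so here the unpaired clean utterance simply serves as the second reference mixture, and the remix is a hypothetical convex combination with an illustrative weight `alpha`. The names `neg_snr`, `mixit_loss`, and `remix` are hypothetical, and an "oracle" separator output stands in for a trained network.

```python
import itertools

import numpy as np


def neg_snr(ref, est, eps=1e-8):
    """Negative SNR in dB between a reference and an estimate (lower is better)."""
    residual = ref - est
    return -10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(residual ** 2) + eps))


def mixit_loss(mixtures, est_sources):
    """Standard MixIT loss.

    mixtures:    array (N, T) of reference mixtures (here N = 2)
    est_sources: array (M, T) of separator outputs for the summed mixtures
    Every estimated source is assigned to exactly one mixture; the loss is the
    minimum, over all N**M binary assignments, of the summed negative SNR.
    """
    n_mix = mixtures.shape[0]
    n_src = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product(range(n_mix), repeat=n_src):
        # Binary mixing matrix: A[i, j] = 1 if source j is assigned to mixture i
        A = np.zeros((n_mix, n_src))
        A[list(assign), np.arange(n_src)] = 1.0
        remixed = A @ est_sources
        loss = sum(neg_snr(mixtures[i], remixed[i]) for i in range(n_mix))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


def remix(enhanced, noisy, alpha=0.1):
    """Hypothetical remix: blend a small share of the unprocessed input back in."""
    return (1.0 - alpha) * enhanced + alpha * noisy


# Usage with synthetic signals (hypothetical data, 1 s at 16 kHz)
rng = np.random.default_rng(0)
T = 16000
speech = rng.standard_normal(T)
noise = 0.5 * rng.standard_normal(T)
clean = rng.standard_normal(T)   # unpaired clean utterance
x1 = speech + noise              # stands in for a real noisy recording
x2 = clean                       # unpaired clean speech as the second "mixture"
mixtures = np.stack([x1, x2])

# Oracle separator output for the mixture of mixtures x1 + x2
est = np.stack([speech, noise, clean])
loss, assign = mixit_loss(mixtures, est)
print(assign)  # sources 0 and 1 go to x1, source 2 goes to x2

enhanced_out = remix(est[0], x1)  # enhanced speech with some input mixed back in
```

Enumerating all assignments costs $N^M$ loss evaluations, which stays cheap for the small source counts typically used with MixIT; the remix weight trades residual noise against enhancement artifacts, and its value here is illustrative only.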
Related papers
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022) · 7.50
- Investigation Of Monaural Front-end Processing For Robust ASR Without Retraining Or Joint-training (2018) · 0.00
- SuperM2M: Supervised And Mixture-to-mixture Co-learning For Speech Enhancement And Noise-robust ASR (2024) · 5.24
- Improving Noise Robust Automatic Speech Recognition With Single-channel Time-domain Enhancement Network (2020) · 13.88
- Bridging The Gap Between Monaural Speech Enhancement And Recognition With Distortion-independent Acoustic Modeling (2019) · 7.50
- Supervised And Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization (2017) · 18.80
- Unsupervised Multi-channel Separation And Adaptation (2023) · 4.52
- Unsupervised Speech Enhancement Based On Multichannel NMF-informed Beamforming For Noise-robust Automatic Speech Recognition (2019) · 13.23