A Conformer-based ASR Frontend For Joint Acoustic Echo Cancellation, Speech Enhancement And Speech Separation
2021 · Tom O'Malley, Arun Narayanan, Quan Wang, et al.
Abstract
We present a frontend for improving robustness of automatic speech recognition (ASR) that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by using a contextual enhancement neural network that can optionally make use of different types of side inputs: (1) a reference signal of the playback audio, which is necessary for echo cancellation; (2) a noise context, which is useful for speech enhancement; and (3) an embedding vector representing the voice characteristics of the target speaker of interest, which is not only critical in speech separation, but also helpful for echo cancellation and speech enhancement. We present detailed evaluations to show that the joint model performs almost as well as the task-specific models, and significantly reduces word error rate in noisy conditions even when using a large-scale state-of-the-art ASR model. Compared to the noisy baseline, the joint model reduces
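The abstract describes a single network that accepts up to three optional side inputs. As a rough illustration only, the sketch below shows one way such optional inputs could be assembled into a fixed-size per-frame feature vector, substituting zeros for any side input that is absent. The function names, the zero-filling strategy, the averaging of the noise context, and the embedding size are all assumptions made for this example; the abstract does not say how the paper combines its inputs.

```python
from typing import List, Optional

Frame = List[float]


def mean_frame(frames: List[Frame], dim: int) -> Frame:
    """Average frames into one vector; return zeros when no frames are given."""
    if not frames:
        return [0.0] * dim
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def assemble_inputs(
    noisy: List[Frame],
    reference: Optional[List[Frame]] = None,      # playback audio (echo cancellation)
    noise_context: Optional[List[Frame]] = None,  # noise-only frames (enhancement)
    speaker_embedding: Optional[Frame] = None,    # target-speaker vector (separation)
    embedding_dim: int = 4,                       # hypothetical embedding size
) -> List[Frame]:
    """Concatenate noisy features with whichever side inputs are available.

    Missing side inputs are replaced by zeros (an assumption of this sketch),
    so the output width is the same regardless of which inputs are present.
    """
    if not noisy:
        return []
    feat_dim = len(noisy[0])

    if reference is None:
        reference = [[0.0] * feat_dim for _ in noisy]
    if len(reference) != len(noisy):
        raise ValueError("reference must have one frame per noisy frame")

    # Summarise the noise context as a single averaged vector (assumption).
    context_vec = mean_frame(noise_context or [], feat_dim)

    if speaker_embedding is None:
        speaker_embedding = [0.0] * embedding_dim

    # Frame-level inputs are paired up; utterance-level inputs are repeated.
    return [
        n + r + context_vec + list(speaker_embedding)
        for n, r in zip(noisy, reference)
    ]
```

With two-dimensional features and the default four-dimensional embedding, every output frame has 2 + 2 + 2 + 4 = 10 values, whether or not any side input is supplied. This lets the same downstream model run with any subset of the side inputs.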
Related papers
- A Universally-deployable ASR Frontend For Joint Acoustic Echo Cancellation, Speech Enhancement, And Voice Separation (2022) · 5.84
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024) · 4.52
- On The Efficacy And Noise-robustness Of Jointly Learned Speech Emotion And Automatic Speech Recognition (2023) · 3.58
- Elevating Robust Multi-talker ASR By Decoupling Speaker Separation And Speech Recognition (2025) · 0.00
- Real-time Joint Personalized Speech Enhancement And Acoustic Echo Cancellation (2022) · 4.52
- Investigation Of Monaural Front-end Processing For Robust ASR Without Retraining Or Joint-training (2018) · 0.00
- SNRi Target Training For Joint Speech Enhancement And Recognition (2021) · 8.82
- NeuralEcho: A Self-attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement (2022) · 0.00