Elevating Robust Multi-talker ASR By Decoupling Speaker Separation And Speech Recognition
2025 · Yufeng Yang, Hassan Taherian, Vahid Ahmadi Kalkhorani, et al.
Abstract
Despite the tremendous success of automatic speech recognition (ASR) since the introduction of deep learning, its performance is still unsatisfactory in many real-world multi-talker scenarios. Speaker separation excels at separating individual talkers but, as a frontend, it introduces processing artifacts that degrade an ASR backend trained on clean speech. As a result, mainstream robust ASR systems train the backend on noisy speech to avoid processing artifacts. In this work, we propose to decouple the training of the speaker separation frontend and the ASR backend, with the latter trained on clean speech only. Our decoupled system achieves 5.1% word error rates (WER) on the Libri2Mix dev/test sets, significantly outperforming other multi-talker ASR baselines. Its effectiveness is also demonstrated by state-of-the-art WERs of 7.60% and 5.74% on 1-ch and 6-ch SMS-WSJ, respectively. Furthermore, on recorded LibriCSS, we achieve a speaker-attributed WER of 2.92%. These state-of-the-art results sugges
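For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' scoring code, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. Text normalization, speaker attribution, and the permutation handling needed for multi-talker scoring are all omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.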
Related papers
- A Sidecar Separator Can Convert A Single-talker Speech Recognition System To A Multi-talker One (2023)
- Towards Decoupling Frontend Enhancement And Backend Recognition In Monaural Robust ASR (2024)
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- Investigation Of Practical Aspects Of Single Channel Speech Separation For ASR (2021)
- End-to-end Dereverberation, Beamforming, And Speech Recognition With Improved Numerical Stability And Advanced Frontend (2021)
- Streaming Multi-speaker ASR With RNN-T (2020)
- Unified Modeling Of Multi-talker Overlapped Speech Recognition And Diarization With A Sidecar Separator (2023)
- End-to-end Monaural Multi-speaker ASR System Without Pretraining (2018)