Integration of Speech Separation, Diarization, and Recognition for Multi-speaker Meetings: System Description, Comparison, and Analysis
2020 · Desh Raj, Pavel Denisov, Zhuo Chen, et al.
Abstract
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and automatic speech recognition (ASR) in the last decade, it has become possible to build pipelines that achieve reasonable error rates on this task. In this paper, we propose an end-to-end modular system for the LibriCSS meeting data, which combines independently trained separation, diarization, and recognition components, in that order. We study the effect of different state-of-the-art methods at each stage of the pipeline, and report results using task-specific metrics like SDR and DER, as well as downstream WER. Experiments indicate that the problem of overlapping speech for diarization and ASR can be effectively mitigated with the presence of a well-trained separation module. Our best system achieves a speaker-attributed WER of 12.7%, which is clo
Related papers
- Meeting Recognition with Continuous Speech Separation and Transcription-supported Diarization (2023)
- Elevating Robust Multi-talker ASR by Decoupling Speaker Separation and Speech Recognition (2025)
- RoyalFlush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (2022)
- Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models (2024)
- Unified Modeling of Multi-talker Overlapped Speech Recognition and Diarization with a Sidecar Separator (2023)
- A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings (2022)
- SCDiar: A Streaming Diarization System Based on Speaker Change Detection and Speech Recognition (2025)
- A Comparative Study on Multichannel Speaker-attributed Automatic Speech Recognition in Multi-party Meetings (2022)