SCDiar: A Streaming Diarization System Based On Speaker Change Detection And Speech Recognition
2025 · Naijun Zheng, Xucheng Wan, Kai Liu, et al.
Abstract
In hours-long meetings, accurate speaker diarization on a real-time speech stream is difficult to achieve, and errors in speaker identification and in the estimated number of speakers are common. To address this challenge, we propose SCDiar, a system that operates on speech segments split at the token level by a speaker change detection (SCD) module. Building on these segments, we introduce several enhancements that efficiently select the best available segment for each speaker. These improvements yield significant gains across various benchmarks. Notably, on real-world meeting data with more than ten participants, SCDiar outperforms previous systems by up to 53.6% in accuracy, substantially narrowing the performance gap between online and offline systems.
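The abstract does not say how SCDiar represents speakers or ranks segments, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions: the SCD module is assumed to emit one boolean per recognized token marking a speaker change before it; each segment is assumed to come with a speaker embedding from some external model (toy 2-D vectors here); "best available segment" is approximated as the longest segment assigned to a speaker; and assignment uses cosine similarity against each speaker's representative with a hypothetical threshold. `Token`, `split_at_changes`, and `OnlineSpeakerBank` are hypothetical names, not the authors' code.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: not the SCDiar authors' implementation.


@dataclass
class Token:
    text: str
    start: float  # seconds
    end: float    # seconds
    change_before: bool = False  # assumed SCD output: speaker change before this token


def split_at_changes(tokens):
    """Cut the token stream into segments wherever the SCD flag is set."""
    segments, current = [], []
    for tok in tokens:
        if tok.change_before and current:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


def duration(segment):
    return segment[-1].end - segment[0].start


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class OnlineSpeakerBank:
    """Keeps one representative (duration, embedding) per speaker.

    Assumption: the longest segment seen so far is the "best" one.
    """

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # hypothetical similarity threshold
        self.reps = []

    def assign(self, embedding, seg_duration):
        best_id, best_sim = None, -1.0
        for spk, (_, rep) in enumerate(self.reps):
            sim = cosine(embedding, rep)
            if sim > best_sim:
                best_id, best_sim = spk, sim
        if best_id is None or best_sim < self.threshold:
            # No sufficiently similar speaker: open a new one.
            self.reps.append((seg_duration, embedding))
            return len(self.reps) - 1
        if seg_duration > self.reps[best_id][0]:
            # A longer segment replaces the speaker's representative.
            self.reps[best_id] = (seg_duration, embedding)
        return best_id


# Toy stream: three SCD-delimited segments with made-up embeddings.
tokens = [
    Token("hello", 0.0, 0.4), Token("everyone", 0.4, 0.9),
    Token("hi", 1.2, 1.4, True), Token("there", 1.4, 1.7),
    Token("so", 2.0, 2.2, True), Token("let's", 2.2, 2.5), Token("start", 2.5, 3.0),
]
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

segments = split_at_changes(tokens)
bank = OnlineSpeakerBank()
labels = [bank.assign(emb, duration(seg)) for seg, emb in zip(segments, toy_embeddings)]
print(labels)  # prints [0, 1, 0]
```

Here the third segment is matched to speaker 0 and, being longer than that speaker's first segment, becomes its new representative. Comparing each new segment against a single representative per speaker keeps the cost of an assignment proportional to the number of speakers seen so far, which suits a streaming setting; the threshold and the length criterion are illustrative choices only.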
Related papers
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- Once More Diarization: Improving Meeting Transcription Systems Through Segment-level Speaker Reassignment (2024)
- DiariST: Streaming Speech Translation With Speaker Diarization (2023)
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)
- Sequence-to-sequence Neural Diarization With Automatic Speaker Detection And Representation (2024)
- Multi-microphone Automatic Speech Segmentation In Meetings Based On Circular Harmonics Features (2023)
- One Model To Rule Them All? Towards End-to-end Joint Speaker Diarization And Speech Recognition (2023)
- Enhancements For Audio-only Diarization Systems (2019)