Lexical Speaker Error Correction: Leveraging Language Models For Speaker Diarization Error Correction
2023 · Rohit Paturi, Sundararajan Srinivasan, Xiang Li
Abstract
Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where the SD system typically uses only acoustic information to identify the speakers in the audio stream. This approach can lead to speaker errors, especially around speaker turns and regions of speaker overlap. In this paper, we propose a novel second-pass speaker error correction system using lexical information, leveraging the power of modern language models (LMs). Our experiments across multiple telephony datasets show that our approach is both effective and robust. Training and tuning only on the Fisher dataset, this error correction approach leads to relative word-level diarization error rate (WDER) reductions of 15-30% on three telephony datasets: RT03-CTS, Callhome American English, and held-out portions of Fisher.
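To make the first-pass reconciliation step concrete, here is a minimal sketch of one common way to combine ASR and SD outputs: each recognized word takes the speaker whose diarization segments overlap its time span the most. It is an illustration under stated assumptions, not the authors' system. The names `Word`, `Segment`, `assign_speakers`, `word_speaker_error_rate`, and `relative_reduction` are hypothetical, the timings are made up, and the error measure is a simplified stand-in for WDER that assumes hypothesis and reference words are already aligned one to one.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, segments):
    """Label each word with the speaker whose segments overlap it the most."""
    labels = []
    for w in words:
        totals = {}
        for s in segments:
            dur = overlap(w.start, w.end, s.start, s.end)
            if dur > 0:
                totals[s.speaker] = totals.get(s.speaker, 0.0) + dur
        labels.append(max(totals, key=totals.get) if totals else None)
    return labels


def word_speaker_error_rate(hyp, ref):
    """Fraction of words with a wrong speaker label (simplified, not full WDER)."""
    if len(hyp) != len(ref):
        raise ValueError("hyp and ref must be aligned word for word")
    return sum(h != r for h, r in zip(hyp, ref)) / len(ref)


def relative_reduction(baseline, corrected):
    """Relative error reduction, e.g. 0.25 means a 25% relative reduction."""
    return (baseline - corrected) / baseline


# Toy example: the acoustic boundary (1.05 s) falls slightly after the true
# speaker turn, so the first word of speaker B is attributed to speaker A.
words = [
    Word("how", 0.0, 0.3), Word("are", 0.3, 0.5), Word("you", 0.5, 0.8),
    Word("fine", 0.8, 1.2), Word("thanks", 1.2, 1.6),
]
segments = [Segment("A", 0.0, 1.05), Segment("B", 1.05, 1.6)]
ref = ["A", "A", "A", "B", "B"]

hyp = assign_speakers(words, segments)
print(hyp, word_speaker_error_rate(hyp, ref))
```

The toy case shows the kind of error near a speaker turn that a second-pass corrector could target: the word sequence "how are you / fine thanks" carries a lexical cue about where the turn falls, even though the acoustic boundary is slightly off.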
Related papers
- SEAL: Speaker Error Correction Using Acoustic-conditioned Large Language Models (2025) 0.00
- LLM-based Speaker Diarization Correction: A Generalizable Approach (2024) 7.16
- DiarizationLM: Speaker Diarization Post-processing With Large Language Models (2024) 10.21
- DiaCorrect: End-to-end Error Correction For Speaker Diarization (2022) 0.00
- Enhancing Speaker Diarization With Large Language Models: A Contextual Beam Search Approach (2023) 7.50
- Speaker Diarization With Lexical Information (2018) 9.76
- SpeakerLM: End-to-end Versatile Speaker Diarization And Recognition With Multimodal Large Language Models (2025) 5.24
- Multimodal Speaker Segmentation And Diarization Using Lexical And Acoustic Cues Via Sequence To Sequence Neural Networks (2018) 9.92