SEAL: Speaker Error Correction Using Acoustic-conditioned Large Language Models
2025 · Anurag Kumar, Rohit Paturi, Amber Afshan, et al.
Abstract
Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce speaker errors, particularly during speaker transitions and overlapping speech. Recently, language models, including fine-tuned large language models (LLMs), have been shown to be effective as second-pass speaker error correctors by leveraging lexical context in the transcribed output. In this work, we introduce a novel acoustic conditioning approach that provides more fine-grained information from the acoustic diarizer to the LLM. We also show that a simpler constrained decoding strategy reduces LLM hallucinations while avoiding complicated post-processing. Our approach reduces speaker error rates by 24-43% across the Fisher, Callhome, and RT03-CTS datasets, relative to the first-pass acoustic SD.
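The abstract does not spell out how the acoustic conditioning or the constrained decoding is implemented, so the following is only a toy sketch of the general idea, not the authors' method. It assumes the decoder may choose a speaker label for each word but may never alter the words, and that each choice combines a per-word acoustic posterior with a lexical score. All names (`constrained_relabel`, `toy_lexical_logprob`, `acoustic_post`) and all numbers are hypothetical, and the lexical scorer is a hand-written stand-in for an LLM.

```python
import math

def toy_lexical_logprob(words, prev_labels, i, spk):
    """Hypothetical stand-in for an LLM score: discourage a speaker
    change in the middle of a sentence, stay neutral after punctuation."""
    if i == 0:
        return 0.0
    if words[i - 1].endswith((".", "?", "!")):
        return 0.0  # sentence boundary: a speaker change is plausible
    same_speaker = spk == prev_labels[-1]
    return math.log(0.9) if same_speaker else math.log(0.1)

def constrained_relabel(words, acoustic_post, lexical_logprob, weight=1.0):
    """Greedy decoding restricted to speaker labels.

    The word sequence is fixed, so the output can never contain
    inserted, deleted, or rewritten words; only labels are chosen.
    """
    labels = []
    for i in range(len(words)):
        best_label, best_score = None, -math.inf
        for spk, prob in sorted(acoustic_post[i].items()):
            score = math.log(max(prob, 1e-12))
            score += weight * lexical_logprob(words, labels, i, spk)
            if score > best_score:
                best_label, best_score = spk, score
        labels.append(best_label)
    return labels

# Made-up example: the acoustic diarizer is unsure about "you?".
words = ["how", "are", "you?", "fine", "thanks"]
acoustic_post = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.8, "B": 0.2},
    {"A": 0.4, "B": 0.6},  # acoustic-only decision would be B
    {"A": 0.2, "B": 0.8},
    {"A": 0.3, "B": 0.7},
]

# First pass: acoustic argmax per word.
first_pass = [max(p, key=p.get) for p in acoustic_post]
# Second pass: acoustic posterior combined with the lexical score.
corrected = constrained_relabel(words, acoustic_post, toy_lexical_logprob)

print(first_pass)
print(corrected)
```

Because the search space contains only speaker labels, the output always aligns one-to-one with the input words, which is one way to see why a constrained decoder needs no transcript-realignment step afterwards.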
Related papers
- Lexical Speaker Error Correction: Leveraging Language Models For Speaker Diarization Error Correction (2023)
- LLM-based Speaker Diarization Correction: A Generalizable Approach (2024)
- SpeakerLM: End-to-end Versatile Speaker Diarization And Recognition With Multimodal Large Language Models (2025)
- Enhancing Speaker Diarization With Large Language Models: A Contextual Beam Search Approach (2023)
- DiarizationLM: Speaker Diarization Post-processing With Large Language Models (2024)
- Multi-stage Large Language Model Correction For Speech Recognition (2023)
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)
- Large Language Model Guided Decoding For Self-supervised Speech Recognition (2025)