Enhancing Speaker Diarization With Large Language Models: A Contextual Beam Search Approach
2023 · Tae Jin Park, Kunal Dhawan, Nithin Koluguri, et al.
Abstract
Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. We model the multi-modal decoding process probabilistically and perform joint acoustic and lexical beam search to incorporate cues from both modalities: audio and text. Our experiments demonstrate that infusing lexical knowledge from the LLM into an acoustics-only diarization system improves overall speaker-attributed word error rate (SA-WER). The experimental results show that LLMs can provide complementary information to acoustic models for the speaker diarization task via the proposed beam search decoding approach, showing up to 39.8% relative delta-SA-WER improvement over the baseline system. Th
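The joint acoustic and lexical beam search described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the log-linear combination, the weights `alpha` and `beta`, the function names, and the toy lexical scorer (a punctuation-based stand-in for an LLM) are all assumptions made for illustration, as are the acoustic probabilities.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (speaker history, candidate speaker, word index) -> log-probability.
LexicalScorer = Callable[[Tuple[int, ...], int, int], float]


def joint_beam_search(
    acoustic_probs: Sequence[Sequence[float]],
    lexical_logprob: LexicalScorer,
    beam_width: int = 4,
    alpha: float = 1.0,  # assumed weight on the acoustic log-probability
    beta: float = 1.0,   # assumed weight on the lexical log-probability
) -> List[int]:
    """Assign one speaker index per word by maximizing a weighted sum of scores."""
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    for idx, probs in enumerate(acoustic_probs):
        candidates = []
        for history, score in beams:
            for spk, p in enumerate(probs):
                acoustic = math.log(max(p, 1e-12))  # floor avoids log(0)
                lexical = lexical_logprob(history, spk, idx)
                candidates.append(
                    (history + (spk,), score + alpha * acoustic + beta * lexical)
                )
        # Keep only the highest-scoring partial speaker sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return list(beams[0][0])


def toy_lexical_scorer(words: Sequence[str]) -> LexicalScorer:
    """Stand-in for an LLM: speaker changes are likely only after sentence-final punctuation."""
    def score(history: Tuple[int, ...], spk: int, idx: int) -> float:
        if not history:
            return 0.0  # no lexical preference for the first word
        changed = spk != history[-1]
        turn_end = words[idx - 1].endswith(("?", "."))
        if changed:
            return math.log(0.6 if turn_end else 0.1)
        return math.log(0.4 if turn_end else 0.9)
    return score


words = ["how", "are", "you?", "fine", "thanks."]
# Made-up per-word speaker posteriors for two speakers; "fine" is acoustically ambiguous.
acoustic_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.55, 0.45], [0.2, 0.8]]

scorer = toy_lexical_scorer(words)
acoustic_only = joint_beam_search(acoustic_probs, scorer, beta=0.0)
joint = joint_beam_search(acoustic_probs, scorer)
print("acoustic only:", acoustic_only)
print("joint:        ", joint)
```

In this toy setup the acoustics-only decode attributes "fine" to the first speaker, while the joint decode moves the speaker change to just after the question mark. In a real system the lexical score would come from a language model conditioned on the word history rather than from a punctuation rule.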
Related papers
- DiarizationLM: Speaker Diarization Post-processing With Large Language Models (2024) 10.21
- LLM-based Speaker Diarization Correction: A Generalizable Approach (2024) 7.16
- End-to-end Speech Recognition Contextualization With Large Language Models (2023) 0.00
- SEAL: Speaker Error Correction Using Acoustic-conditioned Large Language Models (2025) 0.00
- Lexical Speaker Error Correction: Leveraging Language Models For Speaker Diarization Error Correction (2023) 0.00
- Enhancing Large Language Model-based Speech Recognition By Contextualization For Rare And Ambiguous Words (2024) 0.00
- SpeakerLM: End-to-end Versatile Speaker Diarization And Recognition With Multimodal Large Language Models (2025) 5.24
- Latent Class Model With Application To Speaker Diarization (2019) 3.58