Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization
2024 · Luyao Cheng, Hui Wang, Siqi Zheng, et al.
Abstract
Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogeneous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals. Recent studies have made tremendous efforts towards audio-visual or audio-semantic modeling to enhance performance. However, even the incorporation of up to two modalities often falls short in addressing the complexities of spontaneous and unstructured conversations. To exploit more meaningful dialogue patterns, we propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization. Our method elegantly formulates the multimodal modeling as a constrained optimization problem. First, we build insights into the visual connections among acti
Related papers
- Audio-visual Speaker Diarization Based On Spatiotemporal Bayesian Fusion (2016)
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)
- Multi-input Multi-output Target-speaker Voice Activity Detection For Unified, Flexible, And Robust Audio-visual Speaker Diarization (2024)
- 3D-Speaker-Toolkit: An Open-source Toolkit For Multimodal Speaker Verification And Diarization (2024)
- Multimodal Speaker Segmentation And Diarization Using Lexical And Acoustic Cues Via Sequence To Sequence Neural Networks (2018)
- Late Audio-visual Fusion For In-the-wild Speaker Diarization (2022)
- Improving Speaker Diarization Using Semantic Information: Joint Pairwise Constraints Propagation (2023)
- Incorporating Spatial Cues In Modular Speaker Diarization For Multi-channel Multi-party Meetings (2024)