Multi-scale Speaker Diarization With Neural Affinity Score Fusion
2020 · Tae Jin Park, Manoj Kumar, Shrikanth Narayanan
Abstract
Identifying the speaker of short segments in human dialogue is considered one of the most challenging problems in speech signal processing. Speaker representations of short speech segments tend to be unreliable, resulting in poor fidelity in tasks requiring speaker recognition. In this paper, we propose an unconventional method that tackles the trade-off between temporal resolution and the quality of the speaker representations. To find a set of weights that balances the scores from multiple temporal scales of segments, we present a neural affinity score fusion model. Using the CALLHOME dataset, we show that our proposed multi-scale segmentation and integration approach achieves state-of-the-art diarization performance.
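The idea of balancing scores from several temporal scales can be illustrated with a minimal sketch. It assumes that each scale yields one embedding per segment on a shared base segmentation, that affinity is cosine similarity, and that the weights are fixed by hand; in the paper the weights come from a neural model, which is not reproduced here. The function names `cosine_affinity` and `fuse_affinities`, the embedding sizes, and the weight values are all hypothetical.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def fuse_affinities(affinities, weights):
    """Weighted sum of same-shaped affinity matrices.

    Weights are normalized to sum to 1. In the paper they are estimated
    by a neural model; here they are fixed inputs (an assumption).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(affinities)           # shape: (num_scales, n, n)
    return np.tensordot(w, stacked, axes=1)  # shape: (n, n)


# Toy data: 3 hypothetical scales, 4 base segments, 8-dimensional embeddings.
rng = np.random.default_rng(0)
per_scale_embeddings = [rng.normal(size=(4, 8)) for _ in range(3)]
per_scale_affinity = [cosine_affinity(e) for e in per_scale_embeddings]
fused = fuse_affinities(per_scale_affinity, weights=[0.5, 0.3, 0.2])
```

The fused matrix could then be passed to a clustering step; which clustering method to use is outside the scope of this sketch.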
Related papers
- Multi-scale Speaker Embedding-based Graph Attention Networks For Speaker Diarisation (2021)
- Multimodal Speaker Segmentation And Diarization Using Lexical And Acoustic Cues Via Sequence To Sequence Neural Networks (2018)
- Multi-target Extractor And Detector For Unknown-number Speaker Diarization (2022)
- Probabilistic Fusion And Calibration Of Neural Speaker Diarization Models (2025)
- Audio-visual Speaker Diarization Based On Spatiotemporal Bayesian Fusion (2016)
- Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization (2024)
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024)
- Late Audio-visual Fusion For In-the-wild Speaker Diarization (2022)