Multi-stream Extension Of Variational Bayesian HMM Clustering (MS-VBx) For Combined End-to-end And Vector Clustering-based Diarization
2023 · Marc Delcroix, Naohiro Tawara, Mireia Diez, et al.
Abstract
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest because it leverages the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. These embeddings can be clustered using constrained agglomerative hierarchical clustering (cAHC), which ensures that embeddings from the same chunk are assigned to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in both diarization and speaker counting performance.
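The cannot-link constraint used by the cAHC baseline can be illustrated with a small sketch. This is not the authors' implementation and it does not show MS-VBx; the function name `constrained_ahc`, the choice of cosine distance, average linkage, and the stopping threshold are all assumptions made for illustration only.

```python
import numpy as np


def constrained_ahc(embeddings, chunk_ids, threshold):
    """Hypothetical sketch: average-linkage AHC on cosine distance.

    Two clusters are never merged if they both contain an embedding
    from the same chunk (cannot-link constraint).
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    dist = 1.0 - x @ x.T  # pairwise cosine distance between embeddings

    clusters = [[i] for i in range(len(x))]            # member indices per cluster
    chunks = [{chunk_ids[i]} for i in range(len(x))]   # chunks covered per cluster

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if chunks[a] & chunks[b]:
                    continue  # same chunk appears in both clusters: cannot link
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Stop when no merge is allowed or the closest pair is too far apart.
        if best is None or best[0] > threshold:
            break
        _, a, b = best
        clusters[a] += clusters[b]
        chunks[a] |= chunks[b]
        del clusters[b], chunks[b]

    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Toy input: two chunks, each with one embedding per speaker stream.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_chunk_ids = [0, 0, 1, 1]
toy_labels = constrained_ahc(toy_embeddings, toy_chunk_ids, threshold=0.5)
```

In this toy input, each chunk contributes one embedding per speaker, so the constraint prevents the two embeddings of a chunk from ever sharing a cluster, however similar they are.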
Related papers
- Bayesian HMM Clustering Of X-vector Sequences (VBx) In Speaker Diarization: Theory, Implementation And Analysis On Standard Tasks (2020)
- Discriminative Training Of VBx Diarization (2023)
- Advances In Integration Of End-to-end Neural And Clustering-based Diarization For Real Conversational Speech (2021)
- Speakers Unembedded: Embedding-free Approach To Long-form Neural Diarization (2024)
- Integrating End-to-end Neural And Clustering-based Diarization: Getting The Best Of Both Worlds (2020)
- Combination Of Deep Speaker Embeddings For Diarisation (2020)
- End-to-end Supervised Hierarchical Graph Clustering For Speaker Diarization (2024)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)