Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings
2021 · Kiran Karra, Alan McCree
Abstract
Many modern systems for speaker diarization, such as the recently developed VBx approach, rely on clustering of DNN speaker embeddings followed by resegmentation. Two problems with this approach are that the DNN is not directly optimized for this task, and that the parameters need significant retuning for different applications. We recently presented progress on both problems with a Leave-One-Out Gaussian PLDA (LGP) clustering algorithm and an approach to training the DNN so that the embeddings directly optimize the performance of this scoring method. This paper presents a new two-pass version of this system, where the second pass uses finer time resolution to significantly improve overall performance. For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning. We also show significant progress towards a robust single solution for multiple diarization tasks.
Related papers
- Speaker Diarization With LSTM (2017)
- Enhancements For Audio-only Diarization Systems (2019)
- Learning Deep Representations By Multilayer Bootstrap Networks For Speaker Diarization (2019)
- End-to-end Speaker Diarization As Post-processing (2020)
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)
- Meta-learning With Latent Space Clustering In Generative Adversarial Network For Speaker Diarization (2020)
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024)
- NTT Speaker Diarization System For Chime-7: Multi-domain, Multi-microphone End-to-end And Vector Clustering Diarization (2023)