LSTM Based Similarity Measurement With Spectral Clustering For Speaker Diarization
2019 · Qingjian Lin, Ruiqing Yin, Ming Li, et al.
Abstract
A growing number of neural network approaches have achieved considerable improvements in the submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction. In the clustering stage, however, traditional methods such as probabilistic linear discriminant analysis (PLDA) are still widely used to score the similarity between two speech segments. In this paper, we propose a supervised method that computes the similarity matrix between all segments of an audio recording using sequential bidirectional long short-term memory networks (Bi-LSTMs). Spectral clustering is then applied to this similarity matrix to further improve performance. Experimental results show that our system significantly outperforms state-of-the-art methods, achieving a diarization error rate (DER) of 6.63% on the NIST SRE 2000 CALLHOME database.
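The abstract describes a two-stage pipeline: score every pair of segments to fill an N × N similarity matrix, then run spectral clustering on that matrix. The sketch below illustrates only the shape of that pipeline under stated assumptions. A clipped cosine similarity stands in for the trained Bi-LSTM scorer (it is not the authors' model), and the clustering follows the common normalized-Laplacian recipe (eigengap to choose the number of speakers, row-normalized eigenvectors, k-means) rather than any specific refinement the paper may apply. The function names `similarity_matrix`, `spectral_cluster`, and `kmeans`, and the toy embeddings, are hypothetical.

```python
import numpy as np

def similarity_matrix(embeddings):
    # ASSUMPTION: clipped cosine similarity stands in for the paper's trained
    # Bi-LSTM scorer; it only illustrates the N x N matrix that clustering consumes.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    return np.clip(unit @ unit.T, 0.0, 1.0)

def kmeans(points, k, iters=50):
    # Deterministic farthest-point initialization, then standard Lloyd iterations.
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_cluster(affinity, max_speakers=8):
    A = 0.5 * (affinity + affinity.T)   # symmetrize (a learned scorer need not be symmetric)
    np.fill_diagonal(A, 0.0)            # drop self-similarity
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Eigengap heuristic: the number of speakers is where the largest gap occurs.
    limit = min(max_speakers, len(A) - 1)
    k = int(np.argmax(np.diff(eigvals[: limit + 1]))) + 1
    # Row-normalize the k smallest eigenvectors, then cluster the rows.
    U = eigvecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

# Hypothetical toy "segment embeddings": three segments per speaker.
emb = np.array([
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],
])
labels = spectral_cluster(similarity_matrix(emb))
print(labels)
```

Because `spectral_cluster` only needs an affinity matrix, the stand-in scorer could in principle be swapped for a learned one without touching the clustering code; keeping the two stages separate is what lets a supervised similarity model replace PLDA scoring.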
Related papers
- Speaker Diarization With LSTM (2017) · 17.48
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021) · 2.26
- Assessing The Robustness Of Spectral Clustering For Deep Speaker Diarization (2024) · 3.58
- Self-tuning Spectral Clustering For Speaker Diarization (2024) · 3.81
- Enhancements For Audio-only Diarization Systems (2019) · 0.00
- Learning Deep Representations By Multilayer Bootstrap Networks For Speaker Diarization (2019) · 0.00
- Spectral Clustering-aware Learning Of Embeddings For Speaker Diarisation (2022) · 2.26
- Multi-class Spectral Clustering With Overlaps For Speaker Diarization (2020) · 10.35