Fully Supervised Speaker Diarization
2018 · Aonan Zhang, Quan Wang, Zhenyao Zhu, et al.
Abstract
In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally integrated with a distance-dependent Chinese restaurant process (ddCRP) to accommodate an unknown number of speakers. Our system is fully supervised and is able to learn from examples where time-stamped speaker labels are annotated. We achieved a 7.6% diarization error rate on NIST SRE 2000 CALLHOME, which is better than the state-of-the-art method using spectral clustering. Moreover, our method decodes in an online fashion while most state-of-the-art systems rely on offline clustering.
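The abstract does not spell out how the ddCRP assigns speakers, so the following is only a minimal sketch of a Chinese-restaurant-process-style prior over the next speaker label, written under stated assumptions: a fixed speaker-change probability (`p_change`), a concentration parameter (`alpha`, assumed positive) for opening a new speaker, and weights for returning to an earlier speaker proportional to that speaker's number of contiguous blocks so far. The function names and parameters are hypothetical and do not come from the paper's code. In the full model this prior would be combined with the RNN's likelihood of the observed d-vector, which is omitted here.

```python
from collections import Counter


def count_blocks(labels):
    """Count contiguous blocks (uninterrupted runs) per speaker label."""
    blocks = Counter()
    prev = None
    for y in labels:
        if y != prev:
            blocks[y] += 1
        prev = y
    return blocks


def next_speaker_prior(labels, alpha=1.0, p_change=0.1):
    """Prior over the next speaker label given the labels so far.

    Assumptions (illustrative, not taken from the abstract):
      - with probability 1 - p_change the current speaker continues;
      - on a change, an earlier speaker k is chosen with weight equal
        to its block count, and a new speaker with weight alpha > 0.
    """
    if not labels:
        return {0: 1.0}  # the first segment always opens speaker 0

    current = labels[-1]
    blocks = count_blocks(labels)

    # On a speaker change, the current speaker is excluded.
    weights = {k: float(n) for k, n in blocks.items() if k != current}
    new_id = max(blocks) + 1  # the next unused speaker index
    weights[new_id] = alpha

    total = sum(weights.values())
    prior = {k: p_change * w / total for k, w in weights.items()}
    prior[current] = 1.0 - p_change
    return prior


if __name__ == "__main__":
    history = [0, 0, 1, 1, 0, 2]
    for speaker, prob in sorted(next_speaker_prior(history, p_change=0.2).items()):
        print(f"speaker {speaker}: {prob:.3f}")
```

Because the prior always reserves some mass for an unused label, the number of speakers does not have to be fixed in advance, and because it depends only on past labels, it can be evaluated one segment at a time, which is what makes online decoding possible.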
Related papers
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017) 9.41
- Speaker Diarization With LSTM (2017) 17.48
- A Reinforcement Learning Framework For Online Speaker Diarization (2023) 0.00
- Supervised Online Diarization With Sample Mean Loss For Multi-domain Data (2019) 9.92
- Transcribe-to-diarize: Neural Speaker Diarization For Unlimited Number Of Speakers Using End-to-end Speaker-attributed ASR (2021) 11.49
- Speaker Diarization Using Two-pass Leave-one-out Gaussian PLDA Clustering Of DNN Embeddings (2021) 2.26
- Deep Self-supervised Hierarchical Clustering For Speaker Diarization (2020) 5.24
- Sequence-to-sequence Neural Diarization With Automatic Speaker Detection And Representation (2024) 6.34