Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings
2017 · Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec
Abstract
In this paper, we propose a new speaker diarization method that uses a deep learning architecture to learn speaker embeddings. Traditional approaches build speaker embeddings from hand-crafted spectral features. Instead, we train a recurrent convolutional neural network for this purpose, applied directly to magnitude spectrograms. To compare our approach with the state of the art, we collect and publicly release an additional dataset of over 6 hours of fully annotated broadcast material. Our evaluation on this new dataset and three other benchmark datasets shows that the proposed method significantly outperforms its competitors, reducing the diarization error rate by a large margin of over 30% with respect to the baseline.
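The abstract does not give spectrogram settings or network details. The Python sketch below therefore only illustrates the two quantities it names: the magnitude spectrogram that serves as the network's input, and the diarization error rate (DER) used to report results. The frame length, hop size, periodic Hann window and all function names are assumptions for illustration, not the authors' settings. A naive DFT keeps the sketch dependency-free, where a real pipeline would use an FFT library. DER is computed in the usual way: missed speech plus false-alarm speech plus speaker-confusion time, divided by total reference speech time. Reading "30% with respect to the baseline" as a relative reduction is also an assumption.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return |STFT| of a real signal as a list of frames (one row per frame).

    Hypothetical parameters: frame_len and hop are illustrative defaults,
    not values from the paper. Uses a periodic Hann window and a naive
    O(frame_len^2) DFT so the sketch needs only the standard library.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(coeff))  # keep the magnitude, discard the phase
        frames.append(row)
    return frames


def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed + false alarm + speaker confusion) / total reference speech time."""
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Fractional DER improvement over a baseline (assumed reading of the 30% claim)."""
    return (baseline_der - new_der) / baseline_der


if __name__ == "__main__":
    # A sine with exactly 8 cycles per 64-sample frame peaks in bin 8.
    sig = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    spec = magnitude_spectrogram(sig, frame_len=64, hop=32)
    print(len(spec), len(spec[0]), max(range(len(spec[0])), key=spec[0].__getitem__))
    # Hypothetical DER values, not results from the paper.
    print(round(relative_reduction(0.30, 0.20), 3))
```

Running the script prints `7 33 8`: seven 64-sample frames, each with 33 magnitude bins, and each frame peaks at bin 8. It then prints `0.333`, because a hypothetical baseline DER of 0.30 falling to 0.20 is a relative reduction of about 33%.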
Related papers
- Speaker Diarization With LSTM (2017) · 17.48
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024) · 0.00
- Combination Of Deep Speaker Embeddings For Diarisation (2020) · 8.60
- Advancing The Dimensionality Reduction Of Speaker Embeddings For Speaker Diarisation: Disentangling Noise And Informing Speech Activity (2021) · 2.26
- Multi-scale Speaker Embedding-based Graph Attention Networks For Speaker Diarisation (2021) · 8.35
- Fully Supervised Speaker Diarization (2018) · 15.80
- Deep Self-supervised Hierarchical Clustering For Speaker Diarization (2020) · 5.24
- Sequence-to-sequence Neural Diarization With Automatic Speaker Detection And Representation (2024) · 6.34