Semi-supervised Multi-channel Speaker Diarization With Cross-channel Attention
2023 · Shilong Wu, Jun Du, Maokui He, et al.
Abstract
Most neural speaker diarization systems rely on sufficient manually labeled training data, which is hard to collect in real-world scenarios. This paper proposes a semi-supervised speaker diarization system that exploits large-scale multi-channel training data by generating pseudo-labels for unlabeled data. Furthermore, we introduce cross-channel attention into Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding (NSD-MA-MSE) to better learn the channel contextual information of speaker embeddings. Experimental results on the CHiME-7 Mixer6 dataset, whose training set contains labels for only some of the speakers, show that our system achieves a 57.01% relative DER reduction compared to the clustering-based model on the development set. We further conducted experiments on the CHiME-6 dataset to simulate the scenario of partially missing training set labels. When using 80% and 50% labeled training data, our system performs comparably to the results obtained using 100% labeled data.
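The abstract names two mechanisms without detailing them: cross-channel attention over speaker embeddings, and pseudo-labels for unlabeled data. The sketch that follows is a minimal illustration under stated assumptions, not the authors' NSD-MA-MSE implementation. Cross-channel attention is modeled as generic scaled dot-product self-attention over the channel axis at each frame, with hypothetical projection matrices `w_q`, `w_k`, `w_v` and fusion by averaging over channels. Pseudo-labels are produced by thresholding frame-level speaker posteriors at a hypothetical `threshold`. `relative_der_reduction` shows how a relative figure such as 57.01% is computed from a baseline DER and a system DER.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_channel_attention(x: np.ndarray, w_q: np.ndarray,
                            w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Generic self-attention across channels, applied independently per frame.

    Assumption: x has shape (C, T, D) for C channels, T frames, D feature dims.
    w_q, w_k, w_v are hypothetical (D, D) projections (random in this sketch).
    Returns channel-fused features of shape (T, D).
    """
    d = x.shape[-1]
    xt = np.transpose(x, (1, 0, 2))                        # (T, C, D)
    q, k, v = xt @ w_q, xt @ w_k, xt @ w_v                 # each (T, C, D)
    scores = q @ np.transpose(k, (0, 2, 1)) / np.sqrt(d)   # (T, C, C)
    weights = softmax(scores, axis=-1)                     # rows sum to 1 over channels
    attended = weights @ v                                 # (T, C, D)
    # Residual connection, then average over channels to fuse them.
    return (xt + attended).mean(axis=1)                    # (T, D)


def make_pseudo_labels(posteriors: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize (T, N) frame-by-speaker posteriors, assumed to come from a model
    trained on the labeled subset, into 0/1 pseudo speech-activity labels."""
    return (posteriors > threshold).astype(np.int8)


def relative_der_reduction(der_baseline: float, der_system: float) -> float:
    """Relative DER reduction as a fraction (0.5701 means 57.01%)."""
    return (der_baseline - der_system) / der_baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 10, 8))                    # 4 channels, 10 frames, 8 dims
    w_q, w_k, w_v = (0.1 * rng.standard_normal((8, 8)) for _ in range(3))
    fused = cross_channel_attention(x, w_q, w_k, w_v)
    print(fused.shape)                                     # (10, 8)
    print(make_pseudo_labels(np.array([[0.9, 0.2], [0.4, 0.7]])))
```

Because the attention output is averaged over channels, the fused features do not depend on channel order, which suits recordings whose microphone count or ordering varies. That property is a design choice of this sketch and is not claimed for the paper's model.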
Related papers
- Neural Speaker Diarization Using Memory-aware Multi-speaker Embedding With Sequence-to-sequence Architecture (2023)
- Mutual Learning Of Single- And Multi-channel End-to-end Neural Diarization (2022)
- NTT Speaker Diarization System For Chime-7: Multi-domain, Multi-microphone End-to-end And Vector Clustering Diarization (2023)
- End-to-end Neural Diarization: Reformulating Speaker Diarization As Simple Multi-label Classification (2020)
- Incorporating Spatial Cues In Modular Speaker Diarization For Multi-channel Multi-party Meetings (2024)
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017)
- Neural Blind Source Separation And Diarization For Distant Speech Recognition (2024)
- Multi-channel End-to-end Neural Diarization With Distributed Microphones (2021)