Supervised Online Diarization With Sample Mean Loss For Multi-domain Data
2019 · Enrico Fini, Alessio Brutti
Abstract
Recently, a fully supervised speaker diarization approach (UIS-RNN) was proposed that models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model that significantly improve the learning efficiency and the overall diarization performance. In particular, we introduce a novel loss function, which we call Sample Mean Loss, and we present a better model of speaker-turn behaviour by deriving an analytical expression for the probability of a new speaker joining the conversation. In addition, we demonstrate that our model can be trained on fixed-length speech segments, removing the need for speaker change information during inference. Using x-vectors as input features, we evaluate our proposed approach on the multi-domain dataset employed in the DIHARD II challenge: our online method improves with respect to the original UIS-RNN and achieves similar performance to an offline agglomerative clus
Related papers
- Fully Supervised Speaker Diarization (2018) 15.80
- A Reinforcement Learning Framework For Online Speaker Diarization (2023) 0.00
- Speaker Diarization As A Fully Online Learning Problem In Minivox (2020) 0.00
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017) 9.41
- Enhancements For Audio-only Diarization Systems (2019) 0.00
- Overlap-aware Low-latency Online Speaker Diarization Based On End-to-end Local Segmentation (2021) 10.35
- Improved Large-margin Softmax Loss For Speaker Diarisation (2019) 6.34
- Semi-supervised Multi-channel Speaker Diarization With Cross-channel Attention (2023) 2.26