Improved Large-margin Softmax Loss For Speaker Diarisation
2019 · Yassir Fathullah, Chao Zhang, Philip C. Woodland
Abstract
Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, which need to be discriminative for unseen speakers. It is well known that large-margin training can improve the generalisation ability to unseen data, and its use in such open-set problems has been widespread. Therefore, this paper introduces a general approach to the large-margin softmax loss without any approximations to improve the quality of speaker embeddings for diarisation. Furthermore, a novel and simple way to stabilise training when large-margin softmax is used is proposed. Finally, different training margins are used to reduce the negative effect that overlapping speech has on creating discriminative embeddings. Experiments on the AMI meeting corpus show that the use of large-margin softmax significantly improves the speaker error rate (SER). By using all hyperparameters of the loss in a unified way, further improvements w
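The abstract names the large-margin softmax loss but does not give its formula. As a rough illustration only, the sketch below uses the combined-margin form common in the face and speaker recognition literature, where the target-class cosine cos(θ) is replaced by cos(m1·θ + m2) − m3 and all logits are scaled. The names `margin_logit`, `large_margin_softmax_loss`, `m1`, `m2`, `m3` and `scale`, the default values, and the monotonic extension beyond π are assumptions made for this example, not the paper's stated formulation.

```python
import math


def margin_logit(cos_theta, m1=1.0, m2=0.0, m3=0.0):
    """Apply a combined margin to the target-class cosine (illustrative).

    Computes psi(theta) = cos(m1 * theta + m2) - m3, extended
    monotonically beyond pi so that a larger angle always gives a
    lower score. The extension is an assumption of this sketch.
    """
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    phi = m1 * theta + m2
    k = math.floor(phi / math.pi)
    return (-1) ** k * math.cos(phi) - 2 * k - m3


def large_margin_softmax_loss(cosines, target, scale=30.0,
                              m1=1.0, m2=0.0, m3=0.0):
    """Cross-entropy for one sample with a margin on the target class.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker class.
    """
    logits = [scale * c for c in cosines]
    logits[target] = scale * margin_logit(cosines[target], m1, m2, m3)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.1, -0.3]
    print(large_margin_softmax_loss(cosines, target=0))
    print(large_margin_softmax_loss(cosines, target=0, m2=0.2, m3=0.1))
```

With `m1=1`, `m2=0` and `m3=0` this reduces to an ordinary scaled softmax cross-entropy; any non-trivial margin lowers the target logit, so the same sample incurs a larger loss and the embedding is pushed closer to its class centre.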
Related papers
- Spectral Clustering-aware Learning Of Embeddings For Speaker Diarisation (2022)
- Large Margin Softmax Loss For Speaker Verification (2019)
- Multi-scale Speaker Embedding-based Graph Attention Networks For Speaker Diarisation (2021)
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024)
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019)
- Advancing The Dimensionality Reduction Of Speaker Embeddings For Speaker Diarisation: Disentangling Noise And Informing Speech Activity (2021)
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017)
- Scoring Of Large-margin Embeddings For Speaker Verification: Cosine Or PLDA? (2022)