A Comparison Of Metric Learning Loss Functions For End-to-end Speaker Verification
2020 · Juan M. Coria, Hervé Bredin, Sahar Ghannay, et al.
Abstract
Despite the growing popularity of metric learning approaches, very little work has attempted to perform a fair comparison of these techniques for speaker verification. We try to fill this gap and compare several metric learning loss functions in a systematic manner on the VoxCeleb dataset. The first family of loss functions is derived from the cross entropy loss (usually used for supervised classification) and includes the congenerous cosine loss, the additive angular margin loss, and the center loss. The second family of loss functions focuses on the similarity between training samples and includes the contrastive loss and the triplet loss. We show that the additive angular margin loss function outperforms all other loss functions in the study, while learning more robust representations. Based on a combination of SincNet trainable features and the x-vector architecture, the network used in this paper brings us a step closer to a really-end-to-end speaker verification system, when comb
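To make the two families of losses concrete, here is a minimal pure-Python sketch of one loss from each: the additive angular margin (ArcFace-style) softmax loss and the triplet loss. The function names, the scale `s=30`, the margin values and the use of Euclidean distance in the triplet loss are illustrative assumptions. They are not the configuration used in the paper.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def additive_angular_margin_loss(embedding, class_weights, target, s=30.0, m=0.2):
    """Additive angular margin softmax loss for a single sample (sketch).

    embedding: list of floats; class_weights: one weight vector per training
    speaker; target: index of the true speaker. s (scale) and m (margin in
    radians) are illustrative assumptions, not the paper's settings.
    """
    x = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # cosine similarity between the embedding and the class weight
        cos_theta = max(-1.0, min(1.0, _dot(x, _normalize(w))))
        if j == target:
            # the margin is added to the angle of the target class only
            theta = math.acos(cos_theta)
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * cos_theta)
    # cross entropy on the target logit, using the log-sum-exp trick
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[target]

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on Euclidean distances: the anchor should be closer to the
    positive (same speaker) than to the negative by at least `margin`."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

With `m=0` the first function reduces to a plain scaled cosine softmax. Any positive margin lowers the target logit and therefore raises the loss, which pushes embeddings of the same speaker into a tighter angular region. Practical implementations usually add a fallback for the case θ + m > π, where cos(θ + m) stops decreasing. The sketch omits that fallback.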
Related papers
- Large Margin Softmax Loss For Speaker Verification (2019)
- Angular Softmax Loss For End-to-end Speaker Verification (2018)
- Masked Proxy Loss For Text-independent Speaker Verification (2020)
- Experimenting With Additive Margins For Contrastive Self-supervised Speaker Verification (2023)
- Multi-task Metric Learning For Text-independent Speaker Verification (2020)
- Semi-supervised Contrastive Learning With Generalized Contrastive Loss And Its Application To Speaker Recognition (2020)
- SVSNet: An End-to-end Speaker Voice Similarity Assessment Model (2021)
- Momentum Contrast Speaker Representation Learning (2020)