Challenging Margin-based Speaker Embedding Extractors By Using The Variational Information Bottleneck
2024 · Themos Stafylakis, Anna Silnova, Johan Rohdin, et al.
Abstract
Speaker embedding extractors are typically trained using a classification loss over the training speakers. In recent years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider a probabilistic framework that has a similar effect. The variational information bottleneck provides a principled mechanism for making deterministic nodes stochastic, resulting in an implicit reduction of the posterior of the target speaker. We experiment with a wide range of speaker recognition benchmarks and scoring methods and report results competitive with those obtained with the state-of-the-art Additive Angular Margin loss.
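The two mechanisms the abstract contrasts can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names, the margin of 0.2, the scale of 30, and the toy cosine values are all assumptions chosen for illustration. The first function applies an additive angular margin to the target speaker's logit only. The last two show the standard variational information bottleneck ingredients: a reparameterised Gaussian sample in place of a deterministic node, and the KL term to a standard normal prior.

```python
import numpy as np


def aam_logits(cosines, target, margin=0.2, scale=30.0):
    """Additive angular margin: replace cos(theta) of the target class
    with cos(theta + margin), leaving the other classes untouched.
    Simplified sketch: no special handling when theta + margin > pi."""
    cosines = np.asarray(cosines, dtype=float)
    theta = np.arccos(np.clip(cosines, -1.0, 1.0))
    out = cosines.copy()
    out[target] = np.cos(theta[target] + margin)
    return scale * out


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def vib_sample(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps with eps ~ N(0, I),
    turning a deterministic node (mu) into a stochastic one."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), the usual VIB regulariser."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


if __name__ == "__main__":
    # Hypothetical cosine similarities between one embedding and three
    # speaker prototypes; speaker 0 is the target.
    cosines = np.array([0.7, 0.2, -0.1])
    plain = 30.0 * cosines
    with_margin = aam_logits(cosines, target=0)
    print("target posterior without margin:", softmax(plain)[0])
    print("target posterior with margin:   ", softmax(with_margin)[0])

    # Hypothetical embedding mean and log-variance produced by an encoder.
    rng = np.random.default_rng(0)
    mu = np.array([0.5, -1.0, 0.3])
    log_var = np.array([-2.0, -2.0, -2.0])
    print("stochastic embedding sample:", vib_sample(mu, log_var, rng))
    print("KL to N(0, I):", kl_to_standard_normal(mu, log_var))
```

In the margin case the reduction of the target logit is explicit and deterministic. In the bottleneck case any reduction of the target posterior is implicit: it arises from the noise injected into the embedding and from the KL penalty, not from a direct modification of the logits.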
Related papers
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019)
- Large Margin Softmax Loss For Speaker Verification (2019)
- Angular Softmax Loss For End-to-end Speaker Verification (2018)
- A Study On Angular Based Embedding Learning For Text-independent Speaker Verification (2019)
- Rethinking Session Variability: Leveraging Session Embeddings For Session Robustness In Speaker Verification (2023)
- Within-sample Variability-invariant Loss For Robust Speaker Recognition Under Noisy Environments (2020)
- Improved Large-margin Softmax Loss For Speaker Diarisation (2019)
- Unified Hypersphere Embedding For Speaker Recognition (2018)