Curricular SincNet: Towards Robust Deep Speaker Recognition by Emphasizing Hard Samples in Latent Space
2021 · Labib Chowdhury, Mustafa Kamal, Najia Hasan, et al.
Abstract
Deep learning models have become an increasingly preferred option for biometric recognition systems, such as speaker recognition. SincNet, a deep neural network architecture, gained popularity in speaker recognition tasks due to its parameterized sinc functions that allow it to work directly on the speech signal. The original SincNet architecture uses the softmax loss, which may not be the most suitable choice for recognition-based tasks. Such loss functions neither impose inter-class margins nor differentiate between easy and hard training samples. Curriculum learning, particularly approaches leveraging angular margin-based losses, has proven very successful in other biometric applications such as face recognition. The advantage of such curriculum learning-based techniques is that they impose inter-class margins and also take easy and hard samples into account. In this paper, we propose Curricular SincNet (CL-SincNet), an improved SincNet model where we use a curricular loss function
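The abstract is cut off before it states the loss, so the exact formulation used in CL-SincNet is not available here. As a rough illustration of the idea it describes (an angular margin on the target class plus extra weight on hard samples), the following is a minimal sketch in the style of the CurricularFace loss from face recognition. That choice is an assumption, not something the text confirms. The names `curricular_logits` and `cross_entropy`, and the values of `margin`, `scale`, and `t`, are hypothetical and chosen only for illustration.

```python
import math


def curricular_logits(cosines, target, t, margin=0.5, scale=30.0):
    """Scaled logits for one sample (illustrative sketch, not the paper's code).

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true class.
    t:       curriculum parameter; small early in training, larger later
             (assumed to be supplied by the caller).
    Simplification: no guard for the case theta_y + margin > pi.
    """
    cos_y = max(-1.0, min(1.0, cosines[target]))
    # Additive angular margin on the target class: cos(theta_y + m).
    cos_y_m = math.cos(math.acos(cos_y) + margin)

    logits = []
    for j, cos_j in enumerate(cosines):
        if j == target:
            logits.append(scale * cos_y_m)
        elif cos_j > cos_y_m:
            # Hard negative: still closer than the margin-penalised target,
            # so its logit is modulated by (t + cos_j).
            logits.append(scale * cos_j * (t + cos_j))
        else:
            # Easy negative: left unchanged.
            logits.append(scale * cos_j)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Class 1 is a hard negative for this sample; class 2 is an easy one.
cosines = [0.8, 0.75, 0.1]
early = curricular_logits(cosines, target=0, t=0.0)
late = curricular_logits(cosines, target=0, t=1.0)
print(cross_entropy(early, 0), cross_entropy(late, 0))
```

In this sketch a small `t` shrinks the logit of a hard negative and a larger `t` amplifies it, so hard samples contribute more to the loss as `t` grows, while easy negatives are unaffected.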
Related papers
- Additive Margin SincNet For Speaker Recognition (2019)
- Speaker Recognition From Raw Waveform With SincNet (2018)
- Speech And Speaker Recognition From Raw Waveform With SincNet (2018)
- On Deep Speaker Embeddings For Text-independent Speaker Recognition (2018)
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019)
- Adversarial Defense For Deep Speaker Recognition Using Hybrid Adversarial Training (2020)
- Few Shot Speaker Recognition Using Deep Neural Networks (2019)
- Unified Hypersphere Embedding For Speaker Recognition (2018)