Centroid-based Deep Metric Learning For Speaker Recognition
2019 · Jixuan Wang, Kuan-Chieh Wang, Marc Law, et al.
Abstract
Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms the state-of-the-art triplet loss based models in both speaker verification and identification tasks, for both seen and unseen speakers.
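The centroid-based loss named in the abstract can be illustrated with a minimal sketch of the standard prototypical network loss. The abstract does not give the paper's exact configuration, so the details here are assumptions: squared Euclidean distance, a single episode with separate support and query sets, and the hypothetical function name `prototypical_loss`. Each speaker's centroid is the mean of that speaker's support embeddings, and each query embedding is scored by a softmax over negative distances to all centroids.

```python
import numpy as np

def prototypical_loss(support, support_labels, query, query_labels):
    """Mean cross-entropy of query embeddings against per-speaker centroids.

    support: (S, D) embeddings, support_labels: (S,) speaker ids
    query:   (Q, D) embeddings, query_labels:   (Q,) speaker ids
    Assumes every query label also appears in support_labels.
    """
    classes = np.unique(support_labels)  # sorted unique speaker ids
    # One centroid per speaker: the mean of that speaker's support embeddings.
    centroids = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distance from every query to every centroid: (Q, C).
    diffs = query[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Numerically stable log-softmax over negative distances.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Column index of each query's true speaker within `classes`.
    targets = np.searchsorted(classes, query_labels)
    return -log_probs[np.arange(len(query)), targets].mean()
```

In training, the embeddings would come from the neural network and the loss would be minimized over many sampled episodes; at test time, the same centroids can be built from a few enrollment utterances of an unseen speaker.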
Related papers
- Multi-task Metric Learning For Text-independent Speaker Verification (2020) 0.00
- Few Shot Speaker Recognition Using Deep Neural Networks (2019) 0.00
- Partial AUC Optimization Based Deep Speaker Embeddings With Class-center Learning For Text-independent Speaker Verification (2019) 9.59
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019) 15.25
- On Deep Speaker Embeddings For Text-independent Speaker Recognition (2018) 11.93
- Triplet Based Embedding Distance And Similarity Learning For Text-independent Speaker Verification (2019) 5.24
- Unified Hypersphere Embedding For Speaker Recognition (2018) 0.00
- Parameterized Channel Normalization For Far-field Deep Speaker Verification (2021) 3.58