Unified Hypersphere Embedding For Speaker Recognition
2018 · Mahdi Hajibabaei, Dengxin Dai
Abstract
Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved by using deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot continue indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without extra data or deeper and more complex models, by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions. Results of experiments on the VoxCeleb dataset suggest that: (i) simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%; (ii) lower-dimensional embeddings are more suitable for verification; (iii) the proposed logistic margin loss function leads to unified embeddings with state-of-the-art identification and competitive verification accuracies.
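The augmentation named in point (i) can be illustrated with a minimal sketch. The abstract does not spell out the exact procedure, so everything below is an assumption: the function name `augment_utterance`, the choice to tile a waveform up to a fixed length, and the reversal probability `p_reverse` are all hypothetical and may differ from what the authors did.

```python
import numpy as np

def augment_utterance(wave, target_len, rng, p_reverse=0.5):
    """Repeat a 1-D waveform to target_len samples, then maybe time-reverse it.

    Hypothetical helper: the repetition scheme and p_reverse are assumptions,
    not the procedure specified in the paper.
    """
    if wave.ndim != 1 or wave.size == 0:
        raise ValueError("expected a non-empty 1-D waveform")
    # Repetition: tile the utterance until it covers target_len, then trim.
    n_repeats = -(-target_len // wave.size)  # ceiling division
    out = np.tile(wave, n_repeats)[:target_len]
    # Random time-reversion: flip along the time axis with probability p_reverse.
    if rng.random() < p_reverse:
        out = out[::-1]
    return out.copy()

# Usage: a toy 4-sample "utterance" stretched to 10 samples.
rng = np.random.default_rng(0)
toy = np.arange(4, dtype=np.float32)
stretched = augment_utterance(toy, 10, rng, p_reverse=0.0)
flipped = augment_utterance(toy, 10, rng, p_reverse=1.0)
```

With `p_reverse=0.0` the output is the utterance repeated and trimmed; with `p_reverse=1.0` the same sequence is returned in reverse time order.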
Related papers
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019)
- Large Margin Softmax Loss For Speaker Verification (2019)
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019)
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018)
- Delving Into VoxCeleb: Environment Invariant Speaker Recognition (2019)
- Neural Scoring: A Refreshed End-to-end Approach For Speaker Recognition In Complex Conditions (2024)
- ECAPA2: A Hybrid Neural Network Architecture And Training Strategy For Robust Speaker Embeddings (2024)
- An Improved Deep Neural Network For Modeling Speaker Characteristics At Different Temporal Scales (2020)