Unsupervised Representation Learning For Speaker Recognition Via Contrastive Equilibrium Learning
2020 · Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, et al.
Abstract
In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. To preserve speaker discriminability, a contrastive similarity loss is used alongside it. Experimental results showed that the proposed CEL significantly outperforms state-of-the-art unsupervised speaker verification systems, with the best-performing model achieving 8.01% and 4.01% EER on the VoxCeleb1 and VOiCES evaluation sets, respectively. In addition, supervised speaker embedding networks initialized with parameters pre-trained via CEL performed better than those trained from randomly initialized parameters.
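The abstract names two loss terms but does not give their formulas. As a rough illustration of how they might combine, the sketch below assumes the uniformity loss takes the form popularized by Wang and Isola (2020), log E[exp(-t·||z_i - z_j||²)] over unit-norm embeddings, and substitutes a generic InfoNCE-style cosine-similarity loss for the contrastive similarity term; that substitution is an assumption, not necessarily the loss used in the paper. The function names and the parameters `lam`, `t`, and `scale` are hypothetical, and row i of the two inputs is assumed to be two segments cropped from the same utterance.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Project each row onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def uniformity_loss(z: np.ndarray, t: float = 2.0) -> float:
    """log E[exp(-t * ||z_i - z_j||^2)] over distinct pairs of unit-norm rows.

    Equals 0 when all embeddings collapse to a single point and decreases
    as they spread out over the hypersphere.
    """
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(z.shape[0], k=1)  # each unordered pair counted once
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))


def contrastive_similarity_loss(anchor: np.ndarray, positive: np.ndarray,
                                scale: float = 10.0) -> float:
    """Hypothetical stand-in: InfoNCE on scaled cosine similarities.

    Row i of `anchor` and row i of `positive` form the positive pair;
    every other row of `positive` acts as a negative for anchor i.
    """
    logits = scale * anchor @ positive.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def cel_loss(anchor: np.ndarray, positive: np.ndarray, lam: float = 1.0,
             t: float = 2.0, scale: float = 10.0) -> float:
    """Contrastive term plus lam * uniformity term (assumed weighting)."""
    a, p = l2_normalize(anchor), l2_normalize(positive)
    # Uniformity is computed per view so the positive pair itself is not pushed apart.
    uni = 0.5 * (uniformity_loss(a, t) + uniformity_loss(p, t))
    return contrastive_similarity_loss(a, p, scale) + lam * uni


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speakers = rng.normal(size=(8, 16))  # 8 synthetic "speaker" centroids
    anchor = speakers + 0.1 * rng.normal(size=speakers.shape)
    positive = speakers + 0.1 * rng.normal(size=speakers.shape)
    print(cel_loss(anchor, positive))
```

The uniformity term rewards spreading embeddings over the sphere, which is one way to read "increasing the uncertainty on nuisance factors", while the contrastive term keeps matching segments close so the spread does not destroy speaker information.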
Related papers
- Bootstrap Equilibrium And Probabilistic Speaker Representation Learning For Self-supervised Speaker Verification (2021) · 5.24
- Momentum Contrast Speaker Representation Learning (2020) · 0.00
- Speaker Representation Learning Via Contrastive Loss With Maximal Speaker Separability (2022) · 10.68
- Label-efficient Self-supervised Speaker Verification With Information Maximization And Contrastive Learning (2022) · 6.77
- Semi-supervised Contrastive Learning With Generalized Contrastive Loss And Its Application To Speaker Recognition (2020) · 0.00
- Unified Hypersphere Embedding For Speaker Recognition (2018) · 0.00
- Asymmetric Clean Segments-guided Self-supervised Learning For Robust Speaker Verification (2023) · 5.84
- Self-supervised Text-independent Speaker Verification Using Prototypical Momentum Contrastive Learning (2020) · 12.93