Momentum Contrast Speaker Representation Learning
2020 · Jangho Lee, Jaihyun Koh, Sungroh Yoon
Abstract
Unsupervised representation learning has narrowed the performance gap with supervised feature learning, especially in the image domain. To extend unsupervised learning to the speech domain, we propose Momentum Contrast for VoxCeleb (MoCoVox). We pre-trained MoCoVox on VoxCeleb1 with instance discrimination. Applied to speaker verification, MoCoVox outperforms the state-of-the-art metric learning-based approach by a large margin. We also empirically characterize contrastive learning in the speech domain by analyzing the distribution of the learned representations, and we explore which pretext task is adequate for speaker verification. We expect that learning speaker representations without human supervision will help address open-set speaker recognition.
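For readers unfamiliar with momentum contrast, the following is a minimal NumPy sketch of the two ingredients the general MoCo recipe relies on: an InfoNCE-style instance-discrimination loss over one positive key and a queue of negative keys, and a momentum (exponential moving average) update of the key encoder. It is an illustration only, not the authors' implementation. The function names, the temperature of 0.07, and the momentum of 0.999 are assumptions chosen for the example and are not taken from this abstract.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """InfoNCE loss for instance discrimination (hypothetical helper).

    q:     (N, D) query embeddings, e.g. one segment of each utterance
    k_pos: (N, D) positive key embeddings, e.g. another segment of the same utterance
    queue: (K, D) negative key embeddings from earlier batches
    """
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)   # (N, 1) positive logits
    l_neg = q @ queue.T                                 # (N, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[:, 0].mean())                # the positive sits at index 0


def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder's parameters slowly toward the query encoder's."""
    return [m * kp + (1.0 - m) * qp for kp, qp in zip(key_params, query_params)]
```

In this sketch the loss approaches zero when each query matches its positive key and is dissimilar to every key in the queue, and a larger momentum value makes the key encoder change more slowly between steps.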
Related papers
- Self-supervised Text-independent Speaker Verification Using Prototypical Momentum Contrastive Learning (2020)
- Unsupervised Representation Learning For Speaker Recognition Via Contrastive Equilibrium Learning (2020)
- Unsupervised Voice-face Representation Learning By Cross-modal Prototype Contrast (2022)
- Label-efficient Self-supervised Speaker Verification With Information Maximization And Contrastive Learning (2022)
- Discriminative Speaker Representation Via Contrastive Learning With Class-aware Attention In Angular Space (2022)
- Semi-supervised Contrastive Learning With Generalized Contrastive Loss And Its Application To Speaker Recognition (2020)
- A Comparison Of Metric Learning Loss Functions For End-to-end Speaker Verification (2020)
- Multi-domain Adaptation By Self-supervised Learning For Speaker Verification (2023)