Pushing The Limits Of Self-supervised Speaker Verification Using Regularized Distillation Framework
2022 · Yafeng Chen, Siqi Zheng, Hui Wang, et al.
Abstract
Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to the embeddings in DINO. One regularization term guarantees the diversity of the embeddings, while the other decorrelates the variables of each embedding. The effectiveness of various data augmentation techniques is explored in both the time and frequency domains. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the regularized DINO framework in speaker verification. Our method achieves state-of-the-art speaker verification performance under a single-stage self-supervised setting on VoxCeleb. Code has been made publicly available at https://github.com/alibaba-damo-
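The DINO objective referred to above can be sketched as follows. This is a minimal NumPy illustration of the standard DINO recipe, not the authors' code: a teacher distribution (centered with a running mean, sharpened with a low temperature) serves as the target for the student through a cross-entropy loss, the teacher weights track the student through an exponential moving average, and the loss is averaged over pairs of different views. The temperatures (0.1 and 0.04) and momenta (0.9 and 0.996) are assumptions borrowed from the image-domain DINO defaults. The split into long and short speech segments is an assumed analogue of DINO's multi-crop. All function names and the output dimension K are hypothetical, and random logits stand in for network outputs.

```python
import numpy as np


def log_softmax(logits, temp):
    """Row-wise log-softmax of logits / temp, computed in a numerically stable way."""
    z = logits / temp
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy H(P_teacher, P_student), averaged over the batch.

    The teacher distribution is centered (a running mean is subtracted) and
    sharpened (low temperature). In a real training loop no gradient flows
    through the teacher branch.
    """
    p_teacher = np.exp(log_softmax(teacher_logits - center, t_teacher))
    log_p_student = log_softmax(student_logits, t_student)
    return float(-(p_teacher * log_p_student).sum(axis=1).mean())


def multiview_dino_loss(student_views, teacher_views, center, **temps):
    """Average the loss over all (teacher view, student view) pairs of different segments.

    Assumes student_views[i] and teacher_views[i] come from the same segment for
    i < len(teacher_views); those matching pairs are skipped.
    """
    total, pairs = 0.0, 0
    for i, t in enumerate(teacher_views):
        for j, s in enumerate(student_views):
            if i == j:
                continue
            total += dino_loss(s, t, center, **temps)
            pairs += 1
    return total / pairs


def update_center(center, teacher_logits, momentum=0.9):
    """EMA of the batch-mean teacher logits, used for centering."""
    return momentum * center + (1.0 - momentum) * teacher_logits.mean(axis=0)


def ema_update(teacher_params, student_params, m=0.996):
    """Teacher weights follow the student via an exponential moving average."""
    return {k: m * teacher_params[k] + (1.0 - m) * student_params[k] for k in teacher_params}


# Usage with random logits: batch of 4, hypothetical output dimension K = 8.
rng = np.random.default_rng(0)
K = 8
teacher_views = [rng.normal(size=(4, K)) for _ in range(2)]  # 2 long (global) segments
student_views = [rng.normal(size=(4, K)) for _ in range(5)]  # same 2 + 3 short (local) segments
center = np.zeros(K)
loss = multiview_dino_loss(student_views, teacher_views, center)
center = update_center(center, np.concatenate(teacher_views))
```

Centering, sharpening and the EMA teacher are what let DINO avoid collapse without negative pairs, which is why it is called non-contrastive.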
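The abstract names the two regularization terms but does not give their formulas. One plausible reading, assumed here and not confirmed by the abstract, is a VICReg-style pair. The diversity term is a hinge on the per-dimension standard deviation across the batch, so that embeddings cannot all collapse to one point. The decorrelation term penalizes the squared off-diagonal entries of the embedding covariance matrix. The function names and the constants `gamma` and `eps` are hypothetical. In training, such terms would be added to the DINO loss with weighting coefficients that the abstract does not specify.

```python
import numpy as np


def diversity_reg(z, gamma=1.0, eps=1e-4):
    """Hinge loss on the per-dimension std across the batch (VICReg-style variance term).

    Dimensions whose std falls below gamma are penalized, which discourages all
    embeddings in a batch from collapsing onto the same vector.
    """
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return float(np.maximum(0.0, gamma - std).mean())


def decorrelation_reg(z):
    """Squared off-diagonal covariance entries, summed and divided by the dimension.

    Driving this towards zero decorrelates the variables (dimensions) of the
    embeddings (VICReg-style covariance term).
    """
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float((off_diag ** 2).sum() / d)


# Usage: a batch of 64 hypothetical 16-dimensional embeddings.
rng = np.random.default_rng(0)
spread = rng.normal(size=(64, 16))                      # diverse, roughly independent dims
collapsed = np.tile(rng.normal(size=(1, 16)), (64, 1))  # every row identical
redundant = np.repeat(spread[:, :1], 16, axis=1)        # every dim a copy of the first

print(diversity_reg(collapsed), diversity_reg(spread))          # collapse is penalized more
print(decorrelation_reg(redundant), decorrelation_reg(spread))  # redundancy is penalized more
```

The two terms act on different axes of the batch matrix. The diversity term looks at the spread of each dimension across samples. The decorrelation term looks at how dimensions co-vary, so an embedding with high variance but redundant dimensions is still penalized.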
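The abstract says that data augmentation was explored in both the time and frequency domains, but it does not list the specific techniques. As an illustration only, the sketch below pairs one waveform-level (time-domain) augmentation, additive noise at a target signal-to-noise ratio, with a SpecAugment-style band mask applied to a spectrogram (frequency domain). Which augmentations, SNR ranges and mask widths the paper actually uses are not stated here. The function names, the tone and the dummy spectrogram are hypothetical.

```python
import numpy as np


def add_noise(wave, noise, snr_db):
    """Time-domain augmentation: mix noise into a waveform at a target SNR in dB.

    The noise is repeated or truncated (np.resize) to the waveform's length.
    """
    noise = np.resize(noise, wave.shape)
    p_signal = np.mean(wave ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return wave + scale * noise


def mask_axis(spec, max_width, axis, rng):
    """Frequency-domain augmentation: zero one random band along `axis` of a spectrogram.

    With spec shaped (freq_bins, frames), axis=0 gives a SpecAugment-style
    frequency mask and axis=1 a time mask. Requires max_width <= spec.shape[axis].
    """
    out = spec.copy()
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, spec.shape[axis] - width + 1))
    index = [slice(None)] * spec.ndim
    index[axis] = slice(start, start + width)
    out[tuple(index)] = 0.0
    return out


# Usage: a 1 s, 16 kHz tone plus Gaussian noise, and a dummy 80 x 200 spectrogram.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
wave = np.sin(2 * np.pi * 220.0 * t)
noisy = add_noise(wave, rng.normal(size=8000), snr_db=5.0)

spec = np.ones((80, 200))
freq_masked = mask_axis(spec, max_width=10, axis=0, rng=rng)
time_masked = mask_axis(spec, max_width=20, axis=1, rng=rng)
```

In a self-supervised setup, augmentations like these are applied independently to each view, so the network is pushed to keep speaker identity while ignoring channel and noise variation.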
Related papers
- Self-supervised Learning With Cluster-aware-dino For High-performance Robust Speaker Verification (2023) · 0.00
- Self-supervised Speaker Verification With Simple Siamese Network And Self-supervised Regularization (2021) · 10.85
- Self-distillation Prototypes Network: Learning Robust Speaker Representations Without Supervision (2023) · 4.52
- Curriculum Learning For Self-supervised Speaker Verification (2022) · 8.09
- DINO-VITS: Data-efficient Zero-shot TTS With Self-supervised Speaker Verification Loss For Noise Robustness (2023) · 3.58
- Self-supervised Speaker Verification Using Dynamic Loss-gate And Label Correction (2022) · 10.74
- An Iterative Framework For Self-supervised Deep Speaker Representation Learning (2020) · 10.61
- Label-efficient Self-supervised Speaker Verification With Information Maximization And Contrastive Learning (2022) · 6.77