Self-supervised Reflective Learning Through Self-distillation And Online Clustering For Speaker Representation Learning
2024 · Danwei Cai, Zexin Cai, Ze Li, et al.
Abstract
Speaker representation learning is crucial for voice recognition systems, with recent advances in self-supervised approaches reducing dependency on labeled data. Current two-stage iterative frameworks, while effective, suffer from significant computational overhead due to repeated rounds of clustering and training. They also struggle with noisy pseudo labels that can impair model learning. This paper introduces self-supervised reflective learning (SSRL), an improved framework that addresses these limitations by enabling continuous refinement of pseudo labels during training. Through a teacher-student architecture and online clustering mechanism, SSRL eliminates the need for iterative training rounds. To handle label noise, we incorporate noisy label modeling and pseudo label queues that maintain temporal consistency. Experiments on VoxCeleb show SSRL's superiority over current two-stage iterative approaches, surpassing the performance of a 5-round method in just a single training round.
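The abstract names two mechanisms without giving details: a teacher-student architecture and pseudo label queues that maintain temporal consistency. The sketch below illustrates one common way such pieces are built and is not taken from the paper. It assumes the teacher is an exponential moving average of the student, and that a queue smooths labels by majority vote over recent assignments. The names `ema_update`, `PseudoLabelQueue`, `momentum`, and `maxlen` are hypothetical.

```python
from collections import Counter, deque


def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the matching student weight.

    Assumption: the teacher is an exponential moving average of the student.
    Weights are plain lists of floats here to keep the sketch dependency-free.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


class PseudoLabelQueue:
    """Stores the most recent pseudo labels assigned to each utterance.

    Assumption: temporal consistency is approximated by a majority vote
    over the last `maxlen` labels the teacher assigned to an utterance.
    """

    def __init__(self, maxlen=5):
        self.maxlen = maxlen
        self.queues = {}

    def push(self, utt_id, label):
        # deque(maxlen=...) silently drops the oldest label once full.
        self.queues.setdefault(utt_id, deque(maxlen=self.maxlen)).append(label)

    def label(self, utt_id):
        # Most frequent label currently held in this utterance's queue.
        return Counter(self.queues[utt_id]).most_common(1)[0][0]
```

In a training loop built on these assumptions, the teacher would assign a cluster label to each utterance at every step, `push` would record it, and the student would be trained on `label(utt_id)`, so that a single noisy assignment does not immediately change the target.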
Related papers
- DinoSR: Self-distillation And Online Clustering For Self-supervised Speech Representation Learning (2023)
- An Iterative Framework For Self-supervised Deep Speaker Representation Learning (2020)
- Self-distillation Prototypes Network: Learning Robust Speaker Representations Without Supervision (2023)
- Self-supervised Speaker Verification With Simple Siamese Network And Self-supervised Regularization (2021)
- Asymmetric Clean Segments-guided Self-supervised Learning For Robust Speaker Verification (2023)
- Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization (2021)
- Curriculum Learning For Self-supervised Speaker Verification (2022)
- Self-supervised Speaker Recognition With Loss-gated Learning (2021)