Speaker-IPL: Unsupervised Learning Of Speaker Characteristics With I-vector Based Pseudo-labels
2024 · Zakaria Aldeneh, Takuya Higuchi, Jee-Weon Jung, et al.
Abstract
Iterative self-training, or iterative pseudo-labeling (IPL) -- using an improved model from the current iteration to provide pseudo-labels for the next iteration -- has proven to be a powerful approach to enhance the quality of speaker representations. Recent applications of IPL in unsupervised speaker recognition start with representations extracted from very elaborate self-supervised methods (e.g., DINO). However, training such strong self-supervised models is not straightforward (they require hyper-parameter tuning and may not generalize to out-of-domain data) and, moreover, may not be needed at all. To this end, we show that the simple, well-studied, and established i-vector generative model is enough to bootstrap the IPL process for the unsupervised learning of speaker representations. We also systematically study the impact of other components on the IPL process, including the initial model, the encoder, augmentations, the number of clusters, and the clustering algorithm. Re
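The IPL loop described in the abstract (cluster the current embeddings, treat cluster indices as pseudo-labels, train an encoder on them, re-extract embeddings, repeat) can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the raw toy features stand in for i-vectors, plain k-means stands in for the clustering algorithm, and `train_toy_encoder` (a hypothetical nearest-class-mean model) stands in for a neural speaker encoder. Augmentations are omitted.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(points, k, n_steps=20, seed=0):
    """Plain k-means (assumed clustering choice); returns one pseudo-label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(n_steps):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean(members)
    return labels

def train_toy_encoder(features, pseudo_labels, k):
    """Hypothetical stand-in for training a neural encoder on pseudo-labels.

    Stores one mean per pseudo-class and embeds an utterance as its
    negative squared distance to each class mean.
    """
    dim = len(features[0])
    class_means = []
    for c in range(k):
        members = [f for f, l in zip(features, pseudo_labels) if l == c]
        class_means.append(mean(members) if members else [0.0] * dim)

    def encode(x):
        return [-sq_dist(x, m) for m in class_means]

    return encode

def speaker_ipl(features, initial_embeddings, k, n_iters=3):
    """Toy IPL loop: cluster -> pseudo-label -> train -> re-embed."""
    embeddings = initial_embeddings
    history = []
    for it in range(n_iters):
        labels = kmeans(embeddings, k, seed=it)          # pseudo-labels
        encode = train_toy_encoder(features, labels, k)  # "improved" model
        embeddings = [encode(x) for x in features]       # input to next round
        history.append(labels)
    return embeddings, history

def make_toy_data(n_speakers=3, utts_per_speaker=10, dim=4, seed=0):
    """Synthetic utterance features: Gaussian blobs, one per toy speaker."""
    rng = random.Random(seed)
    features = []
    for _ in range(n_speakers):
        center = [rng.uniform(-5, 5) for _ in range(dim)]
        for _ in range(utts_per_speaker):
            features.append([c + rng.gauss(0, 0.3) for c in center])
    return features

features = make_toy_data()
# Assumption: raw features play the role of the initial i-vectors.
embeddings, history = speaker_ipl(features, features, k=3, n_iters=3)
```

Here `history` holds the pseudo-labels from each round and `embeddings` holds the final representations. In a realistic setup, the first round would cluster i-vectors from a trained extractor and each later round would retrain a neural encoder.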
Related papers
- slimIPL: Language-model-free Iterative Pseudo-labeling (2020) 10.74
- Discriminatively Re-trained I-vector Extractor For Speaker Recognition (2018) 5.84
- Investigation Of Using VAE For I-vector Speaker Verification (2017) 0.00
- Self-supervised Reflective Learning Through Self-distillation And Online Clustering For Speaker Representation Learning (2024) 2.26
- An Iterative Framework For Self-supervised Deep Speaker Representation Learning (2020) 10.61
- Weakly Supervised Training Of Speaker Identification Models (2018) 5.84
- Local Training For PLDA In Speaker Verification (2016) 0.00
- Curriculum Learning For Self-supervised Speaker Verification (2022) 8.09