Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector Based Pseudo-Labels

Abstract

Iterative self-training, or iterative pseudo-labeling (IPL), which uses the improved model from the current iteration to provide pseudo-labels for the next iteration, has proven to be a powerful approach for enhancing the quality of speaker representations. Recent applications of IPL in unsupervised speaker recognition start from representations extracted with elaborate self-supervised methods (e.g., DINO). However, training such strong self-supervised models is not straightforward (they require hyper-parameter tuning and may not generalize to out-of-domain data), and such models may not be needed at all. We show that the simple, well-studied, and established i-vector generative model is enough to bootstrap the IPL process for the unsupervised learning of speaker representations. We also systematically study how other components affect the IPL process: the initial model, the encoder, augmentations, the number of clusters, and the clustering algorithm. Re
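To make the IPL loop described above concrete, here is a minimal NumPy sketch of its control flow: cluster the current embeddings to get pseudo-labels, train the next model on those labels, re-extract embeddings, and repeat, starting from i-vectors. It is not the authors' pipeline. The i-vectors are synthetic Gaussian vectors rather than the output of a real extractor. A linear discriminant analysis (LDA) projection is swapped in for the neural encoder so the example runs without a deep-learning framework. The function names `kmeans`, `fit_lda`, and `iterative_pseudo_labeling` and all parameter values are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=50):
    """Lloyd's k-means with farthest-point (maximin) initialisation.

    Returns one integer cluster id per row of x; these ids act as pseudo-labels.
    """
    # Deterministic init: start from row 0, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels


def fit_lda(x, labels, out_dim):
    """Fit an LDA projection that treats the (pseudo-)labels as class targets."""
    d = x.shape[1]
    mu = x.mean(axis=0)
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        centered = xc - xc.mean(axis=0)
        s_w += centered.T @ centered
        diff = (xc.mean(axis=0) - mu)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    s_w += 1e-6 * np.eye(d)  # small ridge so s_w is invertible
    # Solve S_b v = lambda S_w v: whiten S_w, then diagonalise the whitened S_b.
    ew, vw = np.linalg.eigh(s_w)
    whiten = vw / np.sqrt(ew)  # whiten.T @ s_w @ whiten == I
    eb, vb = np.linalg.eigh(whiten.T @ s_b @ whiten)
    top = vb[:, np.argsort(eb)[::-1][:out_dim]]
    return whiten @ top  # (d, out_dim) projection matrix


def iterative_pseudo_labeling(ivectors, n_clusters, n_rounds=3, out_dim=2):
    """IPL loop bootstrapped from i-vectors (LDA stands in for a neural encoder)."""
    emb = ivectors  # round 0: the initial "model" is the i-vector extractor
    for _ in range(n_rounds):
        labels = kmeans(emb, n_clusters)           # pseudo-labels from current model
        proj = fit_lda(ivectors, labels, out_dim)  # train the next model on them
        emb = ivectors @ proj                      # embeddings from the new model
    return labels, emb


# Usage on synthetic "i-vectors": 3 speakers x 100 utterances, 10 dimensions.
rng = np.random.default_rng(0)
n_spk, n_utt, dim = 3, 100, 10
speaker = np.repeat(np.arange(n_spk), n_utt)
means = 50.0 * np.eye(dim)[:n_spk]  # one well-separated mean per speaker
ivecs = means[speaker] + rng.normal(size=(n_spk * n_utt, dim))
labels, emb = iterative_pseudo_labeling(ivecs, n_clusters=n_spk)
print(emb.shape)  # (300, 2)
```

The synthetic speakers are separated so widely that the first round already recovers them; the sketch only shows the mechanics of the loop, not the quality gains the abstract reports. `n_clusters` corresponds to the "number of clusters" component the abstract lists, and replacing `kmeans` or `fit_lda` corresponds to changing the clustering algorithm or the encoder.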
