An Attention-based Backend Allowing Efficient Fine-tuning Of Transformer Models For Speaker Verification
2022 · Junyi Peng, Oldrich Plchot, Themos Stafylakis, et al.
Abstract
In recent years, the self-supervised learning paradigm has received extensive attention due to its great success in various downstream tasks. However, fine-tuning strategies for adapting these pre-trained models to the speaker verification task have yet to be fully explored. In this paper, we analyze several feature extraction approaches built on top of a pre-trained model, as well as regularization and a learning rate schedule to stabilize the fine-tuning process and further boost performance. Multi-head factorized attentive pooling is proposed to factorize the comparison of speaker representations into multiple phonetic clusters. We regularize towards the parameters of the pre-trained model and set different learning rates for each layer of the pre-trained model during fine-tuning. The experimental results show that our method can significantly shorten the training time to 4 hours and achieve state-of-the-art (SOTA) performance: 0.59%, 0.79% and 1.77% EER on Vox1-O, Vox1-E and Vox1-H, respectively.
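The two fine-tuning ideas named in the abstract, per-layer learning rates and regularization towards the pre-trained parameters, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the per-layer rates are chosen or what form the regularizer takes, so the geometric decay, the squared-distance penalty, and every function name and value below are assumptions made for illustration only.

```python
# Illustrative sketch only. The geometric decay schedule and the squared
# L2 distance penalty are assumptions; the abstract does not specify them.


def layerwise_learning_rates(num_layers, base_lr, decay):
    """Return one learning rate per layer, lowest layer first.

    The top layer gets base_lr; each layer below it is scaled by `decay`,
    so lower layers move less during fine-tuning (assumed schedule).
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def l2_to_pretrained(params, pretrained, strength):
    """Penalty that grows as weights drift from their pre-trained values."""
    return strength * sum((p - q) ** 2 for p, q in zip(params, pretrained))


def sgd_step(params, grads, pretrained, lr, strength):
    """One plain SGD step on (task loss + penalty) for a single layer.

    `grads` holds the task-loss gradients; the penalty contributes
    2 * strength * (p - q) for each weight.
    """
    return [
        p - lr * (g + 2.0 * strength * (p - q))
        for p, g, q in zip(params, grads, pretrained)
    ]


# Hypothetical values: 4 layers, top-layer rate 1e-3, decay factor 0.5.
rates = layerwise_learning_rates(num_layers=4, base_lr=1e-3, decay=0.5)
penalty = l2_to_pretrained([1.0, 2.0], [1.0, 1.0], strength=0.1)
```

With a zero task gradient, `sgd_step` moves each weight back towards its pre-trained value, which is the stabilizing effect such a regularizer is meant to provide.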
Related papers
- Efficient Adapter Tuning Of Pre-trained Speech Models For Automatic Speaker Verification (2024)
- Improving Transformer-based Networks With Locality For Automatic Speaker Verification (2023)
- Parameter-efficient Transfer Learning Of Pre-trained Transformer Models For Speaker Verification Using Adapters (2022)
- Short-segment Speaker Verification With Pre-trained Models And Multi-resolution Encoder (2025)
- Attention Back-end For Automatic Speaker Verification With Multiple Enrollment Utterances (2021)
- Enhancing Speaker Verification With W2v-bert 2.0 And Knowledge Distillation Guided Structured Pruning (2025)
- Input-independent Attention Weights Are Expressive Enough: A Study Of Attention In Self-supervised Audio Transformers (2020)
- Self-attentive Multi-layer Aggregation With Feature Recalibration And Normalization For End-to-end Speaker Verification System (2020)