Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation Guided Structured Pruning
2025 · Ze Li, Ming Cheng, Ming Li
Abstract
Large-scale self-supervised pre-trained models (PTMs) have significantly improved speaker verification (SV) by providing rich feature representations. In this paper, we apply w2v-BERT 2.0, a model with approximately 600 million parameters trained on 4.5 million hours of unlabeled data across 143 languages, to the SV task. A Multi-scale Feature Aggregation (MFA) structure with a Layer Adapter processes the multi-layer feature outputs from the PTM and extracts speaker embeddings. Additionally, we incorporate Low-Rank Adaptation (LoRA) for efficient fine-tuning. Our model achieves state-of-the-art results, with equal error rates (EER) of 0.12% and 0.55% on the Vox1-O and Vox1-H test sets, respectively. Furthermore, we apply knowledge distillation guided structured pruning, reducing the model size by 80% while incurring only a 0.04% EER degradation. Source code and models are released at https://github.com/ZXHY-82/w2v-BERT-2.0_SV.
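The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be computed from trial scores. It is not taken from the released repository: the function name `equal_error_rate`, the toy scores, and the simple threshold sweep are assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) for similarity scores and 0/1 trial labels.

    labels: 1 = target trial (same speaker), 0 = non-target trial.
    Illustrative helper, not part of any published toolkit.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that "reject every trial" is also considered.
    candidates = sorted(set(scores))
    candidates.append(candidates[-1] + 1.0)

    best = None
    for t in candidates:
        # A trial is accepted when its score is >= t.
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false accepts
        frr = sum(s < t for s in targets) / len(targets)         # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy example (made-up scores): one target trial scores below one non-target.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

The sweep is quadratic in the number of trials, which is fine for a toy example. Evaluation on trial lists the size of Vox1-O or Vox1-H would normally sort the scores once and use cumulative counts instead.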
Related papers
- One-step Knowledge Distillation And Fine-tuning In Using Large Pre-trained Self-supervised Learning Models For Speaker Verification (2023)
- Leveraging ASR Pretrained Conformers For Speaker Verification Through Transfer Learning And Knowledge Distillation (2023)
- Exploring Wav2vec 2.0 On Speaker Verification And Language Identification (2020)
- Short-segment Speaker Verification With Pre-trained Models And Multi-resolution Encoder (2025)
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)
- An Attention-based Backend Allowing Efficient Fine-tuning Of Transformer Models For Speaker Verification (2022)
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022)
- Efficient Adapter Tuning Of Pre-trained Speech Models For Automatic Speaker Verification (2024)