One-step Knowledge Distillation And Fine-tuning In Using Large Pre-trained Self-supervised Learning Models For Speaker Verification
2023 · Jungwoo Heo, Chan-Yeong Lim, Ju-Ho Kim, et al.
Abstract
Speech self-supervised learning (SSL) models have achieved remarkable performance in speaker verification (SV). However, their computational cost makes development and deployment difficult. Several studies have compressed SSL models through knowledge distillation (KD) without considering the target task, so the compressed models could not extract SV-tailored features. This paper proposes One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which performs KD and fine-tuning (FT) jointly: the student model is optimized for SV during KD training, which avoids distilling information that is inappropriate for SV. OS-KDFT reduces the size of a Wav2Vec 2.0 (W2V2) based ECAPA-TDNN by approximately 76.2% and the SSL model's inference time by 79%, while achieving an equal error rate (EER) of 0.98%. OS-KDFT is validated on the VoxCeleb1 and VoxCeleb2 datasets with the W2V2 and HuBERT SSL models. Experiments are available on our GitHub.
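The idea of optimizing the student for SV during KD training can be illustrated with a single combined objective. The sketch below is a minimal illustration and not the authors' implementation: the mean-squared-error distillation term, the softmax cross-entropy speaker term, the names `joint_loss`, `kd_weight`, and `sv_weight`, and the toy numbers are all assumptions made for this example.

```python
import math


def mse(student_feats, teacher_feats):
    """Mean squared error between two equal-length feature vectors."""
    assert len(student_feats) == len(teacher_feats)
    n = len(student_feats)
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / n


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed in a stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(student_feats, teacher_feats, speaker_logits, speaker_label,
               kd_weight=1.0, sv_weight=1.0):
    """Hypothetical one-step objective: distillation term plus speaker term.

    Both terms are computed in the same training step, so the student is
    pulled toward the teacher's features and toward the SV task together.
    The weights are illustrative assumptions, not values from the paper.
    """
    kd = mse(student_feats, teacher_feats)              # match the teacher
    sv = cross_entropy(speaker_logits, speaker_label)   # classify the speaker
    return kd_weight * kd + sv_weight * sv, kd, sv


# Toy inputs: 2-dimensional features and a 2-speaker classifier.
total, kd, sv = joint_loss([0.5, -0.25], [1.0, 0.25], [0.0, 0.0], 0)
print(f"kd={kd:.4f} sv={sv:.4f} total={total:.4f}")
```

In a two-stage pipeline the distillation term would be minimized first and the speaker term afterwards; summing them in one step is one simple way to keep the distilled features tied to the SV task.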
Related papers
- Deep Versus Wide: An Analysis Of Student Architectures For Task-agnostic Knowledge Distillation Of Self-supervised Speech Models (2022)
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)
- SKILL: Similarity-aware Knowledge Distillation For Speech Self-supervised Learning (2024)
- Enhancing Speaker Verification With w2v-BERT 2.0 And Knowledge Distillation Guided Structured Pruning (2025)
- FitHuBERT: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022)
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022)
- Emphasized Non-target Speaker Knowledge In Knowledge Distillation For Automatic Speaker Verification (2023)
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)