Leveraging ASR Pretrained Conformers For Speaker Verification Through Transfer Learning And Knowledge Distillation
2023 · Danwei Cai, Ming Li
Abstract
This paper explores the use of ASR-pretrained Conformers for speaker verification, leveraging their strengths in modeling speech signals. We introduce three strategies: (1) Transfer learning to initialize the speaker embedding network, improving generalization and reducing overfitting. (2) Knowledge distillation to train a more flexible speaker verification model, incorporating frame-level ASR loss as an auxiliary task. (3) A lightweight speaker adaptor for efficient feature conversion without altering the original ASR Conformer, allowing ASR and speaker verification to run in parallel. Experiments on VoxCeleb show significant improvements: transfer learning yields a 0.48% EER, knowledge distillation yields a 0.43% EER, and the speaker adaptor approach, which adds only 4.92M parameters to a 130.94M-parameter model, achieves a 0.57% EER. Overall, our methods effectively transfer ASR capabilities to speaker verification tasks.
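The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be approximated from trial scores. It is not the authors' evaluation code: the function name `equal_error_rate` and the toy scores are illustrative assumptions, and the threshold sweep is a simple approximation with no interpolation.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration; higher score means "same speaker".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores (made up for this example, not from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```

On this scale, a 0.43% EER means that at the balanced threshold roughly 4 in 1,000 trials of each type are misclassified.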
Related papers
- Efficient Adapter Tuning Of Pre-trained Speech Models For Automatic Speaker Verification (2024)
- Enhancing Speaker Verification With W2v-BERT 2.0 And Knowledge Distillation Guided Structured Pruning (2025)
- One-step Knowledge Distillation And Fine-tuning In Using Large Pre-trained Self-supervised Learning Models For Speaker Verification (2023)
- Emphasized Non-target Speaker Knowledge In Knowledge Distillation For Automatic Speaker Verification (2023)
- Towards A Unified Conformer Structure: From ASR To ASV Task (2022)
- ERes2NetV2: Boosting Short-duration Speaker Verification Performance With Computational Efficiency (2024)
- MFA-Conformer: Multi-scale Feature Aggregation Conformer For Automatic Speaker Verification (2022)
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018)