Distilling Multi-level X-vector Knowledge For Small-footprint Speaker Verification
2023 · Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Abstract
Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments. Our research focuses on addressing this limitation through the development of small footprint deep speaker embedding extraction using knowledge distillation. While previous work in this domain has concentrated on speaker embedding extraction at the utterance level, our approach involves amalgamating embeddings from different levels of the x-vector model (teacher network) to train a compact student network. The results highlight the significance of frame-level information, with the student models exhibiting a remarkable size reduction of 85%-91% compared to their teacher counterparts, depending on the size of the teacher embeddings. Notably, by concatenating teacher embeddings, we achieve student networks that maintain comparable perform
Related papers
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)
- Open-set Short Utterance Forensic Speaker Verification Using Teacher-student Network With Explicit Inductive Bias (2020)
- Emphasized Non-target Speaker Knowledge In Knowledge Distillation For Automatic Speaker Verification (2023)
- Multi-task Learning With High-order Statistics For X-vector Based Text-independent Speaker Verification (2019)
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019)
- Leveraging ASR Pretrained Conformers For Speaker Verification Through Transfer Learning And Knowledge Distillation (2023)
- Multi-level Transfer Learning From Near-field To Far-field Speaker Verification (2021)
- VAE-based Domain Adaptation For Speaker Verification (2019)