Predicting Multi-codebook Vector Quantization Indexes For Knowledge Distillation
2022 · Liyong Guo, Xiaoyu Yang, Quandong Wang, et al.
Abstract
Knowledge distillation (KD) is a common approach to improving model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from a teacher label storage issue, especially when the training corpora are large. Although on-the-fly teacher label generation tackles this issue, training is significantly slower because the teacher model has to be evaluated on every batch. In this paper, we reformulate the generation of teacher labels as a codec problem. We propose a novel Multi-codebook Vector Quantization (MVQ) approach that compresses teacher embeddings to codebook indexes (CI). Based on this, a KD training framework (MVQ-KD) is proposed in which a student model predicts the CI generated from the embeddings of a self-supervised pre-trained teacher model. Experiments on the LibriSpeech clean-100 hour subset show that the MVQ-KD framework achieves performance comparable to that of traditional KD methods…
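The abstract describes two steps: compress each teacher embedding into one index per codebook, then train the student to predict those indexes. The sketch below illustrates both under stated assumptions. It uses plain residual vector quantization trained with k-means (each codebook quantizes what the previous codebooks left over), a simpler stand-in that is not necessarily the quantizer the authors use, and it assumes the student is trained with a per-codebook cross-entropy over the codebook entries. All function names (`train_codebooks`, `encode`, `decode`, `ci_cross_entropy`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]  # codebook_size entries, each of the embedding dimension


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Sequence[float], codebook: Codebook) -> int:
    """Index of the codebook entry closest to v (the first one wins on ties)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))


def kmeans(points: List[Vector], k: int, iters: int, rng: random.Random) -> Codebook:
    """Plain Lloyd's k-means; the initial centroids are k distinct training points."""
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        for j in range(k):
            if counts[j]:  # keep the old centroid if its cluster went empty
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids


def train_codebooks(embeddings: List[Vector], num_codebooks: int,
                    codebook_size: int, iters: int = 10, seed: int = 0) -> List[Codebook]:
    """Residual VQ (assumed stand-in): codebook m is fit to the residuals of codebooks 0..m-1."""
    rng = random.Random(seed)
    residuals = [list(e) for e in embeddings]
    codebooks: List[Codebook] = []
    for _ in range(num_codebooks):
        cb = kmeans(residuals, codebook_size, iters, rng)
        codebooks.append(cb)
        residuals = [[r - c for r, c in zip(res, cb[nearest(res, cb)])] for res in residuals]
    return codebooks


def encode(x: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Compress one teacher embedding into one codebook index per codebook (greedy)."""
    residual = list(x)
    indexes = []
    for cb in codebooks:
        j = nearest(residual, cb)
        indexes.append(j)
        residual = [r - c for r, c in zip(residual, cb[j])]
    return indexes


def decode(indexes: Sequence[int], codebooks: List[Codebook]) -> Vector:
    """Approximate the embedding as the sum of the selected codebook entries."""
    out = [0.0] * len(codebooks[0][0])
    for j, cb in zip(indexes, codebooks):
        out = [o + c for o, c in zip(out, cb[j])]
    return out


def ci_cross_entropy(logits, targets) -> float:
    """Mean cross-entropy of student logits [frames][codebooks][codebook_size]
    against teacher codebook indexes [frames][codebooks]."""
    total, count = 0.0, 0
    for frame_logits, frame_targets in zip(logits, targets):
        for row, t in zip(frame_logits, frame_targets):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[t]
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(1)
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    books = train_codebooks(teacher, num_codebooks=4, codebook_size=16)
    ci = encode(teacher[0], books)  # 4 small integers stand in for 8 floats
    print(ci, sq_dist(teacher[0], decode(ci, books)))
```

With `codebook_size=256`, every index fits in one byte, so a frame costs `num_codebooks` bytes instead of the 4·D bytes of a float32 embedding of dimension D. That arithmetic is where the storage saving of storing CI instead of raw teacher embeddings comes from; the codebook counts and sizes actually used in the paper are not stated in this excerpt.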
Related papers
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)
- VIC-KD: Variance-invariance-covariance Knowledge Distillation To Make Keyword Spotting More Robust Against Adversarial Attacks (2023)
- Inter-KD: Intermediate Knowledge Distillation For CTC-based Automatic Speech Recognition (2022)
- Emphasized Non-target Speaker Knowledge In Knowledge Distillation For Automatic Speaker Verification (2023)
- Leave No Knowledge Behind During Knowledge Distillation: Towards Practical And Effective Knowledge Distillation For Code-switching ASR Using Realistic Data (2024)
- Knowledge Distillation From Non-streaming To Streaming ASR Encoder Using Auxiliary Non-streaming Layer (2023)
- Intra-utterance Similarity Preserving Knowledge Distillation For Audio Tagging (2020)
- MT2KD: Towards A General-purpose Encoder For Speech, Speaker, And Audio Events (2024)