Inter-KD: Intermediate Knowledge Distillation For CTC-based Automatic Speech Recognition
2022 · Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, et al.
Abstract
Recent advances in deep learning have brought considerable improvements to end-to-end speech recognition, simplifying the traditional pipeline while producing promising results. Among end-to-end models, the connectionist temporal classification (CTC)-based model has attracted research interest because of its non-autoregressive nature. However, such CTC models incur a heavy computational cost to achieve strong performance. To mitigate this computational burden, we propose a simple yet effective knowledge distillation (KD) method for the CTC framework, namely Inter-KD, that additionally transfers the teacher's knowledge to the intermediate CTC layers of the student network. Experimental results on LibriSpeech verify that Inter-KD outperforms conventional KD methods. Without using any language model (LM) or data augmentation, Inter-KD improves the word error rate (WER) from 8.85% to 6.30% on test-clean.
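The idea of distilling into intermediate CTC layers can be illustrated with a minimal sketch. The abstract does not state the exact distillation objective or the loss weights, so everything below is an assumption for illustration: the frame-level KL divergence as the KD term, the function names `frame_kl` and `inter_kd_loss`, the weights `w_inter` and `w_kd`, and the requirement that teacher and student emit the same number of frames over the same vocabulary. The CTC losses are passed in as precomputed numbers to keep the example in pure Python.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def frame_kl(teacher_logits, student_logits):
    """Mean per-frame KL(teacher || student) over a sequence of frames.

    Assumption: both sequences have the same number of frames and the same
    vocabulary size (including the CTC blank).
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_frame)
        log_q = log_softmax(s_frame)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)


def inter_kd_loss(ctc_final, ctc_inter, teacher_logits, student_final_logits,
                  student_inter_logits, w_inter=0.5, w_kd=1.0):
    """Combine CTC and distillation terms (hypothetical weighting).

    ctc_final: precomputed CTC loss of the student's last layer.
    ctc_inter: precomputed CTC losses, one per intermediate CTC layer.
    student_inter_logits: one logit sequence per intermediate CTC layer.
    """
    # Distill the teacher's posteriors into the final student output ...
    kd_final = frame_kl(teacher_logits, student_final_logits)
    # ... and additionally into every intermediate CTC layer.
    kd_inter = [frame_kl(teacher_logits, s) for s in student_inter_logits]
    mean_ctc_inter = sum(ctc_inter) / len(ctc_inter)
    mean_kd_inter = sum(kd_inter) / len(kd_inter)
    return ctc_final + w_inter * mean_ctc_inter + w_kd * (kd_final + mean_kd_inter)
```

In this sketch the distillation terms vanish when the student's final and intermediate outputs match the teacher exactly, leaving only the weighted CTC terms. A real training setup would compute the CTC losses and posteriors with a deep learning framework rather than plain lists.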
Related papers
- Distilling Knowledge From Ensembles Of Acoustic Models For Joint CTC-attention End-to-end Speech Recognition (2020)
- Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation In Speech Enhancement (2024)
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)
- CTC Blank Triggered Dynamic Layer-skipping For Efficient CTC-based Speech Recognition (2024)
- Leave No Knowledge Behind During Knowledge Distillation: Towards Practical And Effective Knowledge Distillation For Code-switching ASR Using Realistic Data (2024)
- Knowledge Distillation For Neural Transducer-based Target-speaker ASR: Exploiting Parallel Mixture/single-talker Speech Data (2023)
- Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation (2023)
- Knowledge Transfer And Distillation From Autoregressive To Non-autoregressive Speech Recognition (2022)