Leave No Knowledge Behind During Knowledge Distillation: Towards Practical And Effective Knowledge Distillation For Code-switching ASR Using Realistic Data
2024 · Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, et al.
Abstract
Recent advances in automatic speech recognition (ASR) often rely on large speech foundation models to generate high-quality transcriptions. However, these models can be impractical when computing resources are limited. The problem is more severe in realistic or difficult scenarios such as code-switching ASR (CS-ASR). To address this, we present a framework for developing more efficient CS-ASR models through knowledge distillation using realistic speech-only data. Our proposed method, Leave No Knowledge Behind During Knowledge Distillation (K\(^2\)D), leverages both the teacher model's knowledge and additional insights from a small auxiliary model. We evaluate our approach on two in-domain and two out-of-domain datasets, demonstrating that K\(^2\)D is effective. By conducting K\(^2\)D on unlabeled realistic data, we obtain a model that is two times smaller and generates five times faster while outperforming the baseline methods and the teacher mo
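The abstract does not spell out how the teacher's output and the auxiliary model's output are combined, so the following is only a minimal sketch of one generic way to distill from speech-only data: treat the teacher's transcripts as pseudo-labels and keep an utterance only when a small auxiliary model roughly agrees with the teacher. The agreement criterion (word-level edit distance), the threshold `max_disagreement`, and all function names are assumptions made for illustration; they are not taken from the paper and should not be read as the K\(^2\)D procedure.

```python
from typing import Dict, List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word tokens, using two rolling rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def disagreement(teacher_text: str, aux_text: str) -> float:
    """Edit distance normalized by the teacher transcript length (WER-style)."""
    ref, hyp = teacher_text.split(), aux_text.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(ref, hyp) / len(ref)


def select_pseudo_labels(
    teacher: Dict[str, str],
    aux: Dict[str, str],
    max_disagreement: float = 0.5,  # hypothetical threshold, not from the paper
) -> Dict[str, str]:
    """Keep teacher transcripts that the auxiliary model roughly agrees with."""
    kept = {}
    for utt_id, teacher_text in teacher.items():
        aux_text = aux.get(utt_id)
        if aux_text is None:
            continue  # no auxiliary transcript available, so skip the utterance
        if disagreement(teacher_text, aux_text) <= max_disagreement:
            kept[utt_id] = teacher_text
    return kept


if __name__ == "__main__":
    # Toy transcripts, invented for illustration only.
    teacher_out = {
        "u1": "我 今天 要 去 meeting",
        "u2": "thank you thank you thank you thank you",
    }
    aux_out = {
        "u1": "我 今天 要 去 meeting room",
        "u2": "thank you",
    }
    print(select_pseudo_labels(teacher_out, aux_out))
```

The retained utterance-transcript pairs would then serve as training targets for the smaller student model. A filter of this kind is one common way to discard unreliable pseudo-labels, such as repeated phrases from the teacher. Whether the paper uses agreement, timestamps, or some other signal from the auxiliary model cannot be determined from the abstract alone.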
Related papers
- Reducing The Gap Between Streaming And Non-streaming Transducer-based ASR By Adaptive Two-stage Knowledge Distillation (2023) · 4.52
- Inter-KD: Intermediate Knowledge Distillation For CTC-based Automatic Speech Recognition (2022) · 7.50
- Knowledge Distillation From Non-streaming To Streaming ASR Encoder Using Auxiliary Non-streaming Layer (2023) · 0.00
- Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation (2023) · 0.00
- Distilling Knowledge From Ensembles Of Acoustic Models For Joint CTC-attention End-to-end Speech Recognition (2020) · 8.09
- Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation In Speech Enhancement (2024) · 2.26
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024) · 0.00
- On The Compression Of Shallow Non-causal ASR Models Using Knowledge Distillation And Tied-and-reduced Decoder For Low-latency On-device Speech Recognition (2023) · 0.00