Knowledge Distillation From Language Model To Acoustic Model: A Hierarchical Multi-task Learning Approach
2021 · Mun-Hak Lee, Joon-Hyuk Chang
Abstract
The remarkable performance of pre-trained language models (LMs) trained with self-supervised learning has led to a major paradigm shift in natural language processing. In line with this shift, improving speech recognition systems by leveraging massive deep learning-based LMs has become a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, this paper focuses on cross-modal knowledge distillation, which transfers knowledge between two types of deep neural networks that operate on different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method using LMs trained in different units (senones, monophones, and subwords) and re
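To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of the two generic loss shapes it mentions: label interpolation, where a single output layer is trained on a mixture of the hard label and the LM's soft label, and an auxiliary output layer, where the hard label and the LM soft label are assigned to separate heads. The function names, the weight `lam`, and the toy three-class inputs are all assumptions introduced for illustration; the paper's actual architecture, units, and weighting may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def label_interpolation_loss(logits, hard_label, lm_soft, lam):
    """One output layer; the target mixes the hard label with the LM soft label.
    `lam` is a hypothetical interpolation weight in [0, 1]."""
    hard = one_hot(hard_label, len(logits))
    mixed = [(1.0 - lam) * h + lam * s for h, s in zip(hard, lm_soft)]
    return cross_entropy(mixed, softmax(logits))

def auxiliary_head_loss(main_logits, aux_logits, hard_label, lm_soft, lam):
    """Two output layers on a shared model (assumed): the main head sees only
    the hard label, the auxiliary head sees only the LM soft label."""
    hard = one_hot(hard_label, len(main_logits))
    main_term = cross_entropy(hard, softmax(main_logits))
    aux_term = cross_entropy(lm_soft, softmax(aux_logits))
    return main_term + lam * aux_term

# Toy example with three classes (values are made up for illustration).
lm_soft = [0.7, 0.2, 0.1]      # soft label a teacher LM might emit
main_logits = [2.0, 0.5, -1.0]
aux_logits = [1.0, 0.8, 0.1]
print(label_interpolation_loss(main_logits, 0, lm_soft, lam=0.3))
print(auxiliary_head_loss(main_logits, aux_logits, 0, lm_soft, lam=0.3))
```

In the interpolation variant the two supervision signals compete for the same output distribution, whereas in the auxiliary-head variant the main head is still trained against the unmodified hard label. This sketch only illustrates that structural difference and makes no claim about which performs better.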
Related papers
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022)
- Adaptive Knowledge Distillation Between Text And Speech Pre-trained Models (2023)
- Audio-visual Representation Learning Via Knowledge Distillation From Speech Foundation Models (2025)
- SKILL: Similarity-aware Knowledge Distillation For Speech Self-supervised Learning (2024)
- Distilling Knowledge From Ensembles Of Acoustic Models For Joint CTC-attention End-to-end Speech Recognition (2020)
- An Efficient End-to-end Approach To Noise Invariant Speech Features Via Multi-task Learning (2024)
- BLSP-KD: Bootstrapping Language-speech Pre-training Via Knowledge Distillation (2024)
- Cross-modal Distillation For Widely Differing Modalities (2025)