GenDistiller: Distilling Pre-trained Language Models Based On Generative Models
2023 · Yingying Gao, Shilei Zhang, Zihao Cui, et al.
Abstract
Self-supervised pre-trained models such as HuBERT and WavLM leverage unlabeled speech data for representation learning and offer significant improvements on numerous downstream tasks. Despite the success of these methods, their large memory footprint and heavy computational requirements hinder their application on resource-restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge distillation framework that distills hidden representations from a teacher network using a generative language model. The generative structure enables the proposed model to generate the target teacher hidden layers autoregressively, accounting for the interactions between hidden layers without introducing additional inputs. A two-dimensional attention mechanism is implemented to ensure the causality of hidden layers while preserving bidirectional attention in the time dimension. Experiments reveal the advantage of the generative distiller over the baseline system that predicts the hidden layers of t
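The two-dimensional attention described in the abstract (causal across hidden layers, bidirectional across time) can be illustrated with a small mask-construction sketch. This is one possible reading of the abstract, not the authors' implementation: the function name `two_dim_attention_mask`, the flattening of (layer, frame) positions into one sequence, and the choice to let positions within the same layer attend to each other are all assumptions made for illustration.

```python
import numpy as np


def two_dim_attention_mask(num_layers: int, num_frames: int) -> np.ndarray:
    """Boolean mask over flattened (layer, frame) positions; True = may attend.

    Hypothetical helper: the flattening order and same-layer visibility
    are assumptions, not details taken from the paper.
    """
    # Position index p = layer * num_frames + frame, so each layer
    # occupies one contiguous block of num_frames positions.
    layer_ids = np.repeat(np.arange(num_layers), num_frames)
    # Causal across layers: a query in layer i may see keys in layers <= i.
    # No constraint is placed on frames, so attention stays bidirectional in time.
    return layer_ids[:, None] >= layer_ids[None, :]


mask = two_dim_attention_mask(num_layers=3, num_frames=4)
print(mask.astype(int))
```

The result is a block lower-triangular matrix: each block on or below the diagonal is fully visible, and blocks above the diagonal (later layers) are hidden. In an attention layer, such a mask would typically be applied by setting the disallowed scores to a large negative value before the softmax.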
Related papers
- GenDistiller: Distilling Pre-trained Language Models Based On An Autoregressive Generative Model (2024) 2.26
- DistilHuBERT: Speech Representation Learning By Layer-wise Distillation Of Hidden-unit BERT (2021) 15.06
- An Efficient End-to-end Approach To Noise Invariant Speech Features Via Multi-task Learning (2024) 0.00
- Knowledge Distillation From Language Model To Acoustic Model: A Hierarchical Multi-task Learning Approach (2021) 3.58
- Text-guided HuBERT: Self-supervised Speech Pre-training Via Generative Adversarial Networks (2024) 4.52
- Adaptive Knowledge Distillation Between Text And Speech Pre-trained Models (2023) 4.52
- Cross-modal Distillation For Widely Differing Modalities (2025) 0.00
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022) 2.26