Adaptive Knowledge Distillation Between Text And Speech Pre-trained Models
2023 · Jinjie Ni, Yukun Ma, Wen Wang, et al.
Abstract
Learning from massive speech corpora has driven the recent success of many self-supervised speech models. Through knowledge distillation, these models may also benefit from the knowledge encoded in language models pre-trained on rich text sources. Distillation is challenging, however, because of the modal disparity between the textual and speech embedding spaces. This paper studies metric-based distillation that aligns the text and speech embedding spaces using only a small amount of data and without modifying the model structure. The semantic and granularity gaps between text and speech impair distillation but have been overlooked in the literature. We therefore propose Prior-informed Adaptive knowledge Distillation (PAD), which adaptively leverages text/speech units of variable granularity and prior distributions to achieve better global and local alignment between text and speech pre-trained models. We evaluate on three spoken language understanding benchmarks to show t
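The abstract does not spell out PAD's loss functions, so the following is only a generic illustration of metric-based distillation with a global and a local alignment term, not the paper's method. It assumes that the text and speech models emit embeddings of the same dimension. Mean pooling for the global term, cosine distance, soft attention from each text token over speech frames for the local term, the `temperature` and `alpha` values, and the per-token `prior` weights (a hypothetical stand-in for a prior distribution) are all illustrative choices.

```python
import math

def cosine(u, v):
    # Cosine similarity with a small epsilon to avoid division by zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def mean_pool(vectors):
    # Average a list of equal-length vectors into one utterance-level vector.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def global_loss(text_emb, speech_emb):
    # Global alignment (assumed form): 1 - cosine of mean-pooled embeddings.
    return 1.0 - cosine(mean_pool(text_emb), mean_pool(speech_emb))

def local_loss(text_emb, speech_emb, prior, temperature=0.1):
    # Local alignment (assumed form): each text token softly attends over
    # speech frames; the token is pulled toward its attended speech summary.
    # `prior` is a hypothetical per-token weight vector that sums to 1.
    total = 0.0
    for token, weight in zip(text_emb, prior):
        scores = softmax([cosine(token, frame) / temperature for frame in speech_emb])
        attended = [sum(p * frame[d] for p, frame in zip(scores, speech_emb))
                    for d in range(len(token))]
        total += weight * (1.0 - cosine(token, attended))
    return total

def distill_loss(text_emb, speech_emb, prior, alpha=0.5):
    # Hypothetical weighted mix of the global and local terms.
    return (alpha * global_loss(text_emb, speech_emb)
            + (1.0 - alpha) * local_loss(text_emb, speech_emb, prior))
```

For example, with `text = [[1.0, 0.0], [0.0, 1.0]]` and a uniform `prior = [0.5, 0.5]`, `distill_loss(text, text, prior)` is close to zero, while flipping the signs of the speech frames yields a much larger loss. In a real setup the gradient of such a loss would update only the speech model, with the text model acting as a frozen teacher.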
Related papers
- BLSP-KD: Bootstrapping Language-speech Pre-training Via Knowledge Distillation (2024)
- Two-stage Textual Knowledge Distillation For End-to-end Spoken Language Understanding (2020)
- Knowledge Distillation From Language Model To Acoustic Model: A Hierarchical Multi-task Learning Approach (2021)
- Reducing The Gap Between Streaming And Non-streaming Transducer-based ASR By Adaptive Two-stage Knowledge Distillation (2023)
- SKILL: Similarity-aware Knowledge Distillation For Speech Self-supervised Learning (2024)
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022)
- Deep Versus Wide: An Analysis Of Student Architectures For Task-agnostic Knowledge Distillation Of Self-supervised Speech Models (2022)