BLSP-KD: Bootstrapping Language-speech Pre-training Via Knowledge Distillation
2024 · Chen Wang, Minpeng Liao, Zhongqiang Huang, et al.
Abstract
Recent end-to-end approaches have shown promise in extending large language models (LLMs) to speech inputs, but face limitations in directly assessing and optimizing alignment quality and fail to achieve fine-grained alignment due to speech-text length mismatch. We introduce BLSP-KD, a novel approach for Bootstrapping Language-Speech Pretraining via Knowledge Distillation, which addresses these limitations through two key techniques. First, it optimizes speech-text alignment by minimizing the divergence between the LLM's next-token prediction distributions for speech and text inputs using knowledge distillation. Second, it employs a continuous-integrate-and-fire strategy to segment speech into tokens that correspond one-to-one with text tokens, enabling fine-grained alignment. We also introduce Partial LoRA (PLoRA), a new adaptation method supporting LLM finetuning for speech inputs under knowledge distillation. Quantitative evaluation shows that BLSP-KD outperforms previous end-to-end
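
The first technique in the abstract, matching next-token distributions for speech and text inputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `kl_divergence`, `kd_loss`), the choice of KL(text || speech) as the divergence, and the averaging over positions are all assumptions made for illustration. It also assumes the speech input has already been segmented so that its positions line up one-to-one with the text tokens.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def kd_loss(text_logits, speech_logits):
    """Mean per-position KL between teacher and student distributions.

    text_logits:   next-token logits when the LLM reads the text (teacher).
    speech_logits: next-token logits when it reads the speech (student).
    Both are lists of per-position logit lists of equal length
    (assumed one-to-one alignment; the KL direction is an assumption).
    """
    if len(text_logits) != len(speech_logits):
        raise ValueError("speech and text positions must align one-to-one")
    per_position = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(text_logits, speech_logits)
    ]
    return sum(per_position) / len(per_position)

# Toy example: two positions, vocabulary of three tokens.
text = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]]
speech = [[1.5, 0.7, 0.3], [0.2, 2.0, 0.9]]
loss = kd_loss(text, speech)
```

The loss is zero when the speech-conditioned distributions match the text-conditioned ones exactly and grows as they diverge, which is what makes it usable as a direct measure of alignment quality.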
Related papers
- BLSP: Bootstrapping Language-speech Pre-training Via Behavior Alignment Of Continuation Writing (2023)
- Adaptive Knowledge Distillation Between Text And Speech Pre-trained Models (2023)
- Two-stage Textual Knowledge Distillation For End-to-end Spoken Language Understanding (2020)
- SKILL: Similarity-aware Knowledge Distillation For Speech Self-supervised Learning (2024)
- Sequence-level Knowledge Distillation For Class-incremental End-to-end Spoken Language Understanding (2023)
- I\(^2\)KD-SLU: An Intra-inter Knowledge Distillation Framework For Zero-shot Cross-lingual Spoken Language Understanding (2023)
- Spatio-temporal Attention Mechanism And Knowledge Distillation For Lip Reading (2021)
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)