BLSP: Bootstrapping Language-Speech Pre-training Via Behavior Alignment Of Continuation Writing
2023 · Chen Wang, Minpeng Liao, Zhongqiang Huang, et al.
Abstract
The emergence of large language models (LLMs) has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text still remains an open problem. Current solutions can be categorized into two strategies. One is a cascaded approach where outputs (tokens or states) of a separately trained speech recognition system are used as inputs for LLMs, which limits their potential in modeling alignment between speech and text. The other is an end-to-end approach that relies on speech instruction data, which is very difficult to collect in large quantities. In this paper, we address these issues and propose the BLSP approach that Bootstraps Language-Speech Pre-training via behavior alignment of continuation writing. We achieve this by learning a lightweight modality adapter between a frozen speech encoder and an LLM, ensuring that the LLM exhibits the same generation behavior regardless of the modality of input: a speech
Related papers
- BLSP-KD: Bootstrapping Language-Speech Pre-training Via Knowledge Distillation (2024) 0.00
- From Alignment To Advancement: Bootstrapping Audio-language Alignment With Synthetic Data (2025) 2.26
- Paralinguistics-enhanced Large Language Modeling Of Spoken Dialogue (2023) 0.00
- Align-SLM: Textless Spoken Language Models With Reinforcement Learning From AI Feedback (2024) 7.16
- A Comprehensive Solution To Connect Speech Encoder And Large Language Model For ASR (2024) 0.00
- AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-following Speech-LLM (2024) 6.77
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023) 0.00
- Closing The Gap Between Text And Speech Understanding In LLMs (2025) 0.00