Chain-of-thought Prompting For Speech Translation
2024 · Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, et al.
Abstract
Large language models (LLMs) have demonstrated remarkable advancements in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in Speech-LLM models that exhibit strong performance in automatic speech recognition (ASR) and automatic speech translation (AST). In this work, we propose a novel approach that leverages ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM consists of a speech encoder and Megatron-T5, an encoder-decoder LLM. The model first decodes the speech to generate an ASR transcript, then uses that transcript together with the encoded speech as the prompt, so that translation is guided in a two-step process similar to chain-of-thought (CoT) prompting. The T5 LLM is adapted with low-rank adaptation (LoRA), which outperforms full model fine-tuning. Experimental results show
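The two-step prompting flow described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation: the `decode` callable, the prompt wording, the `CoTResult` type, and the `toy_decode` stand-in are all hypothetical, and the abstract does not give the paper's actual prompt format or model interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-ins: a "speech embedding" is just a sequence of floats,
# and the decoder is any callable mapping (speech, text prompt) -> text.
SpeechEmbedding = Sequence[float]
DecodeFn = Callable[[SpeechEmbedding, str], str]


@dataclass
class CoTResult:
    transcript: str   # intermediate ASR hypothesis (step 1)
    translation: str  # final AST output (step 2)


def cot_translate(speech: SpeechEmbedding, decode: DecodeFn,
                  src_lang: str, tgt_lang: str) -> CoTResult:
    """Two-step, CoT-style speech translation (prompt wording is assumed)."""
    # Step 1: decode the speech into a source-language transcript.
    asr_prompt = f"Transcribe the {src_lang} speech."
    transcript = decode(speech, asr_prompt)

    # Step 2: prompt again with the same encoded speech plus the transcript.
    ast_prompt = (f"Translate the {src_lang} speech into {tgt_lang}. "
                  f"Transcript: {transcript}")
    translation = decode(speech, ast_prompt)
    return CoTResult(transcript=transcript, translation=translation)


def toy_decode(speech: SpeechEmbedding, prompt: str) -> str:
    """Toy decoder used only to exercise the flow; it ignores the audio."""
    if prompt.startswith("Transcribe"):
        return "guten morgen"
    lexicon = {"guten": "good", "morgen": "morning"}
    transcript = prompt.split("Transcript: ", 1)[1]
    return " ".join(lexicon.get(word, word) for word in transcript.split())


result = cot_translate([0.1, 0.2], toy_decode, "German", "English")
print(result.transcript, "->", result.translation)
```

In a real system, `decode` would wrap the speech encoder and the LoRA-adapted T5 model; here it is a lookup table so that the example runs on its own.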
Related papers
- Revisiting Direct Speech-to-text Translation With Speech LLMs: Better Scaling Than CoT Prompting? (2025)
- RALL-E: Robust Codec Language Modeling With Chain-of-thought Prompting For Text-to-speech Synthesis (2024)
- Internalizing ASR With Implicit Chain Of Thought For Efficient Speech-to-speech Conversational LLM (2024)
- Investigating Decoder-only Large Language Models For Speech-to-text Translation (2024)
- Zero-resource Speech Translation And Recognition With LLMs (2024)
- Effective Text Adaptation For LLM-based ASR Through Soft Prompt Fine-tuning (2024)
- Prompting Large Language Models With Audio For General-purpose Speech Summarization (2024)
- Harnessing The Zero-shot Power Of Instruction-tuned Large Language Model In End-to-end Speech Recognition (2023)