Task Arithmetic For Language Expansion In Speech Translation
2024 · Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, et al.
Abstract
Recent progress in large language models (LLMs) has spurred interest in speech-text multimodal foundation models, which achieve strong performance on instruction-tuned speech translation (ST). However, expanding the set of language pairs is costly because it requires re-training on the combined new and previous datasets. To address this, we aim to build a one-to-many ST system from existing one-to-one ST systems using task arithmetic, without re-training. Applying task arithmetic directly to ST leads to language confusion; we therefore introduce an augmented task arithmetic method that incorporates a language control model to ensure generation in the correct target language. Our experiments on MuST-C and CoVoST-2 show BLEU score improvements of up to 4.66 and 4.92, with COMET gains of 8.87 and 11.83, respectively. In addition, we demonstrate that our framework can extend to language pairs lacking paired ST training data or pre-trained ST models by synthesizing ST models from existing machine translation (MT) and ST models via task analogy.
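To make the merging idea concrete, here is a minimal sketch of task arithmetic in its commonly used form: a task vector is the difference between fine-tuned and pre-trained weights, and a merged model adds a scaled sum of task vectors back to the pre-trained weights. The parameter dictionaries, the function names (`task_vector`, `merge`, `task_analogy`), the scaling value, and the toy numbers are all assumptions made for illustration. The language control model described in the abstract is not modeled here, and the analogy formula (new ST ≈ new MT + (existing ST − existing MT)) is one plausible reading of "task analogy", not a confirmed description of the authors' method.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Params, vectors: List[Params], scale: float) -> Params:
    """Add the scaled sum of task vectors to the pre-trained weights."""
    merged: Params = {}
    for name, base in pretrained.items():
        summed = [sum(v[name][i] for v in vectors) for i in range(len(base))]
        merged[name] = [b + scale * s for b, s in zip(base, summed)]
    return merged


def task_analogy(mt_new: Params, mt_old: Params, st_old: Params) -> Params:
    """Assumed analogy: tau_ST(new) ~= tau_MT(new) + (tau_ST(old) - tau_MT(old))."""
    return {
        name: [a + (c - b) for a, b, c in zip(mt_new[name], mt_old[name], st_old[name])]
        for name in mt_new
    }


# Toy example: one shared pre-trained model and two one-to-one ST models.
pretrained = {"w": [1.0, 2.0]}
st_en_de = {"w": [1.5, 2.0]}  # hypothetical en->de fine-tune
st_en_fr = {"w": [1.0, 3.0]}  # hypothetical en->fr fine-tune

tau_de = task_vector(st_en_de, pretrained)
tau_fr = task_vector(st_en_fr, pretrained)
one_to_many = merge(pretrained, [tau_de, tau_fr], scale=1.0)

# Toy task vectors for the analogy: MT (new pair), MT (old pair), ST (old pair).
tau_st_new = task_analogy({"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, {"w": [0.5, 1.5]})
```

In practice the scaling coefficient would be tuned on held-out data, and the same arithmetic would run over every tensor in a real checkpoint rather than over short lists.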
Related papers
- Zero-resource Speech Translation And Recognition With LLMs (2024)
- Making LLMs Better Many-to-many Speech-to-text Translators With Curriculum Learning (2024)
- MCAT: Scaling Many-to-many Speech-to-text Translation With MLLMs To 70 Languages (2025)
- Rethinking And Improving Multi-task Learning For End-to-end Speech Translation (2023)
- LAE-ST-MoE: Boosted Language-aware Encoder Using Speech Translation Auxiliary Task For E2E Code-switching ASR (2023)
- Hearing To Translate: The Effectiveness Of Speech Modality Integration Into LLMs (2026)
- SLM-S2ST: A Multimodal Language Model For Direct Speech-to-speech Translation (2025)
- TTA: Transcribe, Translate And Alignment For Cross-lingual Speech Representation (2025)