Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture
2025 · Biao Fu, Donglei Yu, Minpeng Liao, et al.
Abstract
Simultaneous speech translation (SimulST) produces translations incrementally while processing partial speech input. Although large language models (LLMs) have shown strong capabilities in offline translation tasks, applying them to SimulST poses notable challenges. Existing LLM-based SimulST approaches either incur significant computational overhead because a bidirectional speech encoder must re-encode the input repeatedly, or they depend on a fixed read/write policy, which limits both efficiency and performance. In this work, we introduce Efficient and Adaptive Simultaneous Speech Translation (EASiST) with a fully unidirectional architecture, covering both the speech encoder and the LLM. EASiST includes a multi-latency data curation strategy to generate semantically aligned SimulST training samples and redefines SimulST as an interleaved generation task with explicit read/write tokens. To facilitate adaptive inference, we incorporate a lightweight policy head that dynamically predicts read/write actions. Additi
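The interleaved read/write formulation described in the abstract can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not EASiST's implementation: the token names `<read>` and `<write>`, the 0.5 threshold, and the callables `write_prob` and `next_token` are all hypothetical stand-ins for the policy head and the LLM, whose actual interfaces the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple

READ, WRITE = "<read>", "<write>"  # hypothetical action-token names
EOS = "</s>"                       # hypothetical end-of-sequence marker

State = Tuple[List[str], List[str]]  # (source chunks read so far, target tokens written)


def simultaneous_decode(
    chunks: Sequence[str],
    write_prob: Callable[[List[str], List[str]], float],
    next_token: Callable[[List[str], List[str]], str],
    threshold: float = 0.5,  # assumed decision threshold
    max_tokens: int = 50,
) -> Tuple[List[str], List[str]]:
    """Return (interleaved trace, target tokens) for a stream of source chunks."""
    trace: List[str] = []
    source: List[str] = []
    target: List[str] = []
    i = 0
    while len(target) < max_tokens:
        source_done = i >= len(chunks)
        # READ while input remains and the policy does not favour writing.
        if not source_done and write_prob(source, target) < threshold:
            source.append(chunks[i])
            trace += [READ, chunks[i]]
            i += 1
            continue
        # Otherwise WRITE one target token (forced once the source is exhausted).
        token = next_token(source, target)
        if token == EOS:
            break
        target.append(token)
        trace += [WRITE, token]
    return trace, target


# Toy stand-ins: write whenever more has been read than written,
# and "translate" a chunk by upper-casing it.
def toy_write_prob(source: List[str], target: List[str]) -> float:
    return 1.0 if len(source) > len(target) else 0.0


def toy_next_token(source: List[str], target: List[str]) -> str:
    return source[len(target)].upper() if len(target) < len(source) else EOS


trace, target = simultaneous_decode(["a", "b", "c"], toy_write_prob, toy_next_token)
print(trace)
print(target)
```

With these toy callables the loop alternates one read with one write. A learned policy would instead vary how much source it consumes before emitting each token, which is the adaptive behaviour the abstract contrasts with fixed read/write policies.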
Related papers
- SimulS2S-LLM: Unlocking Simultaneous Inference Of Speech LLMs For Speech-to-speech Translation (2025) · 3.58
- Tagged End-to-end Simultaneous Speech Translation Training Using Simultaneous Interpretation Data (2023) · 0.00
- Exploring Continuous Integrate-and-fire For Adaptive Simultaneous Speech Translation (2022) · 4.52
- Towards Achieving Human Parity On End-to-end Simultaneous Speech Translation Via LLM Agent (2024) · 0.00
- SimulSense: Sense-driven Interpreting For Efficient Simultaneous Speech Translation (2025) · 0.00
- StreamSpeech: Simultaneous Speech-to-speech Translation With Multi-task Learning (2024) · 7.81
- Direct Simultaneous Speech-to-text Translation Assisted By Synchronized Streaming ASR (2021) · 6.77
- Visualization: The Missing Factor In Simultaneous Speech Translation (2021) · 0.00