SimulSense: Sense-driven Interpreting For Efficient Simultaneous Speech Translation
2025 · Haotian Tan, Hiroki Ouchi, Sakriani Sakti
Abstract
How can simultaneous speech translation (SimulST) systems make read/write decisions the way human interpreters do? Current state-of-the-art systems formulate SimulST as a multi-turn dialogue task, which requires specialized interleaved training data and relies on computationally expensive large language model (LLM) inference for decision-making. In this paper, we propose SimulSense, a novel framework for SimulST that mimics human interpreters by continuously reading input speech and triggering a write decision to produce translation whenever a new sense unit is perceived. Experiments against two state-of-the-art baseline systems demonstrate that our proposed method achieves a superior quality-latency tradeoff and substantially improved real-time efficiency, with decision-making up to 9.6x faster than the baselines.
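The read/write behaviour described in the abstract can be illustrated with a minimal sketch. The abstract does not say how sense units are detected or how translation is generated, so `detects_sense_unit` and `translate` below are hypothetical callables supplied by the caller, and the chunk representation is an assumption. This is a sketch of the general control flow only, not the authors' implementation.

```python
from typing import Callable, Iterable, List, Sequence

# Assumed representation: one fixed-size block of audio samples.
Chunk = Sequence[float]


def sense_driven_policy(
    chunks: Iterable[Chunk],
    detects_sense_unit: Callable[[List[Chunk]], bool],  # hypothetical detector
    translate: Callable[[List[Chunk], List[str]], str],  # hypothetical translator
) -> List[str]:
    """READ incoming chunks continuously; WRITE whenever the detector fires."""
    buffer: List[Chunk] = []  # speech read since the last WRITE
    outputs: List[str] = []   # translation segments emitted so far

    for chunk in chunks:
        buffer.append(chunk)  # READ: consume the next piece of speech
        if detects_sense_unit(buffer):
            # WRITE: translate the buffered speech, given the earlier output.
            outputs.append(translate(buffer, outputs))
            buffer = []

    if buffer:
        # End of speech: flush whatever has not been translated yet.
        outputs.append(translate(buffer, outputs))
    return outputs
```

With toy stand-ins, for example a detector that fires every three chunks, the loop emits one segment per detected unit plus a final flush. In a design of this shape the per-chunk decision costs only as much as the detector call, which is one plausible reading of why a dedicated detector could decide faster than running LLM inference at every step.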
Related papers
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025)
- Visualization: The Missing Factor In Simultaneous Speech Translation (2021)
- SimulS2S-LLM: Unlocking Simultaneous Inference Of Speech LLMs For Speech-to-speech Translation (2025)
- Tagged End-to-end Simultaneous Speech Translation Training Using Simultaneous Interpretation Data (2023)
- Exploring Continuous Integrate-and-fire For Adaptive Simultaneous Speech Translation (2022)
- Towards Achieving Human Parity On End-to-end Simultaneous Speech Translation Via LLM Agent (2024)
- Does Simultaneous Speech Translation Need Simultaneous Models? (2022)
- StreamSpeech: Simultaneous Speech-to-speech Translation With Multi-task Learning (2024)