StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
2024 · Shaolei Zhang, Qingkai Fang, Shoutao Guo, et al.
Abstract
Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while still receiving streaming speech input, which is critical for real-time communication. Beyond translating between speech in two languages, Simul-S2ST requires a policy that decides when the model should generate the corresponding target speech as the input arrives, so it poses a double challenge: translation and policy. In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and the simultaneous policy in a unified multi-task learning framework. Through this multi-task learning approach, StreamSpeech can perform offline and simultaneous speech recognition, speech translation and speech synthesis with a single "All-in-One" seamless model. Experiments on the CVSS benchmark demonstrate that StreamSpeech achieves state-of-the-art performance on both offline S2ST and Simul-S2ST tasks. Besides, StreamSpeech is able to present high-quality
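To make the idea of a simultaneous "policy" concrete, here is a minimal sketch of the READ/WRITE loop that any Simul-S2ST system runs. It uses the classic fixed wait-k policy as a stand-in: this is not StreamSpeech's policy, which the abstract says is learned jointly with translation. The function names `wait_k_policy` and `toy_translate` are hypothetical, and the "translator" simply uppercases the aligned source unit so the schedule is easy to follow.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Event = Tuple[str, str]  # ("READ" | "WRITE", payload)


def wait_k_policy(
    source_chunks: Iterable[str],
    translate_prefix: Callable[[List[str], int], Optional[str]],
    k: int = 3,
    max_target_len: int = 50,
) -> List[Event]:
    """Simulate a fixed wait-k READ/WRITE schedule over a streaming input.

    Hypothetical sketch: the policy stays k source chunks ahead of the output
    while input remains, then flushes the rest once the stream has ended.
    """
    events: List[Event] = []
    received: List[str] = []
    outputs: List[str] = []
    stream = iter(source_chunks)
    finished = False

    while len(outputs) < max_target_len:
        # READ while fewer than k chunks ahead of the output and input remains.
        if not finished and len(received) - len(outputs) < k:
            chunk = next(stream, None)
            if chunk is None:
                finished = True  # source stream is exhausted
            else:
                received.append(chunk)
                events.append(("READ", chunk))
            continue
        # WRITE: emit the next target unit conditioned on the source prefix so far.
        unit = translate_prefix(received, len(outputs))
        if unit is None:  # translator signals end of target
            break
        outputs.append(unit)
        events.append(("WRITE", unit))
    return events


def toy_translate(source_prefix: List[str], position: int) -> Optional[str]:
    """Hypothetical stand-in for a translation model: uppercase the aligned unit."""
    if position < len(source_prefix):
        return source_prefix[position].upper()
    return None


if __name__ == "__main__":
    events = wait_k_policy(["a", "b", "c", "d"], toy_translate, k=2)
    print("".join(action[0] for action, _ in events))  # prints "RRWRWRWW"
```

A fixed schedule like wait-k ignores what the speech actually contains, so it may write too early or wait too long. That limitation is what motivates learning the policy alongside translation, as StreamSpeech does.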
Related papers
- Simuls2s-llm: Unlocking Simultaneous Inference Of Speech Llms For Speech-to-speech Translation (2025) · 3.58
- Direct Simultaneous Speech-to-text Translation Assisted By Synchronized Streaming ASR (2021) · 6.77
- Streamatt: Direct Streaming Speech-to-text Translation With Attention-based Audio History Selection (2024) · 4.52
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025) · 2.26
- SLM-S2ST: A Multimodal Language Model For Direct Speech-to-speech Translation (2025) · 0.00
- Exploring Continuous Integrate-and-fire For Adaptive Simultaneous Speech Translation (2022) · 4.52
- Learning When To Speak: Latency And Quality Trade-offs For Simultaneous Speech-to-speech Translation With Offline Models (2023) · 0.00
- Tagged End-to-end Simultaneous Speech Translation Training Using Simultaneous Interpretation Data (2023) · 0.00