Fluent And Low-latency Simultaneous Speech-to-speech Translation With Self-adaptive Training
2020 · Renjie Zheng, Mingbo Ma, Baigong Zheng, et al.
Abstract
Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with only a few seconds delay. In addition, it needs to continuously translate a stream of sentences, but all recent solutions merely focus on the single-sentence scenario. As a result, current approaches accumulate latencies progressively when the speaker talks faster, and introduce unnatural pauses when the speaker talks slower. To overcome these issues, we propose Self-Adaptive Translation (SAT) which flexibly adjusts the length of translations to accommodate different source speech rates. At similar levels of translation quality (as measured by BLEU), our method generates more fluent target speech (as measured by the naturalness metric MOS) with substantially lower latency than the baseline, in both Zh <-> En directions.
Related papers
- Learning When To Speak: Latency And Quality Trade-offs For Simultaneous Speech-to-speech Translation With Offline Models (2023) 0.00
- Direct Simultaneous Speech-to-text Translation Assisted By Synchronized Streaming ASR (2021) 6.77
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025) 2.26
- Long-form End-to-end Speech Translation Via Latent Alignment Segmentation (2023) 0.00
- Low-latency Neural Speech Translation (2018) 9.03
- Towards Achieving Human Parity On End-to-end Simultaneous Speech Translation Via LLM Agent (2024) 0.00
- Textless Speech-to-speech Translation With Limited Parallel Data (2023) 3.58
- SimulS2S-LLM: Unlocking Simultaneous Inference Of Speech LLMs For Speech-to-speech Translation (2025) 3.58