Exploring Continuous Integrate-and-Fire For Adaptive Simultaneous Speech Translation
2022 · Chih-Chiang Chang, Hung-Yi Lee
Abstract
Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the complete input is observed. A SimulST system generally includes two components: the pre-decision, which aggregates the speech information, and the policy, which decides whether to read or write. While recent works have proposed various strategies to improve the pre-decision, they mainly adopt the fixed wait-k policy, leaving adaptive policies rarely explored. This paper proposes to model the adaptive policy by adapting Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method has the advantages of simpler computation, superior quality at low latency, and better generalization to long utterances. We conduct experiments on the MuST-C V2 dataset and show the effectiveness of our approach.
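To make the integrate-and-fire idea concrete, the following is a minimal sketch of a generic CIF-style accumulation step, not the adaptation proposed in the paper. It assumes each encoder frame comes with a weight in [0, 1], that a unit fires once the accumulated weight reaches a threshold of 1.0, and that the leftover weight carries over to the next unit. The function name `cif_fire` and its parameters are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def cif_fire(
    alphas: List[float],
    states: List[List[float]],
    threshold: float = 1.0,
) -> Tuple[List[List[float]], List[int]]:
    """Accumulate per-frame weights and emit one integrated vector per firing.

    Assumes every alpha lies in [0, threshold], so a frame fires at most once.
    Returns the integrated vectors and the frame index at which each one fired.
    """
    dim = len(states[0]) if states else 0
    acc_weight = 0.0           # weight accumulated since the last firing
    acc_state = [0.0] * dim    # weighted sum of frames since the last firing
    fired: List[List[float]] = []
    fire_steps: List[int] = []

    for t, (alpha, h) in enumerate(zip(alphas, states)):
        if acc_weight + alpha < threshold:
            # Not enough evidence yet: keep integrating (a "read" in policy terms).
            acc_weight += alpha
            acc_state = [a + alpha * x for a, x in zip(acc_state, h)]
        else:
            # Threshold reached: spend only the weight needed to complete the unit.
            used = threshold - acc_weight
            fired.append([a + used * x for a, x in zip(acc_state, h)])
            fire_steps.append(t)
            # The remaining weight of this frame starts the next unit.
            rest = alpha - used
            acc_weight = rest
            acc_state = [rest * x for x in h]

    return fired, fire_steps


if __name__ == "__main__":
    vectors, steps = cif_fire(
        [0.5, 0.75, 0.25, 0.5],
        [[1.0], [2.0], [3.0], [4.0]],
    )
    print(vectors, steps)
```

Under this reading, a firing at frame t could be treated as a point where enough speech has been aggregated to write a token, and a non-firing frame as a read. How the paper actually couples firing to read and write decisions is not stated in the abstract, so that mapping is an assumption here.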
Related papers
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025) · 2.26
- StreamSpeech: Simultaneous Speech-to-speech Translation With Multi-task Learning (2024) · 7.81
- Visualization: The Missing Factor In Simultaneous Speech Translation (2021) · 0.00
- Contrastive Feedback Mechanism For Simultaneous Speech Translation (2024) · 2.26
- SimulSense: Sense-driven Interpreting For Efficient Simultaneous Speech Translation (2025) · 0.00
- SimulS2S-LLM: Unlocking Simultaneous Inference Of Speech LLMs For Speech-to-speech Translation (2025) · 3.58
- Tagged End-to-end Simultaneous Speech Translation Training Using Simultaneous Interpretation Data (2023) · 0.00
- End-to-end Simultaneous Speech Translation With Differentiable Segmentation (2023) · 7.16