Label-synchronous Neural Transducer For E2E Simultaneous Speech Translation
2024 · Keqi Deng, Philip C. Woodland
Abstract
While the neural transducer is popular for online speech recognition, simultaneous speech translation (SST) requires both streaming and re-ordering capabilities. This paper presents the LS-Transducer-SST, a label-synchronous neural transducer for SST, which naturally possesses these two properties. The LS-Transducer-SST dynamically decides when to emit translation tokens based on an Auto-regressive Integrate-and-Fire (AIF) mechanism. A latency-controllable AIF is also proposed, which can control the quality-latency trade-off either during decoding only or during both training and decoding. The LS-Transducer-SST can naturally utilise monolingual text-only data via its prediction network, which helps alleviate data sparsity, a key issue for E2E SST. During decoding, a chunk-based incremental joint decoding technique is designed to refine and expand the search space. Experiments on the Fisher-CallHome Spanish (Es-En) and MuST-C En-De data show that the LS-Transducer-SST giv
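The abstract names an Auto-regressive Integrate-and-Fire (AIF) mechanism but gives no equations, so the following is only a rough illustration of the general integrate-and-fire idea, written in the style of Continuous Integrate-and-Fire (CIF), and not the paper's definition. It assumes each encoder frame receives a weight equal to the sigmoid of its dot product with a label-side query vector (standing in for the prediction-network output, which is what makes the process auto-regressive), and that a token fires whenever the accumulated weight reaches a threshold. The names `aif_fire_points` and `query_for_token`, the sigmoid-of-dot-product weighting, and the default `threshold=1.0` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def aif_fire_points(
    frames: Sequence[Sequence[float]],
    query_for_token: Callable[[int], Sequence[float]],
    threshold: float = 1.0,
) -> List[Tuple[int, Vector]]:
    """Hypothetical integrate-and-fire sketch (not the paper's exact AIF).

    Returns (frame_index, context_vector) for every token fired.
    `query_for_token(k)` stands in for the prediction-network output
    after k tokens have been emitted, so frame weights depend on the
    label history.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not frames:
        return []
    dim = len(frames[0])
    fired: List[Tuple[int, Vector]] = []
    acc_weight = 0.0           # weight integrated since the last fire
    acc_ctx = [0.0] * dim      # weighted sum of frames since the last fire
    for t, h in enumerate(frames):
        # Frame weight, conditioned on the current label-side query.
        alpha = sigmoid(dot(h, query_for_token(len(fired))))
        # Fire as many times as this frame's weight allows; the boundary
        # frame's weight is split so each token consumes exactly `threshold`.
        while acc_weight + alpha >= threshold:
            used = threshold - acc_weight
            ctx = [c + used * x for c, x in zip(acc_ctx, h)]
            fired.append((t, ctx))
            alpha -= used
            acc_weight = 0.0
            acc_ctx = [0.0] * dim
        # Leftover weight starts integrating towards the next token.
        acc_weight += alpha
        acc_ctx = [c + alpha * x for c, x in zip(acc_ctx, h)]
    return fired
```

With all frame weights equal to 0.5 and `threshold=1.0`, a token fires on every second frame, and each returned context vector is the weight-normalised mix of the frames that contributed to it. Splitting the boundary frame's weight, as CIF does, keeps the total weight per token fixed, which is one simple way to make the emission decision depend only on frames already seen.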
Related papers
- Label-synchronous Neural Transducer For Adaptable Online E2E Speech Recognition (2023) · 3.58
- SimulS2S-LLM: Unlocking Simultaneous Inference Of Speech LLMs For Speech-to-speech Translation (2025) · 3.58
- RealTrans: End-to-end Simultaneous Speech Translation With Convolutional Weighted-shrinking Transformer (2021) · 5.84
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025) · 2.26
- Large-scale Streaming End-to-end Speech Translation With Neural Transducers (2022) · 9.59
- LAMASSU: Streaming Language-agnostic Multilingual Speech Recognition And Translation Using Neural Transducers (2022) · 7.50
- Direct Simultaneous Speech-to-text Translation Assisted By Synchronized Streaming ASR (2021) · 6.77
- SLM-S2ST: A Multimodal Language Model For Direct Speech-to-speech Translation (2025) · 0.00