Shiftable Context: Addressing Training-inference Context Mismatch In Simultaneous Speech Translation
2023 · Matthew Raffel, Drew Penney, Lizhong Chen
Abstract
Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even in the presence of partially filled segments due to the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MUST-C dataset demonstrate that when applied to the Augmented Memory Transformer, a state-of-the-art model for simultaneous speech translation, the proposed scheme achieves an average increase of 2.09, 1.83, and 1.95 BLEU scores across each wait-k value f
Related papers
- Implicit Memory Transformer For Computationally Efficient Simultaneous Speech Translation (2023)
- Streaming Simultaneous Speech Translation With Augmented Memory Transformer (2020)
- Transformer Transducer: One Model Unifying Streaming And Non-streaming Speech Recognition (2020)
- Towards Effective And Compact Contextual Representation For Conformer Transducer Speech Recognition Systems (2023)
- Blockwise Streaming Transformer For Spoken Language Understanding And Simultaneous Speech Translation (2022)
- Enhancing End-to-end Conversational Speech Translation Through Target Language Context Utilization (2023)
- Transformers With Convolutional Context For ASR (2019)
- Simultaneous Translation For Unsegmented Input: A Sliding Window Approach (2022)