Implicit Memory Transformer For Computationally Efficient Simultaneous Speech Translation
2023 · Matthew Raffel, Lizhong Chen
Abstract
Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment's attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedu
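The left context mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head, plain NumPy, square projection matrices, and takes the left context to be the last `left_len` rows of the previous segment's attention output. The names `segment_attention`, `stream`, `segment_len`, and `left_len` are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def segment_attention(segment, left_context, w_q, w_k, w_v):
    """Single-head attention over one segment.

    Queries come from the current segment only. Keys and values come from
    the left context (when present) concatenated with the current segment.
    """
    if left_context is None:
        kv_input = segment
    else:
        kv_input = np.concatenate([left_context, segment], axis=0)
    q = segment @ w_q
    k = kv_input @ w_k
    v = kv_input @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def stream(frames, segment_len, left_len, w_q, w_k, w_v):
    """Process frames segment by segment, carrying memory implicitly."""
    outputs = []
    left_context = None  # the first segment has no history
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        out = segment_attention(segment, left_context, w_q, w_k, w_v)
        # Assumption: the next left context is the tail of this segment's
        # attention output, so no separate memory bank is stored.
        left_context = out[-left_len:]
        outputs.append(out)
    return np.concatenate(outputs, axis=0)


# Hypothetical sizes: 10 frames of dimension 8, segments of 4, left context of 2.
rng = np.random.default_rng(0)
d_model = 8
frames = rng.standard_normal((10, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
encoded = stream(frames, segment_len=4, left_len=2, w_q=w_q, w_k=w_k, w_v=w_v)
```

Because each left context is itself an attention output that already attended to the previous left context, information from earlier segments can reach later ones without an explicit memory bank. The only extra cost per segment in this sketch is `left_len` additional key and value rows.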
Related papers
- Streaming Simultaneous Speech Translation With Augmented Memory Transformer (2020) · 6.77
- Blockwise Streaming Transformer For Spoken Language Understanding And Simultaneous Speech Translation (2022) · 4.52
- Shiftable Context: Addressing Training-inference Context Mismatch In Simultaneous Speech Translation (2023) · 0.00
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025) · 2.26
- Efficient Speech Translation With Dynamic Latent Perceivers (2022) · 0.00
- Speechformer: Reducing Information Loss In Direct Speech Translation (2021) · 7.16
- Streaming Transformer-based Acoustic Models Using Self-attention With Augmented Memory (2020) · 0.00
- Exploring Continuous Integrate-and-fire For Adaptive Simultaneous Speech Translation (2022) · 4.52