Blockwise Streaming Transformer For Spoken Language Understanding And Simultaneous Speech Translation
2022 · Keqi Deng, Shinji Watanabe, Jiatong Shi, et al.
Abstract
Although Transformers have achieved success in speech processing tasks such as spoken language understanding (SLU) and speech translation (ST), online processing with competitive performance remains essential for real-world interaction. In this paper, we take the first step toward streaming SLU and simultaneous ST using a blockwise streaming Transformer, which is based on contextual block processing and blockwise synchronous beam search. Furthermore, we design an automatic speech recognition (ASR)-based intermediate loss regularization for the streaming SLU task to further improve classification performance. For the simultaneous ST task, we propose a cross-lingual encoding method that employs a CTC branch optimized with target-language translations. The CTC translation output is also used to refine the search space with the CTC prefix score, achieving joint CTC/attention simultaneous translation for the first time. Experiments for SLU are conducted
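The abstract's joint CTC/attention decoding relies on a CTC prefix score: the total CTC probability of every output sequence that begins with the current hypothesis. The sketch below shows the standard prefix-score recursion used in joint CTC/attention beam search, plus a weighted combination with an attention-decoder score. It is a minimal illustration, not the authors' implementation. It assumes blank index 0, works in probability space on a tiny toy input, and uses a hypothetical CTC weight of 0.3 and a made-up attention log-probability. In a streaming decoder the recursion would presumably run only over the encoder frames of blocks received so far; that adaptation is not shown.

```python
import numpy as np

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_prefix_state_init(probs):
    """Forward variables for the empty prefix: no label emitted, blanks only."""
    r_n = np.zeros(probs.shape[0])           # paths ending in a non-blank label
    r_b = np.cumprod(probs[:, BLANK])        # paths ending in blank
    return r_n, r_b


def ctc_extend(probs, r_n_g, r_b_g, last, c, is_empty):
    """Extend prefix g (last label `last`) with label c, giving h = g + [c].

    Returns (psi, r_n_h, r_b_h), where psi is the CTC probability of all
    label sequences that start with h.
    """
    T = probs.shape[0]
    r_n_h = np.zeros(T)
    r_b_h = np.zeros(T)
    # At the first frame, c can only be emitted if g is the empty prefix.
    r_n_h[0] = probs[0, c] if is_empty else 0.0
    psi = r_n_h[0]
    for t in range(1, T):
        # Mass of g ending at t-1 that may be followed by a new c.
        # A repeated label needs a blank in between, so r_n_g is dropped then.
        phi = r_b_g[t - 1] + (0.0 if c == last else r_n_g[t - 1])
        r_n_h[t] = (r_n_h[t - 1] + phi) * probs[t, c]
        r_b_h[t] = (r_b_h[t - 1] + r_n_h[t - 1]) * probs[t, BLANK]
        psi += phi * probs[t, c]
    return psi, r_n_h, r_b_h


def joint_score(log_p_att, psi, ctc_weight=0.3):
    """lambda * log p_ctc(prefix) + (1 - lambda) * log p_att(prefix)."""
    return ctc_weight * np.log(psi) + (1.0 - ctc_weight) * log_p_att


# Toy CTC posteriors: 3 frames, vocabulary {blank=0, "a"=1, "b"=2}.
probs = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.5, 0.1, 0.4]])

r_n0, r_b0 = ctc_prefix_state_init(probs)
psi_a, r_n_a, r_b_a = ctc_extend(probs, r_n0, r_b0, last=None, c=1, is_empty=True)
psi_ab, _, _ = ctc_extend(probs, r_n_a, r_b_a, last=1, c=2, is_empty=False)

log_p_att_ab = np.log(0.4)  # hypothetical attention-decoder log-probability
score = joint_score(log_p_att_ab, psi_ab)
print(f"psi(a)={psi_a:.4f}  psi(ab)={psi_ab:.4f}  joint={score:.4f}")
```

A useful sanity check on the recursion: the prefix probability of a hypothesis equals the probability of it being the complete output plus the prefix probabilities of all its one-label extensions. The CTC weight trades off the monotonic alignment constraint of CTC against the attention decoder's context modeling; its value here is illustrative only.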
Related papers
- Implicit Memory Transformer For Computationally Efficient Simultaneous Speech Translation (2023)
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020)
- Transformer ASR With Contextual Block Processing (2019)
- Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020)
- Streamspeech: Simultaneous Speech-to-speech Translation With Multi-task Learning (2024)
- Streaming Transformer Transducer Based Speech Recognition Using Non-causal Convolution (2021)
- Streaming Simultaneous Speech Translation With Augmented Memory Transformer (2020)
- Transformer Transducer: One Model Unifying Streaming And Non-streaming Speech Recognition (2020)