Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding
2019 · Yuchen Liu, Jiajun Zhang, Hao Xiong, et al.
Abstract
Speech-to-text translation (ST), which translates source-language speech into target-language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has the potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model without transcriptions as an intermediate representation. Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR). However, the different tasks in this method cannot utilize information from each other, which limits the improvement. Other works propose a two-stage model in which the second model can use the hidden state from the first one, but its cascaded design greatly reduces the efficiency of training and inference. In this paper, we propose a novel interactive attention mechanism which enables ASR and ST to perform synchronously
Related papers
- Direct Simultaneous Speech-to-text Translation Assisted By Synchronized Streaming ASR (2021) 6.77
- Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation (2020) 9.92
- Bridging The Modality Gap For Speech-to-text Translation (2020) 0.00
- Joint Training And Decoding For Multilingual End-to-end Simultaneous Speech Translation (2025) 0.95
- Efficient And Adaptive Simultaneous Speech Translation With Fully Unidirectional Architecture (2025) 2.26
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018) 13.05
- Multilingual End-to-end Speech Translation (2019) 0.00
- Textless Direct Speech-to-speech Translation With Discrete Speech Representation (2022) 9.76