End-to-end Speech-to-text Translation: A Survey
2023 · Nivedita Sethiya, Chandresh Kumar Maurya
Abstract
Speech-to-text (ST) translation is the task of converting speech signals in one language to text in another language. It finds application in various domains, such as hands-free communication, dictation, video lecture transcription, and translation, to name a few. Automatic Speech Recognition (ASR) and Machine Translation (MT) models play crucial roles in traditional ST, enabling the conversion of spoken language in its original form to written text and facilitating seamless cross-lingual communication: ASR recognizes spoken words, while MT translates the transcribed text into the target language. Such cascaded models suffer from error propagation and high resource and training costs. As a result, researchers have been exploring end-to-end (E2E) models for ST. However, to our knowledge, there is no comprehensive review of existing works on E2E ST. The present survey, therefore, discusses the work in this direction. Our attempt has bee
Related papers
- Multilingual End-to-end Speech Translation (2019) · 0.00
- Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding (2019) · 10.48
- Speech Translation And The End-to-end Promise: Taking Stock Of Where We Are (2020) · 11.93
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018) · 13.05
- Speech Is More Than Words: Do Speech-to-text Translation Systems Leverage Prosody? (2024) · 2.26
- When End-to-end Is Overkill: Rethinking Cascaded Speech-to-text Translation (2025) · 0.00
- Bridging The Modality Gap For Speech-to-text Translation (2020) · 0.00
- End-to-end Speech Translation With Knowledge Distillation (2019) · 0.00