A Comparative Study On End-to-end Speech To Text Translation
2019 · Parnia Bahar, Tobias Bieschke, Hermann Ney
Abstract
Recent advances in deep learning show that end-to-end speech-to-text translation models are a promising approach to direct speech translation. In this work, we provide an overview of different end-to-end architectures, as well as the use of an auxiliary connectionist temporal classification (CTC) loss for better convergence. We also investigate pre-training variants, such as initializing different components of a model from pre-trained models, and their impact on final performance, which yields improvements of up to 4% in BLEU and 5% in TER. Our experiments are performed on the 270h IWSLT TED-talks En->De task and the 100h LibriSpeech Audiobooks En->Fr task. We also show improvements over the current end-to-end state-of-the-art systems on both tasks.
Related papers
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018)
- Harnessing Indirect Training Data For End-to-end Automatic Speech Translation: Tricks Of The Trade (2019)
- Inter-connection: Effective Connection Between Pre-trained Encoder And Decoder For Speech Translation (2023)
- Long-form End-to-end Speech Translation Via Latent Alignment Segmentation (2023)
- Speech Translation And The End-to-end Promise: Taking Stock Of Where We Are (2020)
- End-to-end Evaluation For Low-latency Simultaneous Speech Translation (2023)
- Leveraging Unsupervised And Weakly-supervised Data To Improve Direct Speech-to-speech Translation (2022)
- An Improved Hybrid CTC-attention Model For Speech Recognition (2018)