Strategies For Improving Low Resource Speech To Text Translation Relying On Pre-trained ASR Models
2023 · Santosh Kesiraju, Marek Sarvas, Tomas Pavlicek, et al.
Abstract
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST). We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively. Using the encoder-decoder framework for ST, our results show that a multilingual automatic speech recognition (ASR) system acts as a good initialization under low-resource scenarios. Furthermore, using connectionist temporal classification (CTC) as an additional objective for translation during training and decoding helps to reorder the internal representations and improves the final translation. Through our experiments, we try to identify the factors (initializations, objectives, and hyper-parameters) that contribute the most to improvements in low-resource setups. With only 300 hours of pre-training data, our model achieved a BLEU score of 7.3 on the Tamasheq-French data, outperforming prior published works from IWSLT 2022 by 1.6 points.
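To illustrate what using CTC as an additional objective can look like, here is a minimal pure-Python sketch, not the authors' implementation. It computes the CTC negative log-likelihood with the standard forward algorithm and interpolates it with a decoder (attention) loss. The function names, the interpolation form, and the weight value are assumptions made for illustration; the abstract does not specify them.

```python
import math

NEG_INF = float("-inf")


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one utterance via the forward algorithm.

    log_probs: list of T frames, each a list of log-probabilities
               over the vocabulary (index `blank` is the blank symbol).
    target:    list of label ids without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start with the blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * size
        for s in range(size):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total = _logaddexp(total, alpha[s - 1])  # advance by one
            # Skip a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = _logaddexp(total, alpha[s - 2])
            if total != NEG_INF:
                new_alpha[s] = total + frame[ext[s]]
        alpha = new_alpha

    # A valid path ends on the last label or the trailing blank.
    total = alpha[size - 1]
    if size > 1:
        total = _logaddexp(total, alpha[size - 2])
    return -total


def joint_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Interpolate the two objectives (ctc_weight is a hypothetical value)."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Toy example: 2 frames, vocabulary {0: blank, 1: token}, uniform outputs.
frames = [[math.log(0.5), math.log(0.5)]] * 2
ctc = ctc_neg_log_likelihood(frames, [1])
loss = joint_loss(ctc, attention_loss=1.0)
```

Setting `ctc_weight` to 0 in this sketch recovers plain attention-only training, which makes it easy to compare the two setups; a similar interpolation of log-scores is one common way to combine CTC and decoder scores at decoding time.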
Related papers
- Pretraining By Backtranslation For End-to-end ASR In Low-resource Settings (2018) 0.00
- Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation (2020) 9.92
- Analyzing ASR Pretraining For Low-resource Speech-to-text Translation (2019) 10.07
- Leveraging Translations For Speech Transcription In Low-resource Settings (2018) 6.77
- Zero-resource Speech Translation And Recognition With LLMs (2024) 3.58
- KIT's Low-resource Speech Translation Systems For IWSLT2025: System Enhancement With Synthetic Data And Model Regularization (2025) 0.00
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition (2022) 0.00
- Sequence-based Multi-lingual Low Resource Speech Recognition (2018) 12.40