KIT's Low-resource Speech Translation Systems For IWSLT2025: System Enhancement With Synthetic Data And Model Regularization
2025 · Zhaolin Li, Yining Liu, Danni Liu, et al.
Abstract
This paper presents KIT's submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech Translation (ST) systems for three language pairs: Bemba, North Levantine Arabic, and Tunisian Arabic into English. Building upon pre-trained models, we fine-tune our systems with different strategies to utilize resources efficiently. This study further explores system enhancement with synthetic data and model regularization. Specifically, we investigate MT-augmented ST by generating translations from ASR data using MT models. For North Levantine, which lacks parallel ST training data, a system trained solely on synthetic data slightly surpasses the cascaded system trained on real data. We also explore augmentation using text-to-speech models by generating synthetic speech from MT data, demonstrating the benefits of synthetic data in improving both ASR and ST perfor
Related papers
- Blending LLMs Into Cascaded Speech Translation: KIT's Offline Speech Translation System For IWSLT 2024 (2024)
- ON-TRAC Consortium Systems For The IWSLT 2022 Dialect And Low-resource Speech Translation Tasks (2022)
- Strategies For Improving Low Resource Speech To Text Translation Relying On Pre-trained ASR Models (2023)
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018)
- Efficient Yet Competitive Speech Translation: FBK@IWSLT2022 (2022)
- Leveraging Synthetic Audio Data For End-to-end Low-resource Speech Translation (2024)
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025)
- Leveraging Unsupervised And Weakly-supervised Data To Improve Direct Speech-to-speech Translation (2022)