Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23
2023 · Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, et al.
Abstract
This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport, to adapt the speech representations to the space of the text model, thus maximizing transfer learning from MT. After this pretraining, we fine-tune our system end-to-end on ST, with Cross Entropy and Knowledge Distillation. Apart from the available ST corpora, we create synthetic data with SegAugment to better adapt our models to the custom segmentations of the IWSLT test sets. Our best single model obtains 31.2 BLEU points on MuST-C tst-COMMON, 29.8 points on IWSLT.tst2020 and 33.4 points on the newly released IWSLT.ACLdev2023.
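To make the Optimal Transport idea concrete, the following is a minimal sketch of an entropy-regularized OT cost (Sinkhorn iterations) between a sequence of speech-encoder vectors and a sequence of text-encoder vectors. It is not the paper's implementation: the abstract does not specify the cost function, the mass distribution, or the solver, so the squared Euclidean cost, the uniform masses, the function name `sinkhorn_ot_cost`, and the parameters `eps` and `n_iters` are all assumptions chosen for illustration.

```python
import math


def sinkhorn_ot_cost(speech, text, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two sequences of vectors.

    Assumptions (not taken from the paper): squared Euclidean cost,
    uniform mass on every position, plain (non-log-domain) Sinkhorn.
    Returns the transport cost and the transport plan.
    """
    n, m = len(speech), len(text)

    # Pairwise squared Euclidean distances between speech and text vectors.
    cost = [[sum((a - b) ** 2 for a, b in zip(s, t)) for t in text]
            for s in speech]

    # Gibbs kernel; very large costs would underflow here, in which case
    # a log-domain variant of Sinkhorn would be the safer choice.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]

    a = [1.0 / n] * n  # uniform mass over speech positions
    b = [1.0 / m] * m  # uniform mass over text positions
    u = [1.0] * n
    v = [1.0] * m

    # Alternately rescale rows and columns to match the two marginals.
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


if __name__ == "__main__":
    speech_states = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0]]  # 3 toy speech frames
    text_states = [[0.0, 0.0], [1.0, 0.0]]                # 2 toy text tokens
    ot_cost, _ = sinkhorn_ot_cost(speech_states, text_states)
    print(f"OT cost: {ot_cost:.4f}")
```

Because the two sequences may differ in length, the transport plan acts as a soft alignment between speech positions and text positions. In a training setup along the lines the abstract describes, a cost of this kind would be minimized so that the speech representations move toward the text model's space; how it is weighted against the CTC term is not stated here.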