Dealing With Training And Test Segmentation Mismatch: FBK@IWSLT2021
2021 · Sara Papi, Marco Gaido, Matteo Negri, et al.
Abstract
This paper describes FBK's system submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model, which is a Transformer-based architecture trained to translate English speech audio data into German texts. The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trained on the available corpora. By contrast, the second fine-tuning step is carried out on a random segmentation of the MuST-C v2 En-De dataset. The main goal of this second step is to reduce the performance drops occurring when a speech translation model trained on manually segmented data (i.e., an ideal, sentence-like segmentation) is evaluated on automatically segmented audio (i.e., actual, more realistic testing conditions). For the same purpose, a custom hybrid segmentation procedure that accounts f
Related papers
- Efficient Yet Competitive Speech Translation: FBK@IWSLT2022 (2022)
- Direct Models For Simultaneous Translation And Automatic Subtitling: FBK@IWSLT2023 (2023)
- End-to-end Speech Translation With Pre-trained Models And Adapters: UPC At IWSLT 2021 (2021)
- Don't Discard Fixed-window Audio Segmentation In Speech-to-text Translation (2022)
- Beyond Voice Activity Detection: Hybrid Audio Segmentation For Direct Speech Translation (2021)
- Long-form Speech Translation Through Segmentation With Finite-state Decoding Constraints On Large Language Models (2023)
- AppTek's Submission To The IWSLT 2022 Isometric Spoken Language Translation Task (2022)
- The NiuTrans End-to-end Speech Translation System For IWSLT 2021 Offline Task (2021)