AdaTranS: Adapting With Boundary-based Shrinking For End-to-end Speech Translation
2022 · Xingshan Zeng, Liangyou Li, Qun Liu
Abstract
To alleviate the data scarcity problem in end-to-end speech translation (ST), pre-training on speech recognition and machine translation data is considered an important technique. However, the modality gap between speech and text prevents the ST model from efficiently inheriting knowledge from the pre-trained models. In this work, we propose AdaTranS for end-to-end ST. It adapts the speech features with a new shrinking mechanism that predicts word boundaries, mitigating the length mismatch between speech and text features. Experiments on the MuST-C dataset demonstrate that AdaTranS achieves better performance than other shrinking-based methods, with higher inference speed and lower memory usage. Further experiments show that AdaTranS can be equipped with additional alignment losses to improve performance further.
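The abstract does not spell out how the predicted boundaries are used, so the following is only an illustrative sketch of the general idea of boundary-based shrinking, not the authors' implementation. It assumes a per-frame boundary probability is already available, applies a hard threshold, and mean-pools the frames of each segment. The names `shrink_by_boundaries`, `_mean`, and `threshold` are hypothetical.

```python
from typing import List, Sequence


def _mean(group: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty group of equal-length feature vectors."""
    n = len(group)
    dim = len(group[0])
    return [sum(frame[d] for frame in group) / n for d in range(dim)]


def shrink_by_boundaries(
    frames: Sequence[Sequence[float]],
    boundary_probs: Sequence[float],
    threshold: float = 0.5,  # assumption: hard cut-off on the boundary probability
) -> List[List[float]]:
    """Merge consecutive frames into one vector per predicted segment.

    A frame whose boundary probability reaches `threshold` closes the
    current segment; each segment is reduced to the mean of its frames.
    """
    if len(frames) != len(boundary_probs):
        raise ValueError("need exactly one boundary probability per frame")

    segments: List[List[float]] = []
    current: List[Sequence[float]] = []
    for frame, prob in zip(frames, boundary_probs):
        current.append(frame)
        if prob >= threshold:
            segments.append(_mean(current))
            current = []
    if current:  # trailing frames with no closing boundary form a last segment
        segments.append(_mean(current))
    return segments


# Toy input: six 2-dimensional "speech frames" and made-up boundary probabilities.
frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0], [5.0, 5.0]]
boundary_probs = [0.1, 0.9, 0.2, 0.1, 0.8, 0.3]

shrunk = shrink_by_boundaries(frames, boundary_probs)
print(len(frames), "->", len(shrunk))  # sequence length before and after shrinking
```

In this toy case the six frames collapse into three segment vectors, which shows how a shrinking step brings the speech sequence length closer to that of a word-level text sequence. A trained model would typically use learned boundary predictions and may pool differently.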
Related papers
- Bridging The Modality Gap For Speech-to-text Translation (2020)
- Realtrans: End-to-end Simultaneous Speech Translation With Convolutional Weighted-shrinking Transformer (2021)
- Tackling Data Scarcity In Speech Translation Using Zero-shot Multilingual Machine Translation Techniques (2022)
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018)
- Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation (2020)
- Redapt: An Adaptor For Wav2vec 2 Encoding: Faster And Smaller Speech Translation Without Quality Compromise (2022)
- Multilingual End-to-end Speech Translation (2019)
- Harnessing Indirect Training Data For End-to-end Automatic Speech Translation: Tricks Of The Trade (2019)