Data Efficient Direct Speech-to-text Translation With Modality Agnostic Meta-learning
2019 · Sathish Indurthi, Houjeung Han, Nikhil Kumar Lakumarapu, et al.
Abstract
End-to-end Speech Translation (ST) models have several advantages over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, such as lower latency, smaller model size, and less error compounding. However, collecting large amounts of parallel data for the ST task is more difficult than for the ASR and MT tasks. Previous studies have proposed transfer learning approaches to overcome this difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality-agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), where the ST task severely lacks data. In the meta-learning phase, the parameters of the model are exposed to va
Related papers
- Bridging The Modality Gap For Speech-to-text Translation (2020)
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018)
- One-to-many Multilingual End-to-end Speech Translation (2019)
- Multilingual End-to-end Speech Translation (2019)
- Rethinking And Improving Multi-task Learning For End-to-end Speech Translation (2023)
- Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation (2020)
- Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding (2019)
- Joint Training And Decoding For Multilingual End-to-end Simultaneous Speech Translation (2025)