Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation
2020 · Changhan Wang, Juan Pino, Jiatao Gu
Abstract
Transfer learning from high-resource languages is known to be an efficient way to improve end-to-end automatic speech recognition (ASR) for low-resource languages. Pre-trained or jointly trained encoder-decoder models, however, do not share the language modeling (decoder) for the same language, which is likely to be inefficient for distant target languages. We introduce speech-to-text translation (ST) as an auxiliary task to incorporate additional knowledge of the target language and enable transferring from that target language. Specifically, we first translate high-resource ASR transcripts into a target low-resource language, with which an ST model is trained. Both ST and target ASR share the same attention-based encoder-decoder architecture and vocabulary. The former task then provides a fully pre-trained model for the latter, bringing up to 24.6% word error rate (WER) reduction to the baseline (direct transfer from high-resource ASR). We show that training ST with human translations
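The abstract reports its gain as a WER reduction against a baseline. As a minimal sketch of how such a figure is commonly computed, the Python below implements the standard word-level edit-distance definition of WER and a relative-reduction helper. The function names are hypothetical, the abstract does not say whether its 24.6% is relative or absolute (a relative reduction is assumed here), and this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed (assumed: relative reduction)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With made-up numbers, a baseline WER of 40.0 and a new WER of 30.0 would give `relative_wer_reduction(40.0, 30.0) == 0.25`, that is, a 25% relative reduction.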
Related papers
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018) 13.05
- Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding (2019) 10.48
- Strategies For Improving Low Resource Speech To Text Translation Relying On Pre-trained ASR Models (2023) 5.24
- Multilingual End-to-end Speech Translation (2019) 0.00
- End-to-end Speech Translation With Knowledge Distillation (2019) 0.00
- Leveraging Unsupervised And Weakly-supervised Data To Improve Direct Speech-to-speech Translation (2022) 8.35
- End-to-end Text-to-speech For Low-resource Languages By Cross-lingual Transfer Learning (2019) 0.00
- One-to-many Multilingual End-to-end Speech Translation (2019) 9.23