Enabling Zero-shot Multilingual Spoken Language Translation With Language-specific Encoders And Decoders
2020 · Carlos Escolano, Marta R. Costa-Jussà, José A. R. Fonollosa, et al.
Abstract
Current end-to-end approaches to Spoken Language Translation (SLT) rely on limited training resources, especially in multilingual settings. Multilingual Neural Machine Translation (MultiNMT) approaches, on the other hand, rely on larger and higher-quality data sets. Our proposed method extends a MultiNMT architecture based on language-specific encoders and decoders to the task of Multilingual SLT (MultiSLT). Our method entirely eliminates the dependency on MultiSLT data: it is able to translate while training only on ASR and MultiNMT data. Our experiments on four different languages show that coupling the speech encoder to the MultiNMT architecture produces translations of similar quality to a bilingual baseline (\(\pm 0.2\) BLEU) while effectively allowing for zero-shot MultiSLT. Additionally, we propose using an Adapter module for coupling the speech inputs. This Adapter module produces consistent improvements of up to +6 BLEU points on the proposed architecture and +1 BL
Related papers
- One-to-many Multilingual End-to-end Speech Translation (2019)
- Tackling Data Scarcity In Speech Translation Using Zero-shot Multilingual Machine Translation Techniques (2022)
- Zero-resource Speech Translation And Recognition With LLMs (2024)
- Low-latency Neural Speech Translation (2018)
- Multilingual End-to-end Speech Translation (2019)
- Zero-shot Multi-speaker Text-to-speech With State-of-the-art Neural Speaker Embeddings (2019)
- Leveraging Multilingual Self-supervised Pretrained Models For Sequence-to-sequence End-to-end Spoken Language Understanding (2023)
- A Weakly-supervised Streaming Multilingual Speech Model With Truly Zero-shot Capability (2022)