Source And Target Bidirectional Knowledge Distillation For End-to-end Speech Translation
2021 · Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe
Abstract
A conventional approach to improving the performance of end-to-end speech translation (E2E-ST) models is to leverage the source transcription via pre-training and joint training with automatic speech recognition (ASR) and neural machine translation (NMT) tasks. However, since the input modalities are different, it is difficult to leverage source language text successfully. In this work, we focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models. To leverage the full potential of the source language information, we propose backward SeqKD, SeqKD from a target-to-source backward NMT model. To this end, we train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder. The paraphrases are generated from the translations in bitext via back-translation. We further propose bidirectional SeqKD in which SeqKD from both forward and backward NMT models is combined. Experimental evaluations on both autoregressive a
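The data construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `forward_nmt` and `backward_nmt` are hypothetical callables standing in for trained source-to-target and target-to-source NMT models, and the use of a language tag to tell the single decoder which output to produce is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Example:
    audio_id: str   # identifier of the speech input
    lang_tag: str   # assumed tag selecting the output language of the single decoder
    text: str       # training target for the E2E-ST model


def build_bidirectional_seqkd(
    corpus: Iterable[Tuple[str, str, str]],
    forward_nmt: Callable[[str], str],    # hypothetical source -> target NMT model
    backward_nmt: Callable[[str], str],   # hypothetical target -> source NMT model
    src_lang: str = "en",
    tgt_lang: str = "de",
) -> List[Example]:
    """Build training targets from (audio_id, transcript, translation) triples."""
    examples: List[Example] = []
    for audio_id, transcript, translation in corpus:
        # Forward SeqKD: distilled translation generated from the source transcript.
        examples.append(Example(audio_id, tgt_lang, forward_nmt(transcript)))
        # Backward SeqKD: paraphrased transcript obtained by back-translating
        # the reference translation into the source language.
        examples.append(Example(audio_id, src_lang, backward_nmt(translation)))
    return examples


if __name__ == "__main__":
    # Toy stand-ins for the NMT models, used only to show the data flow.
    toy_corpus = [("utt1", "hello world", "hallo welt")]
    for ex in build_bidirectional_seqkd(
        toy_corpus, lambda s: "DE:" + s, lambda s: "EN:" + s
    ):
        print(ex)
```

Each utterance yields two targets: a distilled translation (forward) and a paraphrased transcript (backward), so one decoder is trained on both.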
Related papers
- End-to-end Speech Translation With Knowledge Distillation (2019) 0.00
- Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation (2023) 0.00
- Improving End-to-end Speech Translation By Imitation-based Knowledge Distillation With Synthetic Transcripts (2023) 0.60
- Improving Cross-lingual Transfer Learning For End-to-end Speech Recognition With Speech Translation (2020) 9.92
- Knowledge Distillation For Neural Transducer-based Target-speaker ASR: Exploiting Parallel Mixture/single-talker Speech Data (2023) 4.52
- Multilingual End-to-end Speech Translation (2019) 0.00
- Data Efficient Direct Speech-to-text Translation With Modality Agnostic Meta-learning (2019) 0.00
- Joint Training And Decoding For Multilingual End-to-end Simultaneous Speech Translation (2025) 0.95