Abstract

Training end-to-end neural models for spoken language translation (SLT) still has to contend with extreme data scarcity. The existing SLT parallel corpora are orders of magnitude smaller than those available for the closely related tasks of automatic speech recognition (ASR) and machine translation (MT), whose corpora usually comprise tens of millions of instances. To cope with data paucity, in this paper we explore the effectiveness of transfer learning in end-to-end SLT by presenting a multilingual approach to the task. Multilingual solutions are widely studied in MT and usually rely on "target forcing", in which multilingual parallel data are combined to train a single model by prepending to each input sequence a language token that specifies the target language. However, when tested in speech translation, our experiments show that MT-like target forcing, used as is, is not effective in discriminating among the target languages. Thus, we
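
As an illustration of the MT-style target forcing described above, the following minimal sketch prepends a language token to each tokenized source sentence and pools several language pairs into one training set. The token format `<2xx>` and the function names `language_token`, `apply_target_forcing`, and `build_multilingual_corpus` are assumptions made for this example; they are not taken from the paper, and the sketch covers only the text-input case, not the speech variant the paper goes on to propose.

```python
from typing import Dict, List, Tuple

Tokens = List[str]
Pair = Tuple[Tokens, Tokens]


def language_token(target_lang: str) -> str:
    """Build the special token that names the target language (hypothetical format)."""
    return f"<2{target_lang}>"


def apply_target_forcing(source_tokens: Tokens, target_lang: str) -> Tokens:
    """Prepend the target-language token to a tokenized source sentence."""
    return [language_token(target_lang)] + list(source_tokens)


def build_multilingual_corpus(corpora: Dict[str, List[Pair]]) -> List[Pair]:
    """Pool per-language parallel data into one training set.

    `corpora` maps a target-language code to its (source, target) pairs.
    Each source side receives the matching language token; targets are unchanged.
    """
    mixed: List[Pair] = []
    for target_lang, pairs in corpora.items():
        for source_tokens, target_tokens in pairs:
            mixed.append((apply_target_forcing(source_tokens, target_lang), target_tokens))
    return mixed


if __name__ == "__main__":
    corpora = {
        "de": [(["good", "morning"], ["guten", "Morgen"])],
        "fr": [(["good", "morning"], ["bonjour"])],
    }
    for source, target in build_multilingual_corpus(corpora):
        print(source, "->", target)
```

In this setup a single model sees the same source sentence paired with different targets, and the prepended token is the only signal that tells it which language to produce.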

Authors

(none)

Tags

  • Speech Translation
  • Speech Recognition
  • Text-to-Speech

Stats

  • citations: 16
  • S2 citations: n/a
  • GitHub stars: 0
  • HF likes: 0
  • heat score: 9.23
  • arXiv key: digangi2019one

Related papers