Leveraging Pseudo-labeled Data To Improve Direct Speech-to-speech Translation
2022 · Qianqian Dong, Fengpeng Yue, Tom Ko, et al.
Abstract
Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging due to data scarcity and the complexity of the speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. In particular, we exploit the pseudo-labeled data with a combination of popular techniques that are not trivial to apply to S2ST. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
Related papers
- Leveraging Unsupervised And Weakly-supervised Data To Improve Direct Speech-to-speech Translation (2022)
- Enhanced Direct Speech-to-speech Translation Using Self-supervised Pre-training And Data Augmentation (2022)
- Joint Pre-training With Speech And Bilingual Text For Direct Speech To Speech Translation (2022)
- Joint Speech Transcription And Translation: Pseudo-labeling With Out-of-distribution Data (2022)
- Improving Speech Emotion Recognition In Under-resourced Languages Via Speech-to-speech Translation With Bootstrapping Data Selection (2024)
- Textless Speech-to-speech Translation On Real Data (2021)
- Direct Speech-to-speech Translation With Discrete Units (2021)
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018)