Translatotron 3: Speech To Speech Translation With Monolingual Data
2023 · Eliya Nachmani, Alon Levkovitch, Yifan Ding, et al.
Abstract
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-speech translation from monolingual speech-text datasets that combines a masked autoencoder, unsupervised embedding mapping, and back-translation. Experimental results on speech-to-speech translation between Spanish and English show that Translatotron 3 outperforms a baseline cascade system, with an improvement of 18.14 BLEU points on the synthesized Unpaired-Conversational dataset. Unlike supervised approaches, which require real paired data or specialized modeling to replicate para-/non-linguistic information such as pauses, speaking rates, and speaker identity, Translatotron 3 is able to retain this information. Audio samples can be found at http://google-research.github.io/lingvo-lab/translatotron3
Related papers
- Translatotron 2: High-quality Direct Speech-to-speech Translation With Voice Preservation (2021) · 0.00
- Leveraging Unsupervised And Weakly-supervised Data To Improve Direct Speech-to-speech Translation (2022) · 8.35
- Textless Direct Speech-to-speech Translation With Discrete Speech Representation (2022) · 9.76
- Rosettaspeech: Zero-shot Speech-to-speech Translation Without Parallel Speech (2025) · 0.00
- Towards Unsupervised Speech-to-text Translation (2018) · 0.00
- Improving Cascaded Unsupervised Speech Translation With Denoising Back-translation (2023) · 0.00
- Enhanced Direct Speech-to-speech Translation Using Self-supervised Pre-training And Data Augmentation (2022) · 10.85
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019) · 15.03