TranSpeech: Speech-to-speech Translation With Bilateral Perturbation
2022 · Rongjie Huang, Jinglin Liu, Huadai Liu, et al.
Abstract
Direct speech-to-speech translation (S2ST) with discrete units leverages recent progress in speech representation learning. Specifically, a sequence of discrete representations derived in a self-supervised manner is predicted by the model and passed to a vocoder for speech reconstruction. This approach still faces the following challenges: 1) acoustic multimodality: the discrete units derived from speech with the same content can be non-deterministic due to acoustic properties (e.g., rhythm, pitch, and energy), which degrades translation accuracy; 2) high latency: current S2ST systems use autoregressive models that predict each unit conditioned on the previously generated sequence, failing to take full advantage of parallelism. In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. To alleviate the acoustic multimodal problem, we propose bilateral perturbation (BiP), which consists of the style normalization and information
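The acoustic multimodality problem described in the abstract can be illustrated with a toy sketch. This is not the paper's pipeline: the one-dimensional codebook, the frame values, and the helper names `to_units` and `collapse` are all invented for illustration. Real systems quantize high-dimensional self-supervised features rather than scalars. The sketch only shows how two renditions of the same content can map to different discrete unit sequences.

```python
from itertools import groupby

# Hypothetical 1-D codebook. Real systems cluster high-dimensional
# self-supervised speech features; scalars keep the sketch short.
CODEBOOK = [0.0, 1.0, 2.0, 3.0]


def to_units(frames):
    """Map each frame feature to the index of its nearest centroid."""
    return [
        min(range(len(CODEBOOK)), key=lambda k: abs(f - CODEBOOK[k]))
        for f in frames
    ]


def collapse(units):
    """Merge consecutive repeated units, discarding duration information."""
    return [u for u, _ in groupby(units)]


# Two made-up renditions of the same content. The second is slower
# (more frames) and has a slightly shifted acoustic profile.
utterance_a = [0.1, 0.9, 1.1, 2.9]
utterance_b = [0.1, 0.2, 1.4, 1.6, 2.9, 3.1]

units_a = collapse(to_units(utterance_a))
units_b = collapse(to_units(utterance_b))

print(units_a)  # [0, 1, 3]
print(units_b)  # [0, 1, 2, 3]
```

Collapsing repeats removes the duration difference, but the acoustic shift still pushes one frame across a centroid boundary, so the two target sequences disagree. A translation model trained on such targets sees more than one valid output for the same source content.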
Related papers
- Speech-to-speech Translation With Discrete-unit-based Style Transfer (2023) · 0.00
- Direct Speech-to-speech Translation With Discrete Units (2021) · 13.97
- Textless Direct Speech-to-speech Translation With Discrete Speech Representation (2022) · 9.76
- Joint Pre-training With Speech And Bilingual Text For Direct Speech To Speech Translation (2022) · 7.81
- Preserving Speaker Information In Direct Speech-to-speech Translation With Non-autoregressive Generation And Pretraining (2024) · 0.00
- Enhanced Direct Speech-to-speech Translation Using Self-supervised Pre-training And Data Augmentation (2022) · 10.85
- Leveraging Unsupervised And Weakly-supervised Data To Improve Direct Speech-to-speech Translation (2022) · 8.35
- Unity: Two-pass Direct Speech-to-speech Translation With Discrete Units (2022) · 9.59