MSLM-S2ST: A Multitask Speech Language Model For Textless Speech-to-speech Translation With Speaker Style Preservation
2024 · Yifan Peng, Ilia Kulikov, Yilin Yang, et al.
Abstract
There has been growing research interest and progress in speech-to-speech translation (S2ST), which translates utterances from one language into another. This work proposes the Multitask Speech Language Model (MSLM), a decoder-only speech language model trained in a multitask setting. Without relying on text training data, our model supports multilingual S2ST while preserving speaker style.
Related papers
- SLM-S2ST: A Multimodal Language Model For Direct Speech-to-speech Translation (2025) 0.00
- SeamlessExpressiveLM: Speech Language Model For Expressive Speech-to-speech Translation With Chain-of-thought (2024) 0.00
- StyleS2ST: Zero-shot Style Transfer For Direct Speech-to-speech Translation (2023) 0.00
- Speech-to-speech Translation With Discrete-unit-based Style Transfer (2023) 0.00
- SimulS2S-LLM: Unlocking Simultaneous Inference Of Speech LLMs For Speech-to-speech Translation (2025) 3.58
- StreamSpeech: Simultaneous Speech-to-speech Translation With Multi-task Learning (2024) 7.81
- Zero-resource Speech Translation And Recognition With LLMs (2024) 3.58
- StyleTTS 2: Towards Human-level Text-to-speech Through Style Diffusion And Adversarial Training With Large Speech Language Models (2023) 8.09