Enhancing Code-switched Text-to-speech Synthesis Capability In Large Language Models With Only Monolingual Corpora
2024 · Jing Xu, Daxin Tan, Jiaqi Wang, et al.
Abstract
While Large Language Models (LLMs) have shown potential in speech generation and recognition, their applications are mainly confined to monolingual scenarios, with limited explorations in code-switched (CS) contexts. In this paper, we propose a Code-Switched Large Language Model (CS-LLM) to enhance the code-switched text-to-speech synthesis (CS TTS) capability in LLMs with only monolingual corpora. Specifically, we begin by enhancing the multilingual speech processing ability of LLMs through multilingual speech recognition and synthesis tasks. Then, we develop an effective code-switched (CS) data construction strategy that splits and concatenates words from different monolingual speech corpora to equip LLMs with improved CS TTS ability. Experiments show that our approach outperforms baselines in CS TTS in terms of naturalness, speaker consistency and similarity even with limited data. Additionally, the constructed CS data further improves multilingual speech synthesis and recognition.
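The abstract only summarizes the data construction strategy (splitting words out of monolingual speech corpora and concatenating them). The following is a minimal sketch of that idea, assuming word-level boundaries in sample indices are already available for each monolingual utterance, for example from a forced aligner. The names `Word`, `Utterance`, and `build_code_switched`, the single switch point, and the toy sample values are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: int  # first sample index of the word (inclusive)
    end: int    # last sample index of the word (exclusive)


@dataclass
class Utterance:
    lang: str
    samples: List[float]  # mono waveform as raw sample values
    words: List[Word]     # word boundaries (assumed to come from an aligner)


def word_segments(utt: Utterance) -> List[Tuple[str, List[float]]]:
    """Split an utterance into (word text, word audio) pairs."""
    return [(w.text, utt.samples[w.start:w.end]) for w in utt.words]


def build_code_switched(a: Utterance, b: Utterance,
                        switch_at: int) -> Tuple[str, List[float]]:
    """Keep the first `switch_at` words of `a`, then append every word of `b`.

    Returns the code-switched transcript and the concatenated waveform.
    """
    segments = word_segments(a)[:switch_at] + word_segments(b)
    text = " ".join(t for t, _ in segments)
    audio = [s for _, seg in segments for s in seg]
    return text, audio


# Toy example: two tiny monolingual "utterances" with made-up samples.
en = Utterance("en", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
               [Word("good", 0, 2), Word("morning", 2, 6)])
zh = Utterance("zh", [0.7, 0.8, 0.9], [Word("老师", 0, 3)])

text, audio = build_code_switched(en, zh, switch_at=1)
```

Here `text` pairs the first English word with the Mandarin word, and `audio` is the matching concatenation of the two word segments. A practical pipeline would also have to handle speaker mismatch and discontinuities at the splice point, which this sketch ignores.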
Related papers
- Generative Error Correction For Code-switching Speech Recognition Using Large Language Models (2023) · 0.00
- Exploring Retraining-free Speech Recognition For Intra-sentential Code-switching (2021) · 5.84
- Boosting Large Language Model For Speech Synthesis: An Empirical Study (2023) · 6.77
- Improving Robustness Of LLM-based Speech Synthesis By Learning Monotonic Alignment (2024) · 0.00
- Making LLMs Better Many-to-many Speech-to-text Translators With Curriculum Learning (2024) · 7.31
- Investigating Decoder-only Large Language Models For Speech-to-text Translation (2024) · 0.00
- MCAT: Scaling Many-to-many Speech-to-text Translation With MLLMs To 70 Languages (2025) · 2.41
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023) · 8.09