BiSinger: Bilingual Singing Voice Synthesis
2023 · Huali Zhou, Yueqian Lin, Yao Shi, et al.
Abstract
Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in English and code-switch SVS while maintaining Chinese song performance. Audio samples are available at
Related papers
- ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-decoder Acoustic Models And WaveRNN Vocoders (2020) 11.49
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024) 4.52
- Multi-Singer: Fast Multi-singer Singing Voice Vocoder With A Large-scale Corpus (2021) 13.28
- PolySinger: Singing-voice To Singing-voice Translation From English To Japanese (2024) 0.00
- Everyone-Can-Sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025) 0.00
- Towards Improving The Expressiveness Of Singing Voice Synthesis With BERT Derived Semantic Information (2023) 0.00
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024) 4.52
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis Via Flow Matching (2025) 7.81