Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
2024 · Yongqi Wang, Ruofan Hu, Rongjie Huang, et al.
Abstract
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to explicitly control the style attributes of the synthesized singing. We propose Prompt-Singer, the first SVS method that enables control over singer gender, vocal range, and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while preserving melodic accuracy. Furthermore, we explore various experimental settings, including different types of text representations, text encoder fine-tuning, and the introduction of speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable control ability and audio quality. Audio samples are available at http://prompt-singer.github.io.
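The idea of a range-melody decoupled pitch representation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `decouple_pitch` and `recouple_pitch`, the convention that unvoiced frames are marked with 0.0, and the reference mean of 230 Hz are all assumptions chosen for illustration. The sketch separates an F0 contour into a single range value (the mean over voiced frames) and a melody contour rescaled to a fixed mean, so that the range can be changed without altering the musical intervals.

```python
def decouple_pitch(f0_hz, ref_mean_hz=230.0):
    """Split an F0 contour (Hz per frame) into (range, melody).

    Assumption: unvoiced frames are encoded as 0.0 and are left untouched.
    ref_mean_hz is an arbitrary illustrative constant.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        # Nothing to normalize when the whole contour is unvoiced.
        return 0.0, list(f0_hz)
    mean_f0 = sum(voiced) / len(voiced)  # the "range" factor
    scale = ref_mean_hz / mean_f0
    # Multiplicative rescaling keeps frequency ratios, i.e. intervals.
    melody = [f * scale if f > 0 else 0.0 for f in f0_hz]
    return mean_f0, melody


def recouple_pitch(target_mean_hz, melody, ref_mean_hz=230.0):
    """Shift a normalized melody contour to a new vocal range."""
    scale = target_mean_hz / ref_mean_hz
    return [f * scale if f > 0 else 0.0 for f in melody]


# Hypothetical usage: move an octave leap (220 Hz -> 440 Hz) to a lower range.
vocal_range, melody = decouple_pitch([220.0, 0.0, 440.0])
lowered = recouple_pitch(vocal_range / 2, melody)
```

Because the rescaling is multiplicative, the octave between the two voiced frames survives the range change; only the absolute register moves.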
Related papers
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions (2023) 9.23
- CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis with Structured Melody Control and Guidance (2025) 0.00
- PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions (2023) 10.85
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching (2025) 7.81
- CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder (2024) 0.00
- TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control (2024) 7.16
- Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference (2025) 0.00
- MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-Free Diffusion Guidance (2024) 4.52