StyleSinger: Style Transfer For Out-of-domain Singing Voice Synthesis
2023 · Yu Zhang, Rongjie Huang, Ruiqi Li, et al.
Abstract
Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style character
Related papers
- TCSinger: Zero-shot Singing Voice Synthesis With Style Transfer And Multi-level Style Control (2024) · 7.16
- Everyone-Can-Sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025) · 0.00
- Improving Data Augmentation-based Cross-speaker Style Transfer For TTS With Singing Voice, Style Filtering, And F0 Matching (2024) · 0.00
- GenerSpeech: Towards Style Transfer For Generalizable Out-of-domain Text-to-speech (2022) · 5.24
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024) · 4.52
- DiffSinger: Singing Voice Synthesis Via Shallow Diffusion Mechanism (2021) · 23.76
- Leveraging Symmetrical Convolutional Transformer Networks For Speech To Singing Voice Style Transfer (2022) · 5.84
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024) · 4.52