Enhancing The Vocal Range Of Single-speaker Singing Voice Synthesis With Melody-unsupervised Pre-training
2023 · Shaohuan Zhou, Xu Li, Zhiyong Wu, et al.
Abstract
Single-speaker singing voice synthesis (SVS) usually underperforms at pitch values that fall outside the singer's vocal range or are covered by few training samples. Building on our previous work, this work proposes a melody-unsupervised multi-speaker pre-training method, conducted on a multi-singer dataset, that extends the vocal range of the single speaker without degrading timbre similarity. The pre-training method can be applied to a large-scale multi-singer dataset that contains only audio-and-lyrics pairs, without phonemic timing information or pitch annotation. Specifically, in the pre-training step, we design a phoneme predictor that produces frame-level phoneme probability vectors as the phonemic timing information and a speaker encoder that models the timbre variations of different singers, and we estimate the frame-level f0 values directly from the audio to provide the pitch information. These pre-trained model parameters are delivered into the fine-tuning step as prio
Related papers
- Visinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024)
- Self-supervised Singing Voice Pre-training Towards Speech-to-singing Conversion (2024)
- Singaug: Data Augmentation For Singing Voice Synthesis With Cycle-consistent Training Strategy (2022)
- A Melody-unsupervision Model For Singing Voice Synthesis (2021)
- Semi-supervised Learning For Singing Synthesis Timbre (2020)
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020)
- Makesinger: A Semi-supervised Training Method For Data-efficient Singing Voice Synthesis Via Classifier-free Diffusion Guidance (2024)
- A Preliminary Investigation On Flexible Singing Voice Synthesis Through Decomposed Framework With Inferrable Features (2024)