VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation
2024 · Yifeng Yu, Jiatong Shi, Yuning Wu, et al.
Abstract
Singing Voice Synthesis (SVS) has advanced significantly with the advent of deep learning techniques. However, a major challenge in SVS is the scarcity of labeled singing voice data, which limits the effectiveness of supervised learning methods. In response, this paper introduces an approach that improves SVS quality by leveraging representations from self-supervised learning models pre-trained on unlabeled data. Building upon the existing VISinger2 framework, this study integrates additional spectral feature information into the system to enhance its performance. The integration aims to harness the rich acoustic features from the pre-trained models, thereby enriching the synthesis and yielding a more natural and expressive singing voice. Experimental results on various corpora demonstrate the efficacy of this approach in improving the overall quality of synthesized singing voices on both objective and subjective metrics.
Related papers
- VISinger: Variational Inference With Adversarial Learning For End-to-end Singing Voice Synthesis (2021)
- Towards Improving The Expressiveness Of Singing Voice Synthesis With BERT Derived Semantic Information (2023)
- SingAug: Data Augmentation For Singing Voice Synthesis With Cycle-consistent Training Strategy (2022)
- CSSinger: End-to-end Chunkwise Streaming Singing Voice Synthesis System Based On Conditional Variational Autoencoder (2024)
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024)
- MakeSinger: A Semi-supervised Training Method For Data-efficient Singing Voice Synthesis Via Classifier-free Diffusion Guidance (2024)
- Self-supervised Singing Voice Pre-training Towards Speech-to-singing Conversion (2024)
- VISinger 2: High-fidelity End-to-end Singing Voice Synthesis Enhanced By Digital Signal Processing Synthesizer (2022)