VITS-based Singing Voice Conversion System With DSPGAN Post-processing For SVCC2023
2023 · Yiquan Zhou, Meng Chen, Yi Lei, et al.
Abstract
This paper presents the T02 team's system for the Singing Voice Conversion Challenge 2023 (SVCC2023). Our system is a VITS-based SVC model with three modules: a feature extractor, a voice converter, and a post-processor. The feature extractor provides F0 contours and uses a HuBERT model to extract speaker-independent linguistic content from the input singing voice. The voice converter recombines the speaker timbre, F0, and linguistic content to generate the waveform of the target speaker. In addition, to further improve audio quality, a fine-tuned DSPGAN vocoder re-synthesises the waveform. Because target speaker data is limited, we use a two-stage training strategy to adapt the base model to the target speaker. During model adaptation, we apply several techniques, such as data augmentation and joint training with auxiliary singer data. Official challenge results show that our system achieves superior performance, es
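The three-module flow described in the abstract can be sketched roughly as follows. This is only an illustration of the data flow, not the authors' implementation: the names `Features`, `SVCPipeline`, and `toy_pipeline` are hypothetical, every module is a placeholder callable, and the stand-ins in `toy_pipeline` produce dummy values rather than real F0, HuBERT, VITS, or DSPGAN outputs.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]


@dataclass
class Features:
    """Outputs of the feature extractor for one utterance (hypothetical container)."""
    f0: List[float]             # frame-level F0 contour
    content: List[List[float]]  # frame-level speaker-independent content vectors


@dataclass
class SVCPipeline:
    """Chains the three modules named in the abstract.

    Each field is a placeholder callable; real models would be plugged in here.
    """
    extract_f0: Callable[[Waveform], List[float]]
    extract_content: Callable[[Waveform], List[List[float]]]  # e.g. a HuBERT wrapper
    convert: Callable[[Features, int], Waveform]              # voice converter
    post_process: Callable[[Waveform], Waveform]              # vocoder re-synthesis

    def run(self, source: Waveform, target_speaker_id: int) -> Waveform:
        # 1) Feature extractor: F0 contour plus linguistic content.
        feats = Features(self.extract_f0(source), self.extract_content(source))
        if len(feats.f0) != len(feats.content):
            raise ValueError("F0 and content must have the same number of frames")
        # 2) Voice converter: recombine timbre (speaker id), F0, and content.
        converted = self.convert(feats, target_speaker_id)
        # 3) Post-processor: re-synthesise the converted waveform.
        return self.post_process(converted)


def toy_pipeline(hop: int = 4) -> SVCPipeline:
    """Build a pipeline from trivial stand-ins, only to show the data flow."""
    def n_frames(wav: Waveform) -> int:
        return (len(wav) + hop - 1) // hop  # ceil(len(wav) / hop)

    return SVCPipeline(
        extract_f0=lambda wav: [220.0] * n_frames(wav),
        extract_content=lambda wav: [[0.0, 0.0] for _ in range(n_frames(wav))],
        convert=lambda feats, spk: [0.0] * (len(feats.f0) * hop),
        post_process=lambda wav: list(wav),
    )
```

Calling `toy_pipeline().run(samples, target_speaker_id=0)` passes the input through all three stages in order. Keeping each stage behind a plain callable is one way to swap in a fine-tuned model for a single stage without touching the others.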
Related papers
- VITS-based Singing Voice Conversion Leveraging Whisper And Multi-scale F0 Modeling (2023) 0.00
- A Comparative Study Of Voice Conversion Models With Large-scale Speech And Singing Data: The T13 Systems For The Singing Voice Conversion Challenge 2023 (2023) 6.77
- SYKI-SVC: Advancing Singing Voice Conversion With Post-processing Innovations And An Open-source Professional Testset (2025) 4.52
- PPG-based Singing Voice Conversion With Adversarial Representation Learning (2020) 9.76
- The Voice Conversion Challenge 2018: Promoting Development Of Parallel And Nonparallel Methods (2018) 17.06
- Mandarin Singing Voice Synthesis With Denoising Diffusion Probabilistic Wasserstein GAN (2022) 6.34
- Baseline System Of Voice Conversion Challenge 2020 With Cyclic Variational Autoencoder And Parallel WaveGAN (2020) 4.24
- Phonetic Posteriorgrams Based Many-to-many Singing Voice Conversion Via Adversarial Training (2020) 0.00