A Comparative Study Of Voice Conversion Models With Large-scale Speech And Singing Data: The T13 Systems For The Singing Voice Conversion Challenge 2023
2023 · Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, et al.
Abstract
This paper presents our systems (denoted as T13) for the Singing Voice Conversion Challenge (SVCC) 2023. For both the in-domain and cross-domain English singing voice conversion (SVC) tasks (Task 1 and Task 2, respectively), we adopt a recognition-synthesis approach with self-supervised learning-based representations. To achieve data-efficient SVC with a limited amount of target singer/speaker data (150 to 160 utterances for SVCC 2023), we first train a diffusion-based any-to-any voice conversion model on 750 hours of publicly available speech and singing data. Then, we fine-tune the model for each target singer/speaker of Task 1 and Task 2. Large-scale listening tests conducted by SVCC 2023 show that our T13 system achieves competitive naturalness and speaker similarity for the harder cross-domain SVC task (Task 2), which suggests that our proposed method generalizes well. Our objective evaluation results show that using large datasets is particularly beneficial for cross-domain SVC.
Related papers
- Vits-based Singing Voice Conversion System With DSPGAN Post-processing For SVCC2023 (2023)
- Vits-based Singing Voice Conversion Leveraging Whisper And Multi-scale F0 Modeling (2023)
- The Voice Conversion Challenge 2018: Promoting Development Of Parallel And Nonparallel Methods (2018)
- Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion (2023)
- A Comparative Study Of Self-supervised Speech Representation Based Voice Conversion (2022)
- Voice Conversion Challenge 2020: Intra-lingual Semi-parallel And Cross-lingual Voice Conversion (2020)
- Everyone-can-sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025)
- SYKI-SVC: Advancing Singing Voice Conversion With Post-processing Innovations And An Open-source Professional Testset (2025)