Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
2024 · Jiatong Shi, Yueqian Lin, Xinyi Bai, et al.
Abstract
In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatural voice synthesis. This innovative method has led to the creation of two expansive singing voice datasets, ACE-Opencpop and ACE-KiSing, which are instrumental for large-scale, multi-singer voice synthesis. Through thorough experimentation, we establish that these datasets not only serve as new benchmarks for SVS but also enhance SVS performance on other singing voice datasets when used as supplementary resources. The corpora, pre-trained models, and their related training recipes are publicly available at ESPnet-Muskits (https://github.com/espnet/espnet).
Related papers
- SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy (2022)
- VISinger2+: End-to-end Singing Voice Synthesis Augmented by Self-supervised Learning Representation (2024)
- Everyone-Can-Sing: Zero-shot Singing Voice Synthesis and Conversion with Speech Reference (2025)
- S2Cap: A Benchmark and a Baseline for Singing Style Captioning (2024)
- Multi-Singer: Fast Multi-singer Singing Voice Vocoder with a Large-scale Corpus (2021)
- Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm (2024)
- SingMOS: An Extensive Open-source Singing Voice Dataset for MOS Prediction (2024)
- A Comparative Study of Voice Conversion Models with Large-scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023 (2023)