Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion
2023 · Xueyao Zhang, Zihao Fang, Yicheng Gu, et al.
Abstract
Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common solution is to use a semantic-based audio pretrained model as a feature extractor. However, the degree to which the extracted features meet the requirements of SVC remains an open question. This includes their capability to accurately model melody and lyrics, the speaker independence of their underlying acoustic information, and their robustness in in-the-wild acoustic environments. In this study, we investigate the knowledge within classical semantic-based pretrained models in detail. We discover that the knowledge of different models is diverse and can be complementary for SVC. Based on these findings, we design a Singing Voice Conversion framework based on Diverse Semantic-based Feature Fusion (DSFF-SVC). Experimental results demonstrate th…
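The abstract does not say how DSFF-SVC combines the features from different pretrained models. The sketch below shows one generic way to fuse frame-level features from several extractors, and it should not be read as the authors' method. It rests on these assumptions: each extractor yields a `(frames, dims)` array for the same clip, streams with different frame rates are aligned by linear interpolation to a shared frame count, and the aligned streams are then concatenated along the feature axis. The function names `resample_frames` and `fuse_features` are hypothetical, and the random arrays only stand in for real extractor outputs.

```python
import numpy as np


def resample_frames(features: np.ndarray, num_frames: int) -> np.ndarray:
    """Linearly interpolate a (T, D) feature sequence to (num_frames, D)."""
    features = np.asarray(features, dtype=np.float64)
    src_len, dim = features.shape
    # Positions of the target frames expressed on the source frame axis.
    src_pos = np.linspace(0.0, src_len - 1, num=num_frames)
    src_idx = np.arange(src_len)
    out = np.empty((num_frames, dim), dtype=np.float64)
    for d in range(dim):
        out[:, d] = np.interp(src_pos, src_idx, features[:, d])
    return out


def fuse_features(feature_list, num_frames: int) -> np.ndarray:
    """Hypothetical fusion: align every stream to num_frames, then concatenate."""
    aligned = [resample_frames(f, num_frames) for f in feature_list]
    return np.concatenate(aligned, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for two extractors with different frame rates and widths.
    feat_a = rng.standard_normal((50, 768))
    feat_b = rng.standard_normal((100, 256))
    fused = fuse_features([feat_a, feat_b], num_frames=100)
    print(fused.shape)  # (100, 1024)
```

Concatenation keeps every stream intact and leaves it to the downstream model to weigh them. A learned projection or a weighted sum of projected features would be equally plausible alternatives under the same assumptions.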