Knn-svc: Robust Zero-shot Singing Voice Conversion With Additive Synthesis And Concatenation Smoothness Optimization
2025 Β· Keren Shao, Ke Chen, Matthew Baas, et al.
Abstract
Robustness is critical in zero-shot singing voice conversion (SVC). This paper introduces two novel methods to strengthen the robustness of the kNN-VC framework for SVC. First, kNN-VC's core representation, WavLM, lacks harmonic emphasis, resulting in dull sounds and ringing artifacts. To address this, we leverage the bijection between WavLM, pitch contours, and spectrograms to perform additive synthesis, integrating the resulting waveform into the model to mitigate these issues. Second, kNN-VC overlooks concatenative smoothness, a key perceptual factor in SVC. To enhance smoothness, we propose a new distance metric that filters out unsuitable kNN candidates and optimize the summing weights of the candidates during inference. Although our techniques are built on the kNN-VC framework for implementation convenience, they are broadly applicable to general concatenative neural synthesis models. Experimental results validate the effectiveness of these modifications in achieving robust SVC.
Authors
(none)
Tags
Stats
Related papers
- Robustsvc: Hubert-based Melody Extractor And Adversarial Learning For Robust Singing Voice Conversion (2024)3.58
- Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-based Approach For One-shot Singing Voice Conversion (2023)7.50
- Robust One-shot Singing Voice Conversion (2022)0.00
- Zero-shot Sing Voice Conversion: Built Upon Clustering-based Phoneme Representations (2024)0.00
- Everyone-can-sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025)0.00
- LDM-SVC: Latent Diffusion Model Based Zero-shot Any-to-any Singing Voice Conversion With Singer Guidance (2024)5.84
- Takin-vc: Expressive Zero-shot Voice Conversion Via Adaptive Hybrid Content Encoding And Enhanced Timbre Modeling (2024)0.00
- LHQ-SVC: Lightweight And High Quality Singing Voice Conversion Modeling (2024)3.58