FastSVC: Fast Cross-domain Singing Voice Conversion With Feature-wise Linear Modulation
2020 · Songxiang Liu, Yuewen Cao, Na Hu, et al.
Abstract
This paper presents FastSVC, a lightweight cross-domain singing voice conversion (SVC) system that achieves high conversion performance with inference speed 4x faster than real-time on CPUs. FastSVC uses a Conformer-based phoneme recognizer to extract singer-agnostic linguistic features from singing signals. A generator based on feature-wise linear modulation (FiLM) synthesizes the waveform directly from the linguistic features, leveraging information from sine-excitation signals and loudness features. The waveform generator can be trained conveniently using a multi-resolution spectral loss and an adversarial loss. Experimental results show that, compared with a computationally heavy baseline system, the proposed FastSVC system achieves comparable conversion performance in some scenarios and significantly better conversion performance in others. Moreover, the proposed FastSVC system achieves desirable cross-lingual singing conversion performance. The inference speed of the Fas
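As a rough illustration of the feature-wise linear modulation idea named in the abstract, the sketch below scales and shifts each channel of a hidden feature map using parameters derived from conditioning features. It is not the authors' generator: the array shapes, the two conditioning features per frame (stand-ins for a sine-excitation value and a loudness value), and the plain linear projections `w_gamma` and `w_beta` are all assumptions made for the example.

```python
import numpy as np


def film(features, gamma, beta):
    """Feature-wise linear modulation: per-channel scale (gamma) and shift (beta).

    features: array of shape (channels, frames)
    gamma, beta: arrays broadcastable to the shape of `features`
    """
    return gamma * features + beta


rng = np.random.default_rng(0)
channels, frames = 4, 6

# Hidden features inside a generator layer (random stand-in values).
hidden = rng.standard_normal((channels, frames))

# Hypothetical conditioning: two features per frame, standing in for
# sine-excitation and loudness. A real system would use learned layers.
cond = rng.standard_normal((2, frames))
w_gamma = rng.standard_normal((channels, 2))  # assumed linear projection
w_beta = rng.standard_normal((channels, 2))   # assumed linear projection

gamma = w_gamma @ cond  # shape (channels, frames)
beta = w_beta @ cond    # shape (channels, frames)

modulated = film(hidden, gamma, beta)

# With gamma = 1 and beta = 0 the modulation leaves the features unchanged.
identity = film(hidden, np.ones_like(hidden), np.zeros_like(hidden))
```

The point of the design is that the conditioning signal never has to be concatenated with the hidden features; it only controls how strongly each channel is scaled and offset at each frame.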
Related papers
- LHQ-SVC: Lightweight And High Quality Singing Voice Conversion Modeling (2024) 3.58
- Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion (2023) 0.00
- LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion With Inference Acceleration Via Latent Consistency Distillation (2024) 3.58
- Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-based Approach For One-shot Singing Voice Conversion (2023) 7.50
- Real-time And Accurate: Zero-shot High-fidelity Singing Voice Conversion With Multi-condition Flow Synthesis (2024) 0.00
- FastVC: Fast Voice Conversion With Non-parallel Data (2020) 5.24
- Robust One-shot Singing Voice Conversion (2022) 0.00
- LDM-SVC: Latent Diffusion Model Based Zero-shot Any-to-any Singing Voice Conversion With Singer Guidance (2024) 5.84