A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
2023 · Brendan O'Connor, Simon Dixon
Abstract
Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component within a loss function that is otherwise well established for VC tasks, and show that it improves our model's SVC performance. We first trained a singer identity embedding (SIE) network on mel-spectrograms of singer recordings, using contrastive learning, to produce singer-specific variance encodings. We subsequently trained a well-known autoencoder framework (AutoVC) conditioned on these SIEs, and measured differences in SVC performance when using different latent regressor loss components. We found that computing this loss with respect to SIEs leads to better performance than computing it with respect to bottleneck embeddings: the converted audio is more natural and more specific to the target singers. The inclusion of this loss component has the advantage of explicitly forcing the network to reconstruct with timbral sim
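The two latent regressor variants compared in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: the encoders below are random linear maps standing in for trained networks, and the names `content_encoder`, `sie_encoder`, the dimensions, and the choice of mean absolute error for both regressor terms are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumed, not taken from the paper).
N_MELS, N_FRAMES, CODE_DIM, SIE_DIM = 80, 32, 16, 8

# Hypothetical stand-ins for trained networks: fixed random linear maps.
W_content = rng.standard_normal((CODE_DIM, N_MELS)) * 0.1
W_sie = rng.standard_normal((SIE_DIM, N_MELS)) * 0.1


def content_encoder(mel):
    """Toy bottleneck encoder: per-frame projection to (CODE_DIM, frames)."""
    return W_content @ mel


def sie_encoder(mel):
    """Toy SIE network: project the time-averaged mel, then L2-normalise."""
    emb = W_sie @ mel.mean(axis=1)
    return emb / np.linalg.norm(emb)


def reconstruction_loss(mel, mel_hat):
    """Mean squared error between input and reconstructed mel-spectrogram."""
    return float(np.mean((mel - mel_hat) ** 2))


def bottleneck_regressor_loss(mel, mel_hat):
    """Latent regressor loss w.r.t. bottleneck embeddings (assumed MAE)."""
    return float(np.mean(np.abs(content_encoder(mel) - content_encoder(mel_hat))))


def sie_regressor_loss(target_sie, mel_hat):
    """Latent regressor loss w.r.t. singer identity embeddings (assumed MAE)."""
    return float(np.mean(np.abs(target_sie - sie_encoder(mel_hat))))


# Fake data: an input mel and a slightly perturbed "reconstruction".
mel = rng.standard_normal((N_MELS, N_FRAMES))
mel_hat = mel + 0.05 * rng.standard_normal((N_MELS, N_FRAMES))
target_sie = sie_encoder(mel)

recon = reconstruction_loss(mel, mel_hat)
total_bottleneck = recon + bottleneck_regressor_loss(mel, mel_hat)
total_sie = recon + sie_regressor_loss(target_sie, mel_hat)
```

In this sketch the only difference between the two training objectives is which embedding of the reconstructed mel-spectrogram is re-encoded and compared: the bottleneck code or the singer identity embedding. Loss weights are omitted for brevity.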
Related papers
- LDM-SVC: Latent Diffusion Model Based Zero-shot Any-to-any Singing Voice Conversion With Singer Guidance (2024)
- Singing Voice Conversion With Disentangled Representations Of Singer And Vocal Technique Using Variational Autoencoders (2019)
- Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion (2023)
- PPG-based Singing Voice Conversion With Adversarial Representation Learning (2020)
- RobustSVC: HuBERT-based Melody Extractor And Adversarial Learning For Robust Singing Voice Conversion (2024)
- LHQ-SVC: Lightweight And High Quality Singing Voice Conversion Modeling (2024)
- LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion With Inference Acceleration Via Latent Consistency Distillation (2024)
- Robust One-shot Singing Voice Conversion (2022)