Towards High-fidelity Singing Voice Conversion With Acoustic Reference And Contrastive Predictive Coding
2021 · Chao Wang, Zhonghao Li, Benlai Tang, et al.
Abstract
Recently, phonetic posteriorgram (PPG)-based methods have become popular in non-parallel singing voice conversion systems. However, because PPGs lack acoustic information, the style and naturalness of the converted singing voices remain limited. To address these problems, in this paper we use an acoustic reference encoder to implicitly model singing characteristics. We experiment with different auxiliary features as input to the reference encoder, including mel spectrograms, HuBERT, and the middle hidden feature (PPG-Mid) of a pretrained automatic speech recognition (ASR) model, and find that the HuBERT feature is the best choice. In addition, we use a contrastive predictive coding (CPC) module to further smooth the voices by predicting future observations in latent space. Experiments show that, compared with the baseline models, our proposed model can significantly improve the naturalness of the converted singing voices and their similarity to the target singer. Moreo
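The abstract says the CPC module smooths the output by predicting future observations in latent space. The abstract does not give the paper's exact configuration (encoder, prediction horizon, negative sampling), so the block below is only a minimal sketch of the generic CPC objective (InfoNCE), under stated assumptions. `z` holds per-frame latents, `c` holds causal context vectors, and `W` is a linear predictor for a `k`-frame-ahead step. For each frame, the true future latent is scored against the other future frames of the same sequence, which serve as negatives. The helpers `causal_context` and `cpc_infonce_loss` are hypothetical names, and the running-mean context is a simplification standing in for the autoregressive network used in CPC.

```python
import numpy as np


def causal_context(z: np.ndarray) -> np.ndarray:
    """Running mean of latents up to each frame.

    Hypothetical stand-in for CPC's autoregressive context network:
    c[t] only depends on z[0..t], so it never sees the future.
    """
    counts = np.arange(1, z.shape[0] + 1)[:, None]
    return np.cumsum(z, axis=0) / counts


def cpc_infonce_loss(z: np.ndarray, c: np.ndarray, W: np.ndarray, k: int) -> float:
    """InfoNCE loss for predicting z[t + k] from c[t] (sketch, not the paper's code).

    z: (T, D) latent frames; c: (T, C) causal contexts;
    W: (C, D) assumed linear predictor for offset k.
    Negatives are the other future frames within the same sequence.
    """
    T = z.shape[0]
    if not 0 < k < T:
        raise ValueError("k must satisfy 0 < k < T")
    preds = c[: T - k] @ W          # (T-k, D) predicted future latents
    targets = z[k:]                 # (T-k, D) true future latents
    logits = preds @ targets.T      # (T-k, T-k); diagonal = positive pairs
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability of picking the true future frame.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((50, 16))           # toy latent sequence
    W = 0.1 * rng.standard_normal((16, 16))     # toy predictor weights
    print(cpc_infonce_loss(z, causal_context(z), W, k=3))
```

With an all-zero predictor every candidate scores the same, so the loss equals log(T - k), the chance level. Training the encoder and predictor to push the loss below that level forces the latents to carry information that stays predictable over time. That is the intuition behind using CPC as a smoothing signal.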
Related papers
- Ppg-based Singing Voice Conversion With Adversarial Representation Learning (2020)
- Phonetic Posteriorgrams Based Many-to-many Singing Voice Conversion Via Adversarial Training (2020)
- Singing Voice Conversion With Non-parallel Data (2019)
- A Unified Model For Voice And Accent Conversion In Speech And Singing Using Self-supervised Learning And Feature Extraction (2024)
- Real-time And Accurate: Zero-shot High-fidelity Singing Voice Conversion With Multi-condition Flow Synthesis (2024)
- Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech (2020)
- Vits-based Singing Voice Conversion Leveraging Whisper And Multi-scale F0 Modeling (2023)
- AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion (2021)