Hierarchical Disentangled Representation Learning For Singing Voice Conversion
2021 · Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
Abstract
Conventional singing voice conversion (SVC) methods often struggle to operate on high-resolution audio owing to the high dimensionality of the data. In this paper, we propose a hierarchical representation learning method that learns disentangled representations at multiple resolutions independently. Using the learned disentangled representations, the proposed method performs SVC progressively from low to high resolution. Experimental results show that the proposed method outperforms single-resolution baselines in terms of mean opinion score (MOS), similarity score, and pitch accuracy.
Related papers
- Singing Voice Conversion With Disentangled Representations Of Singer And Vocal Technique Using Variational Autoencoders (2019) · 10.97
- Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion (2023) · 0.00
- Zero-shot Sing Voice Conversion: Built Upon Clustering-based Phoneme Representations (2024) · 0.00
- PPG-based Singing Voice Conversion With Adversarial Representation Learning (2020) · 9.76
- Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-based Approach For One-shot Singing Voice Conversion (2023) · 7.50
- LHQ-SVC: Lightweight And High Quality Singing Voice Conversion Modeling (2024) · 3.58
- Robust One-shot Singing Voice Conversion (2022) · 0.00
- RobustSVC: HuBERT-based Melody Extractor And Adversarial Learning For Robust Singing Voice Conversion (2024) · 3.58