Unsupervised Singing Voice Conversion
2019 · Eliya Nachmani, Lior Wolf
Abstract
We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any form of supervision: no lyrics or any kind of phonetic features, no notes, and no matching samples between singers. The proposed network employs a single CNN encoder for all singers, a single WaveNet decoder, and a classifier that forces the latent representation to be singer-agnostic. Each singer is represented by one embedding vector, on which the decoder is conditioned. To deal with relatively small datasets, we propose a new data augmentation scheme, as well as new training losses and protocols that are based on backtranslation. Our evaluation presents evidence that the conversion produces natural singing voices that are highly recognizable as the target singer.
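The data flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: random linear maps stand in for the CNN encoder and the WaveNet decoder, the sizes and names (`encode`, `decode`, `convert`, `backtranslation_loss`) are hypothetical, and a squared error stands in for whatever reconstruction loss the paper actually uses. The sketch only shows how one shared encoder, one shared decoder conditioned on a per-singer embedding, a singer classifier on the latent, and a backtranslation round trip fit together.

```python
import math
import random

random.seed(0)

# Toy sizes, all hypothetical.
N_SINGERS, FRAME, LATENT, EMB = 3, 16, 8, 4


def rand_matrix(rows, cols, scale):
    """Random matrix stored as a list of rows."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, mat):
    """Row vector (len = rows of mat) times matrix -> list of len = cols of mat."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


# Linear stand-ins (assumption): the paper uses a CNN encoder and a WaveNet decoder.
W_enc = rand_matrix(FRAME, LATENT, 0.1)         # single encoder shared by all singers
W_dec = rand_matrix(LATENT + EMB, FRAME, 0.1)   # single decoder shared by all singers
singer_emb = rand_matrix(N_SINGERS, EMB, 1.0)   # one embedding vector per singer
W_clf = rand_matrix(LATENT, N_SINGERS, 0.1)     # singer classifier on the latent


def encode(frame):
    return [math.tanh(v) for v in matvec(frame, W_enc)]


def decode(latent, singer):
    # Conditioning: concatenate the latent with the chosen singer's embedding.
    return matvec(latent + singer_emb[singer], W_dec)


def convert(frame, target):
    """Encode with the shared encoder, decode with the target singer's embedding."""
    return decode(encode(frame), target)


def classifier_cross_entropy(latent, singer):
    """Cross-entropy of the singer classifier. The classifier would minimise this,
    while the encoder would be trained to make the latent uninformative."""
    logits = matvec(latent, W_clf)
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[singer]


def backtranslation_loss(frame, source, target):
    """Convert source -> target, map the result back, compare with the original.
    Squared error is a stand-in chosen for this sketch."""
    pseudo = convert(frame, target)
    recon = convert(pseudo, source)
    return sum((r - f) ** 2 for r, f in zip(recon, frame)) / len(frame)


frame = [random.gauss(0.0, 1.0) for _ in range(FRAME)]   # fake audio frame
converted = convert(frame, target=1)
bt = backtranslation_loss(frame, source=0, target=1)
ce = classifier_cross_entropy(encode(frame), singer=0)
```

Because only the embedding changes between singers, swapping `target` is all that conversion requires in this sketch; no gradient step is taken here, so the weights stay random and the loss values carry no meaning beyond showing where each term is computed.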
Related papers
- Singing Voice Conversion With Disentangled Representations Of Singer And Vocal Technique Using Variational Autoencoders (2019)
- PitchNet: Unsupervised Singing Voice Conversion With Pitch Adversarial Network (2019)
- Singing Voice Conversion With Non-parallel Data (2019)
- Leveraging Symmetrical Convolutional Transformer Networks For Speech To Singing Voice Style Transfer (2022)
- PPG-based Singing Voice Conversion With Adversarial Representation Learning (2020)
- Real-time And Accurate: Zero-shot High-fidelity Singing Voice Conversion With Multi-condition Flow Synthesis (2024)
- Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-based Approach For One-shot Singing Voice Conversion (2023)
- Robust One-shot Singing Voice Conversion (2022)