Singing Voice Conversion With Non-parallel Data
2019 · Xin Chen, Wei Chu, Jinxi Guo, et al.
Abstract
Singing voice conversion is the task of converting a song sung by a source singer into the voice of a target singer. In this paper, we propose applying a parallel-data-free, many-to-one voice conversion technique to singing voices. A phonetic posterior feature is first generated by decoding singing voices with a robust Automatic Speech Recognition (ASR) engine. Then, a trained Recurrent Neural Network (RNN) with a Deep Bidirectional Long Short-Term Memory (DBLSTM) structure models the mapping from person-independent content to the acoustic features of the target person. F0 and aperiodicity are extracted from the original singing voice and used together with the acoustic features to reconstruct the target singing voice through a vocoder. The resulting singing voice sounds similar to that of the target singer. To our knowledge, this is the first study that uses non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectivel
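The data flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `VocoderInput`, `assemble_vocoder_input`, and `toy_mapping` are hypothetical, and `toy_mapping` is a placeholder standing in for the trained DBLSTM. The sketch shows which feature streams come from the target-singer model and which are copied from the source recording before they reach the vocoder.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # one feature vector per analysis frame


@dataclass
class VocoderInput:
    """Feature streams handed to the vocoder (hypothetical container)."""
    spectral: List[Frame]      # predicted for the target singer
    f0: List[float]            # copied from the source singing voice
    aperiodicity: List[Frame]  # copied from the source singing voice


def assemble_vocoder_input(
    ppg: List[Frame],
    source_f0: List[float],
    source_aperiodicity: List[Frame],
    ppg_to_spectral: Callable[[List[Frame]], List[Frame]],
) -> VocoderInput:
    """Map phonetic posteriors to target spectral features and pair them
    with the F0 and aperiodicity taken from the source recording."""
    spectral = ppg_to_spectral(ppg)
    n_frames = len(ppg)
    if not (len(spectral) == len(source_f0) == len(source_aperiodicity) == n_frames):
        raise ValueError("all feature streams must have the same number of frames")
    return VocoderInput(
        spectral=spectral,
        f0=list(source_f0),
        aperiodicity=[list(frame) for frame in source_aperiodicity],
    )


def toy_mapping(ppg: List[Frame]) -> List[Frame]:
    """Placeholder for the trained DBLSTM (assumption): emits the index of
    the most likely phonetic class per frame as a one-dimensional vector."""
    return [[float(max(range(len(frame)), key=frame.__getitem__))] for frame in ppg]
```

For example, `assemble_vocoder_input(ppg, f0, ap, toy_mapping)` returns a `VocoderInput` whose `f0` and `aperiodicity` are unchanged from the source, while `spectral` comes from the mapping. In a real system the mapping would be the singer-specific network and the result would be passed to a vocoder for waveform synthesis.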
Related papers
- Unsupervised Singing Voice Conversion (2019)
- Singing Voice Conversion With Disentangled Representations Of Singer And Vocal Technique Using Variational Autoencoders (2019)
- Phonetic Posteriorgrams Based Many-to-many Singing Voice Conversion Via Adversarial Training (2020)
- Real-time And Accurate: Zero-shot High-fidelity Singing Voice Conversion With Multi-condition Flow Synthesis (2024)
- PPG-based Singing Voice Conversion With Adversarial Representation Learning (2020)
- Learning In Your Voice: Non-parallel Voice Conversion Based On Speaker Consistency Loss (2020)
- Towards High-fidelity Singing Voice Conversion With Acoustic Reference And Contrastive Predictive Coding (2021)
- Recognition-synthesis Based Non-parallel Voice Conversion With Adversarial Learning (2020)