Learning In Your Voice: Non-parallel Voice Conversion Based On Speaker Consistency Loss
2020 · Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, et al.
Abstract
In this paper, we propose a novel voice conversion strategy to resolve the mismatch between the training and conversion scenarios when a parallel speech corpus is unavailable for training. Based on auto-encoder and disentanglement frameworks, we design the proposed model to extract identity and content representations while reconstructing the input speech signal itself. Since we use other speakers' identity information in the training process, the training philosophy is naturally matched with the objective of the voice conversion process. In addition, we design the disentanglement framework to reliably preserve linguistic information and to enhance the quality of converted speech signals. The superiority of the proposed method is shown in subjective listening tests as well as objective measures.
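The abstract does not spell out the loss formulation, so the following is only an illustrative sketch of how a reconstruction term and a speaker consistency term might be combined. It assumes the consistency term is a cosine distance between the target speaker's identity embedding and the embedding re-extracted from the converted speech; the function names, the L1 reconstruction term, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
import math

def l1_loss(a, b):
    """Mean absolute error between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

def training_loss(x, x_recon, target_emb, converted_emb, weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    x, x_recon     : input features and their auto-encoder reconstruction
    target_emb     : identity embedding of the other (target) speaker
    converted_emb  : identity embedding re-extracted from the converted speech
    weight         : assumed trade-off between the two terms
    """
    reconstruction = l1_loss(x, x_recon)
    consistency = cosine_distance(target_emb, converted_emb)
    return reconstruction + weight * consistency
```

Under these assumptions, the loss is zero only when the reconstruction is exact and the converted speech carries an identity embedding pointing in the same direction as the target speaker's.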
Related papers
- Non-parallel Sequence-to-sequence Voice Conversion With Disentangled Linguistic And Speaker Representations (2019) 14.02
- Recognition-synthesis Based Non-parallel Voice Conversion With Adversarial Learning (2020) 0.00
- Singing Voice Conversion With Disentangled Representations Of Singer And Vocal Technique Using Variational Autoencoders (2019) 10.97
- Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations (2018) 13.60
- Adversarially Trained Autoencoders For Parallel-data-free Voice Conversion (2019) 6.34
- Learning Disentangled Speech Representations With Contrastive Learning And Time-invariant Retrieval (2024) 5.84
- Investigation Of Using Disentangled And Interpretable Representations For One-shot Cross-lingual Voice Conversion (2018) 6.77
- Using Joint Training Speaker Encoder With Consistency Loss To Achieve Cross-lingual Voice Conversion And Expressive Voice Conversion (2023) 0.00