Vocoder-free Non-parallel Conversion Of Whispered Speech With Masked Cycle-consistent Generative Adversarial Networks
2023 · Dominik Wagner, Ilja Baumann, Tobias Bocklet
Abstract
Cycle-consistent generative adversarial networks have been widely used in non-parallel voice conversion (VC). Their ability to learn mappings between source and target features without relying on parallel training data eliminates the need for temporal alignments. However, most methods decouple the conversion of acoustic features from synthesizing the audio signal by using separate models for conversion and waveform synthesis. This work unifies conversion and synthesis into a single model, thereby eliminating the need for a separate vocoder. By leveraging cycle-consistent training and a self-supervised auxiliary training task, our model is able to efficiently generate converted high-quality raw audio waveforms. Subjective listening tests showed that our unified approach achieved improvements of up to 6.7% relative to the baseline in whispered VC. Mean opinion score predictions also yielded stable results in conventional VC (between 0.5% and 2.4% relative improvement).
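The cycle-consistency idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the two "generators" below are hypothetical toy functions (simple scaling) standing in for learned networks, and the L1 distance is an assumed choice of reconstruction measure. The sketch only shows why no parallel data or temporal alignment is required: each signal is compared with its own round-trip reconstruction, never with a paired sample from the other domain.

```python
from typing import Callable, List

Signal = List[float]


def l1_distance(a: Signal, b: Signal) -> float:
    """Mean absolute difference between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    g_src_to_tgt: Callable[[Signal], Signal],
    g_tgt_to_src: Callable[[Signal], Signal],
    src: Signal,
    tgt: Signal,
) -> float:
    """Sum of the two round-trip reconstruction errors.

    src and tgt are unpaired: they may differ in content and length,
    because each is only compared with its own reconstruction.
    """
    src_cycled = g_tgt_to_src(g_src_to_tgt(src))  # src -> tgt -> src
    tgt_cycled = g_src_to_tgt(g_tgt_to_src(tgt))  # tgt -> src -> tgt
    return l1_distance(src_cycled, src) + l1_distance(tgt_cycled, tgt)


# Hypothetical toy generators; scaling stands in for a learned mapping.
def amplify(x: Signal) -> Signal:
    return [2.0 * v for v in x]


def attenuate(x: Signal) -> Signal:
    return [0.5 * v for v in x]


def imperfect_attenuate(x: Signal) -> Signal:
    return [0.4 * v for v in x]


whispered = [0.1, -0.2, 0.3]   # stand-in for a source-domain waveform
voiced = [1.0, -1.0, 0.5]      # stand-in for an unrelated target-domain waveform

# Exact inverses reconstruct both inputs, so the loss is zero.
perfect = cycle_consistency_loss(amplify, attenuate, whispered, voiced)

# A mismatched inverse leaves a reconstruction error, so the loss is positive.
imperfect = cycle_consistency_loss(amplify, imperfect_attenuate, whispered, voiced)
```

In a real system this term would be combined with adversarial losses and, per the abstract, a self-supervised auxiliary task; neither is modeled here.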
Related papers
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- High-quality Nonparallel Voice Conversion Based On Cycle-consistent Adversarial Network (2018)
- CVC: Contrastive Learning For Non-parallel Voice Conversion (2020)
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion (2019)
- Many-to-many Voice Conversion Using Conditional Cycle-consistent Adversarial Networks (2020)
- Baseline System Of Voice Conversion Challenge 2020 With Cyclic Variational Autoencoder And Parallel WaveGAN (2020)
- Non-parallel Voice Conversion With Cyclic Variational Autoencoder (2019)
- Scyclone: High-quality And Parallel-data-free Voice Conversion Using Spectrogram And Cycle-consistent Adversarial Networks (2020)