Scyclone: High-quality And Parallel-data-free Voice Conversion Using Spectrogram And Cycle-consistent Adversarial Networks
2020 · Masaya Tanaka, Takashi Nose, Aoi Kanagaki, et al.
Abstract
This paper proposes Scyclone, a high-quality voice conversion (VC) technique that does not require parallel training data. Scyclone improves the naturalness and speaker similarity of converted speech by combining CycleGAN-based spectrogram conversion with a simplified WaveRNN-based vocoder. Scyclone uses a linear spectrogram as the conversion feature instead of vocoder parameters, which avoids quality degradation caused by extraction errors in the fundamental frequency and voiced/unvoiced parameters. The spectrograms of the source and target speakers are modeled by modified CycleGAN networks, and the waveform is reconstructed by the simplified WaveRNN with a single Gaussian probability density function. Subjective experiments with completely unpaired training data show that Scyclone is significantly better than CycleGAN-VC2, one of the existing state-of-the-art parallel-data-free VC techniques.
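The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes the generic CycleGAN formulation (an L1 cycle-consistency term between a spectrogram and its round-trip reconstruction) and a textbook single-Gaussian negative log-likelihood for one waveform sample. The abstract does not state the exact losses, so the function names, the toy generators, and the L1 choice are illustrative assumptions rather than the authors' implementation.

```python
import math


def l1_distance(a, b):
    """Mean absolute difference between two spectrograms (lists of frames)."""
    total = 0.0
    count = 0
    for frame_a, frame_b in zip(a, b):
        for x, y in zip(frame_a, frame_b):
            total += abs(x - y)
            count += 1
    return total / count


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Generic CycleGAN cycle loss (assumed L1): x -> G_xy -> G_yx should
    recover x, and y -> G_yx -> G_xy should recover y."""
    x_cycled = g_yx(g_xy(x))
    y_cycled = g_xy(g_yx(y))
    return l1_distance(x_cycled, x) + l1_distance(y_cycled, y)


def gaussian_nll(sample, mean, log_std):
    """Negative log-likelihood of one sample under N(mean, exp(log_std)**2).
    A vocoder with a single Gaussian output would predict mean and log_std."""
    z = (sample - mean) / math.exp(log_std)
    return 0.5 * z * z + log_std + 0.5 * math.log(2.0 * math.pi)


# Hypothetical stand-ins for the two generators: exact inverses of each other.
def toy_g_xy(spec):
    return [[v + 1.0 for v in frame] for frame in spec]


def toy_g_yx(spec):
    return [[v - 1.0 for v in frame] for frame in spec]


# Tiny 2-frame, 2-bin "spectrograms" for the source and target speakers.
source = [[0.0, 0.5], [1.0, 1.5]]
target = [[2.0, 2.5], [3.0, 3.5]]

cycle_loss = cycle_consistency_loss(source, target, toy_g_xy, toy_g_yx)
nll = gaussian_nll(0.0, 0.0, 0.0)
```

Because the toy generators invert each other exactly, the cycle loss is zero here. In training, the generators are neural networks and this term is combined with adversarial losses, which are omitted from the sketch.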
Related papers
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- High-quality Nonparallel Voice Conversion Based On Cycle-consistent Adversarial Network (2018)
- Vocoder-free Non-parallel Conversion Of Whispered Speech With Masked Cycle-consistent Generative Adversarial Networks (2023)
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion (2019)
- Non-parallel Voice Conversion With Cyclic Variational Autoencoder (2019)
- Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations (2018)
- CVC: Contrastive Learning For Non-parallel Voice Conversion (2020)
- Many-to-many Voice Conversion Using Conditional Cycle-consistent Adversarial Networks (2020)