CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
2019 · Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, et al.
Abstract
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to the disadvantages of the training conditions. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge. To reduce this gap, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (PatchGAN). We evaluated our method on a non-parallel VC task and analyzed the effect of each technique in detail. An objective evaluation showed that these techniques help bring the converted feature sequence closer t
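The "improved objective" named above can be illustrated with a small, dependency-free sketch. It assumes the CycleGAN-VC formulation (two adversarial losses, a cycle-consistency loss, and an identity-mapping loss) and models the two-step adversarial loss as an additional adversarial term that judges the cyclically reconstructed features (X→Y→X and Y→X→Y) with separate discriminators. The function names (`adv_loss`, `full_objective`), the toy list-of-floats "features", and the weights `lam_cyc=10.0`, `lam_id=5.0` are hypothetical choices for illustration, not the authors' code.

```python
import math
from typing import Callable, Iterable, List, Sequence

Vec = List[float]
Gen = Callable[[Vec], Vec]    # hypothetical generator: feature vector -> feature vector
Disc = Callable[[Vec], float]  # hypothetical discriminator: probability in (0, 1) that input is real


def mean(vals: Iterable[float]) -> float:
    vals = list(vals)
    return sum(vals) / len(vals)


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equal-length feature vectors."""
    return mean(abs(p - q) for p, q in zip(a, b))


def adv_loss(d: Disc, real: Sequence[Vec], fake: Sequence[Vec]) -> float:
    """Standard GAN value: E[log D(real)] + E[log(1 - D(fake))]."""
    return (mean(math.log(d(r)) for r in real)
            + mean(math.log(1.0 - d(f)) for f in fake))


def full_objective(xs: Sequence[Vec], ys: Sequence[Vec],
                   g_xy: Gen, g_yx: Gen,
                   d_x: Disc, d_y: Disc, d2_x: Disc, d2_y: Disc,
                   lam_cyc: float = 10.0, lam_id: float = 5.0,
                   two_step: bool = True) -> float:
    """Minimax value: generators minimize it, discriminators maximize it.

    Assumed structure (sketch only): one-step adversarial losses on converted
    features, plus (if two_step) a second adversarial loss on cyclically
    reconstructed features, judged by the extra discriminators d2_x and d2_y.
    """
    fake_y = [g_xy(x) for x in xs]   # X -> Y
    fake_x = [g_yx(y) for y in ys]   # Y -> X
    cyc_x = [g_yx(f) for f in fake_y]  # X -> Y -> X
    cyc_y = [g_xy(f) for f in fake_x]  # Y -> X -> Y

    # One-step adversarial losses on the converted features.
    loss = adv_loss(d_y, ys, fake_y) + adv_loss(d_x, xs, fake_x)
    # Cycle-consistency: reconstructions should match the inputs.
    loss += lam_cyc * (mean(l1(c, x) for c, x in zip(cyc_x, xs))
                       + mean(l1(c, y) for c, y in zip(cyc_y, ys)))
    # Identity mapping: a target-domain input should pass through unchanged.
    loss += lam_id * (mean(l1(g_xy(y), y) for y in ys)
                      + mean(l1(g_yx(x), x) for x in xs))
    if two_step:
        # Second-step adversarial losses on the circularly converted features.
        loss += adv_loss(d2_x, xs, cyc_x) + adv_loss(d2_y, ys, cyc_y)
    return loss
```

With identity generators and discriminators that always output 0.5, the cycle and identity terms vanish, and each adversarial term contributes `-2 * log(2)`, so enabling `two_step` doubles the adversarial contribution from two terms to four. In practice the generators and discriminators would be neural networks (in the paper, a 2-1-2D CNN generator and a PatchGAN discriminator) trained with a deep learning framework; this sketch only shows how the loss terms are combined.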
Related papers
- High-quality Nonparallel Voice Conversion Based On Cycle-consistent Adversarial Network (2018)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- CVC: Contrastive Learning for Non-parallel Voice Conversion (2020)
- CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion (2020)
- MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames (2021)
- StarGAN-VC2: Rethinking Conditional Methods for StarGAN-based Voice Conversion (2019)
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-sounding Voice Conversion (2021)
- Vocoder-free Non-parallel Conversion of Whispered Speech with Masked Cycle-consistent Generative Adversarial Networks (2023)