CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion
2020 · Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, et al.
Abstract
Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speech without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the conve
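The abstract only names TFAN, so the following is a rough illustration rather than the authors' implementation. It is a minimal pure-Python sketch assuming a SPADE-style adaptive normalization: each channel of a hidden feature map f is instance-normalized, then modulated per time-frequency bin as f' = γ(x) · (f − μ(f)) / σ(f) + β(x), where x is the source mel-spectrogram resized to the feature map's resolution. In CycleGAN-VC3, γ and β come from learned convolutional layers; here `gamma_fn` and `beta_fn` are hypothetical element-wise stand-ins, and the nearest-neighbour resizing and all function names are illustrative assumptions.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # one channel: frequency bins x time frames


def instance_normalize(feature: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize one channel to zero mean and unit variance over all (freq, time) bins."""
    values = [v for row in feature for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in feature]


def resize_nearest(x: Matrix, out_f: int, out_t: int) -> Matrix:
    """Nearest-neighbour resize of the source mel-spectrogram (an assumed choice)."""
    in_f, in_t = len(x), len(x[0])
    return [[x[i * in_f // out_f][j * in_t // out_t] for j in range(out_t)]
            for i in range(out_f)]


def tfan(features: List[Matrix], source_mel: Matrix,
         gamma_fn: Callable[[float, int], float],
         beta_fn: Callable[[float, int], float]) -> List[Matrix]:
    """Sketch of time-frequency adaptive normalization.

    gamma_fn / beta_fn are hypothetical stand-ins for the learned layers that
    map the source mel-spectrogram to a per-bin scale and bias for channel c.
    """
    out = []
    for c, feature in enumerate(features):
        n_f, n_t = len(feature), len(feature[0])
        normed = instance_normalize(feature)          # channel-wise normalization
        src = resize_nearest(source_mel, n_f, n_t)    # align source to feature size
        out.append([[gamma_fn(src[i][j], c) * normed[i][j] + beta_fn(src[i][j], c)
                     for j in range(n_t)]
                    for i in range(n_f)])
    return out


if __name__ == "__main__":
    feats = [[[1.0, 2.0], [3.0, 4.0]]]   # one channel, 2 freq x 2 time
    source = [[0.0, 1.0], [1.0, 0.0]]    # toy source mel-spectrogram
    print(tfan(feats, source, lambda s, c: 1.0 + s, lambda s, c: 0.5 * s))
```

Because γ and β vary per time-frequency bin rather than being one scalar per channel (as in instance normalization with a plain affine transform), the modulation can reinject source time-frequency structure that normalization alone would wash out.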
Related papers
- MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames (2021)
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion (2019)
- Non-parallel Voice Conversion with Cyclic Variational Autoencoder (2019)
- MelGAN-VC: Voice Conversion and Audio Style Transfer on Arbitrarily Long Samples Using Spectrograms (2019)
- High-quality Nonparallel Voice Conversion Based on Cycle-consistent Adversarial Network (2018)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- CycleGAN Voice Conversion of Spectral Envelopes Using Adversarial Weights (2019)
- Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion (2022)