MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
2021 · Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, et al.
Abstract
Non-parallel voice conversion (VC) is a technique for training voice converters without a parallel corpus. Cycle-consistent adversarial network-based VCs (CycleGAN-VC and CycleGAN-VC2) are widely accepted as benchmark methods. However, owing to their insufficient ability to grasp time-frequency structures, their application is limited to mel-cepstrum conversion and not mel-spectrogram conversion despite recent advances in mel-spectrogram vocoders. To overcome this, CycleGAN-VC3, an improved variant of CycleGAN-VC2 that incorporates an additional module called time-frequency adaptive normalization (TFAN), has been proposed. However, an increase in the number of learned parameters is imposed. As an alternative, we propose MaskCycleGAN-VC, which is another extension of CycleGAN-VC2 and is trained using a novel auxiliary task called filling in frames (FIF). With FIF, we apply a temporal mask to the input mel-spectrogram and encourage the converter to fill in missing frames based on surroun
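The temporal masking step behind FIF can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `apply_temporal_mask`, the `max_mask_frames` parameter, the single contiguous mask, and the uniform sampling of the mask width and position are all assumptions made for illustration. The mel-spectrogram is represented as a plain list of frames so the example needs no external libraries.

```python
import random


def apply_temporal_mask(mel, max_mask_frames, rng=None):
    """Zero out one contiguous run of frames in a mel-spectrogram.

    mel: list of T frames, each a list of mel-bin values.
    Returns (masked_mel, mask); mask[t] is 1 for a kept frame, 0 for a masked one.
    """
    if rng is None:
        rng = random.Random()
    num_frames = len(mel)
    # Assumed sampling scheme: width uniform in [0, max_mask_frames],
    # start position uniform over the positions where the run still fits.
    width = rng.randint(0, min(max_mask_frames, num_frames))
    start = rng.randint(0, num_frames - width)
    mask = [0 if start <= t < start + width else 1 for t in range(num_frames)]
    # Multiply every bin of a frame by that frame's mask value.
    masked = [[value * m for value in frame] for frame, m in zip(mel, mask)]
    return masked, mask


# Toy input: 10 frames with 2 mel bins each.
toy_mel = [[1.0, 2.0] for _ in range(10)]
masked_mel, mask = apply_temporal_mask(toy_mel, max_mask_frames=4, rng=random.Random(0))
```

The mask is returned alongside the masked spectrogram so that a converter could, in principle, be told which frames are missing; whether and how it is used that way is outside what this sketch shows.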
Related papers
- CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion (2020)
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion (2019)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- High-quality Nonparallel Voice Conversion Based on Cycle-consistent Adversarial Network (2018)
- CVC: Contrastive Learning for Non-parallel Voice Conversion (2020)
- Non-parallel Voice Conversion with Cyclic Variational Autoencoder (2019)
- Vocoder-free Non-parallel Conversion of Whispered Speech with Masked Cycle-consistent Generative Adversarial Networks (2023)
- Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion (2022)