Training Generative Adversarial Network-based Vocoder With Limited Data Using Augmentation-conditional Discriminator
2024 · Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka
Abstract
A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because it is fast, lightweight, and high-quality. However, this data-driven model requires a large amount of training data, which incurs high data-collection costs. This fact motivates us to train a GAN-based vocoder on limited data. A promising solution is to augment the training data to avoid overfitting. However, a standard discriminator is unconditional and insensitive to distributional changes caused by data augmentation. Thus, augmented speech (which can be extraordinary) may be considered real speech. To address this issue, we propose an augmentation-conditional discriminator (AugCondD) that receives the augmentation state as input in addition to speech, thereby assessing the input speech according to the augmentation state, without inhibiting the learning of the original non-augmented distribution. Experimental results indicate that
Related papers
- Enhancing GAN-based Vocoders With Contrastive Learning Under Data-limited Condition (2023)
- Relational Data Selection For Data Augmentation Of Speaker-dependent Multi-band MelGAN Vocoder (2021)
- BigVGAN: A Universal Neural Vocoder With Large-scale Training (2022)
- Adversarial Data Augmentation Using VAE-GAN For Disordered Speech Recognition (2022)
- Avocodo: Generative Adversarial Network For Artifact-free Vocoder (2022)
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022)
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)
- VNet: A GAN-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024)