VQCPC-GAN: Variable-length Adversarial Audio Synthesis Using Vector-quantized Contrastive Predictive Coding
2021 · Javier Nistal, Cyran Aouameur, Stefan Lattner, et al.
Abstract
Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio by exploiting Vector-Quantized Contrastive Predictive Coding (VQCPC). A sequence of VQCPC tokens extracted from real audio data serves as conditional input to a GAN architecture, providing step-wise time-dependent features of the generated content. The input noise z (characteristic in adversarial architectures) remains fixed over time, ensuring temporal consistency of global features. We evaluate the proposed model by comparing a diverse set of metrics against various strong baselines. Results show that, even though the baselines score best, VQCPC-GAN achieves comparable performance even when
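The conditioning scheme described in the abstract (a time-varying token sequence plus a noise vector z held fixed over time) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-hot encoding, the concatenation layout, the codebook size of 8, the 4-dimensional z, and the hypothetical helper name `build_generator_input`. The generator network itself is not shown.

```python
import random


def build_generator_input(tokens, z, codebook_size):
    """Return one conditioning vector per time step: one-hot(token) followed by z.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    frames = []
    for t in tokens:
        if not 0 <= t < codebook_size:
            raise ValueError(f"token {t} outside codebook of size {codebook_size}")
        one_hot = [0.0] * codebook_size
        one_hot[t] = 1.0
        # The same z is appended at every step, so only the token part
        # of the input changes along the time axis.
        frames.append(one_hot + list(z))
    return frames


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # sampled once per generated sound

# Token sequences of different lengths give inputs of different lengths,
# while the z portion is identical in every frame.
short_input = build_generator_input([2, 2, 5], z, codebook_size=8)
long_input = build_generator_input([2, 2, 5, 1, 7, 0], z, codebook_size=8)
```

In this sketch the output duration is set only by the length of the token sequence, which is one plausible reading of how step-wise conditioning allows variable-length generation.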
Related papers
- DelightfulTTS 2: End-to-end Speech Synthesis With Adversarial Vector-quantized Auto-encoders (2022)
- Expediting TTS Synthesis With Adversarial Vocoding (2019)
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)
- BigVGAN: A Universal Neural Vocoder With Large-scale Training (2022)
- GANSynth: Adversarial Neural Audio Synthesis (2019)
- Adversarial Audio Synthesis (2018)
- SpecDiff-GAN: A Spectrally-shaped Noise Diffusion GAN For Speech And Music Synthesis (2024)
- Efficient Non-autoregressive GAN Voice Conversion Using VQWav2vec Features And Dynamic Convolution (2022)