RAVE: A Variational Autoencoder For Fast And High-quality Neural Audio Synthesis
2021 · Antoine Caillon, Philippe Esling
Abstract
Deep generative models applied to audio have improved by a large margin the state-of-the-art in many speech and music related tasks. However, as raw waveform modelling remains an inherently difficult task, audio generative models are either computationally intensive, rely on low sampling rates, are complicated to control or restrict the nature of possible signals. Among those models, Variational AutoEncoders (VAE) give control over the generation by exposing latent variables, although they usually suffer from low synthesis quality. In this paper, we introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis. We introduce a novel two-stage training procedure, namely representation learning and adversarial fine-tuning. We show that using a post-training analysis of the latent space allows a direct control between the reconstruction fidelity and the representation compactness. By leveraging a multi-band decomposition of the raw w
Related papers
- Conditional Variational Autoencoder To Improve Neural Audio Synthesis For Polyphonic Music Sound (2022) · 0.00
- A Benchmark Of Dynamical Variational Autoencoders Applied To Speech Spectrogram Modeling (2021) · 6.77
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019) · 9.23
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019) · 13.65
- A Recurrent Variational Autoencoder For Speech Enhancement (2019) · 13.97
- Learning And Controlling The Source-filter Representation Of Speech With A Variational Autoencoder (2022) · 7.50
- EVA-GAN: Enhanced Various Audio Generation Via Scalable Generative Adversarial Networks (2024) · 0.00
- Audio-to-image Cross-modal Generation (2021) · 6.34