Conditional Variational Autoencoder To Improve Neural Audio Synthesis For Polyphonic Music Sound
2022 · Seokjin Lee, Minhan Kim, Seunghyeon Shin, et al.
Abstract
Deep generative models for audio synthesis have improved significantly in recent years. However, modeling raw waveforms remains a difficult problem, especially for audio waveforms and music signals. Recently, the realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. RAVE is based on the variational autoencoder and uses a two-stage training strategy. However, the RAVE model is limited in reproducing wide-pitch polyphonic music sound. To enhance reconstruction performance, we supply pitch activation data as auxiliary information to the RAVE model. To handle this auxiliary information, we propose an enhanced RAVE model with a conditional variational autoencoder structure and an additional fully-connected layer. To evaluate the proposed structure, we conducted a listening experiment based on the multiple stimuli with hidden reference and anchor (MUSHRA) test using the MAESTRO dataset. The
Related papers
- RAVE: A Variational Autoencoder For Fast And High-quality Neural Audio Synthesis (2021)
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019)
- MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN Networks For Symbolic Single-track Music Generation (2019)
- Interpretable Timbre Synthesis Using Variational Autoencoders Regularized On Timbre Descriptors (2023)
- Emotion-conditioned Melody Harmonization With Hierarchical Variational Autoencoder (2023)
- Domain Adversarial Training On Conditional Variational Auto-encoder For Controllable Music Generation (2022)
- Rethinking Recurrent Latent Variable Model For Music Composition (2018)
- Multi-view MidiVAE: Fusing Track- And Bar-view Representations For Long Multi-track Symbolic Music Generation (2024)