NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
2021 · Junhyeok Lee, Seungu Han
Abstract
In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48 kHz from coarse 16 kHz or 24 kHz inputs, while prior works could generate only up to 16 kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution, and its design is based on neural vocoders. NU-Wave generates high-quality audio that achieves high performance in terms of signal-to-noise ratio (SNR), log-spectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite its substantially smaller capacity: 3.0M parameters, which is 5.4-21% of the baselines' parameter counts. The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon.
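The two objective metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names, the Hann-windowed framing, the FFT size (2048), the hop (512), and the small `eps` constant are all assumptions made here for illustration, and the formulas are the commonly used definitions of SNR and LSD rather than ones confirmed by the abstract.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: reference energy over error energy."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


def power_spectrogram(signal, n_fft=2048, hop=512):
    """Power spectrogram from Hann-windowed frames, shape (frames, bins)."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS over frequency of the log10 power ratio,
    averaged over frames. eps (an assumption) avoids log of zero."""
    ref = power_spectrogram(reference, n_fft, hop)
    est = power_spectrogram(estimate, n_fft, hop)
    log_diff = np.log10((est + eps) / (ref + eps))
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=1))))


if __name__ == "__main__":
    sr = 48000  # target sampling rate mentioned in the abstract
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 440.0 * t)  # hypothetical 1-second test tone
    scaled = 1.1 * clean                   # stand-in for an imperfect output
    print("SNR (dB):", snr_db(clean, scaled))
    print("LSD:", lsd(clean, scaled))
```

Higher SNR and lower LSD indicate a closer match to the reference. In the toy usage above, the error is 10% of the reference amplitude, so the SNR comes out at about 20 dB, and the LSD of a signal against itself is zero.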
Related papers
- NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates (2022)
- DiffWave: A Versatile Diffusion Model for Audio Synthesis (2020)
- NU-GAN: High Resolution Neural Upsampling with GAN (2020)
- WaveNet: A Generative Model for Raw Audio (2016)
- Audio Super Resolution Using Neural Networks (2017)
- Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation (2018)
- UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching (2025)
- WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration (2022)