NU-GAN: High Resolution Neural Upsampling With GAN
2020 · Rithesh Kumar, Kundan Kumar, Vicki Anand, et al.
Abstract
In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an important problem because productionizing generative speech technology requires operating at high sampling rates: such applications use audio sampled at 44.1 kHz or 48 kHz, whereas current speech synthesis methods handle at most 24 kHz. NU-GAN treats audio upsampling as a separate component of the text-to-speech (TTS) pipeline and tackles it with GAN-based audio generation techniques. ABX preference tests indicate that audio resampled by NU-GAN from 22 kHz to 44.1 kHz is distinguishable from the original audio only 7.4% above random chance on a single-speaker dataset, and 10.8% above chance on a multi-speaker dataset.
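To make concrete the gap a learned upsampler has to fill, here is a minimal NumPy sketch of conventional (non-neural) 2x resampling. It uses zero-stuffing followed by a windowed-sinc low-pass filter. This is a textbook baseline, not the paper's method. It assumes the "22 kHz" input means 22.05 kHz, exactly half of 44.1 kHz. The function names `upsample_2x` and `high_band_energy_fraction` are hypothetical. The sketch shows that classical interpolation doubles the sampling rate but adds essentially nothing above the original 11.025 kHz Nyquist frequency.

```python
import numpy as np


def upsample_2x(x: np.ndarray, num_taps: int = 101) -> np.ndarray:
    """Hypothetical classical 2x upsampler (zero-stuffing + windowed-sinc low-pass).

    Assumption: this is a textbook baseline for illustration, not NU-GAN.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter is centered")
    # Zero-stuffing doubles the sample rate but mirrors the original
    # spectrum into the band above the old Nyquist frequency.
    y = np.zeros(2 * len(x))
    y[::2] = x
    # Low-pass at the old Nyquist, i.e. a quarter of the new sampling rate
    # (normalized cutoff 0.25 cycles/sample), to suppress that mirror image.
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(0.5 * n) * np.hamming(num_taps)
    # Normalize the DC gain to 2 to compensate for the inserted zeros.
    h *= 2.0 / h.sum()
    return np.convolve(y, h, mode="same")


def high_band_energy_fraction(y: np.ndarray, sr: int, cutoff_hz: float) -> float:
    """Fraction of spectral energy above cutoff_hz (edges trimmed, Hann-windowed)."""
    trim = len(y) // 10  # drop filter start-up and tail transients
    seg = y[trim:-trim]
    spec = np.abs(np.fft.rfft(seg * np.hanning(len(seg)))) ** 2
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / sr)
    return float(spec[freqs > cutoff_hz].sum() / spec.sum())


if __name__ == "__main__":
    sr_in, sr_out = 22050, 44100  # assumed: "22 kHz" taken as 22.05 kHz
    t = np.arange(sr_in) / sr_in
    x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)
    y = upsample_2x(x)
    # Tiny: the classical filter cannot create content above 11.025 kHz.
    print(f"{high_band_energy_fraction(y, sr_out, sr_in / 2):.2e}")
```

The 101-tap Hamming window is an arbitrary choice. Longer filters narrow the transition band around 11.025 kHz, but under any such filter the 11.025 to 22.05 kHz band stays essentially empty. Filling that band with plausible content is what the abstract frames as a generative problem.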
Related papers
- NU-Wave: A Diffusion Probabilistic Model For Neural Audio Upsampling (2021)
- NU-Wave 2: A General Neural Audio Upsampling Model For Various Sampling Rates (2022)
- An Investigation Of Pre-upsampling Generative Modelling And Generative Adversarial Networks In Audio Super Resolution (2021)
- HiFi-GAN: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020)
- Audio Super Resolution Using Neural Networks (2017)
- EVA-GAN: Enhanced Various Audio Generation Via Scalable Generative Adversarial Networks (2024)
- Bandwidth Extension On Raw Audio Via Generative Adversarial Networks (2019)
- Time-domain Speech Super-resolution With GAN Based Modeling For Telephony Speaker Verification (2022)