Neural Vocoder Is All You Need For Speech Super-resolution
2022 · Haohe Liu, Woosung Choi, Xubo Liu, et al.
Abstract
Speech super-resolution (SR) is the task of increasing the speech sampling rate by generating high-frequency components. Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio. These strong constraints can lead to poor generalization in mismatched real-world cases. In this paper, we propose a neural vocoder-based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios. NVSR consists of a mel-bandwidth extension module, a neural vocoder module, and a post-processing module. Our proposed system achieves state-of-the-art results on the VCTK multi-speaker benchmark. At a 44.1 kHz target resolution, NVSR outperforms WSRGlow and Nu-wave by 8% and 37%, respectively, on log spectral distance and achieves significantly better perceptual quality. We also demonstrate that prior knowledge in the pre-trained vocoder is crucial for speech SR by performing mel-bandwidth extension with a
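The abstract reports improvements on log spectral distance (LSD) without giving the formula. As a rough illustration, the sketch below follows a definition commonly used in speech SR evaluation: the per-frame root-mean-square difference between log power spectra, averaged over frames. The function names, the Hann window, and the `n_fft` and `hop` values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def stft_mag(x, n_fft=2048, hop=512):
    """Magnitude STFT from Hann-windowed frames; returns shape (frames, bins).

    Assumes len(x) >= n_fft. Window and hop sizes are illustrative choices.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def log_spectral_distance(ref, est, n_fft=2048, hop=512, eps=1e-8):
    """LSD between a reference and an estimated waveform (lower is better)."""
    n = min(len(ref), len(est))  # compare only the overlapping samples
    s_ref = stft_mag(ref[:n], n_fft, hop) ** 2  # power spectrogram
    s_est = stft_mag(est[:n], n_fft, hop) ** 2
    diff = np.log10(s_ref + eps) - np.log10(s_est + eps)
    # RMS over frequency bins per frame, then mean over frames
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))
```

With this definition, an output identical to the reference scores 0, and larger values indicate a larger mismatch between the two log spectra.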
Related papers
- MSR-NV: Neural Vocoder Using Multiple Sampling Rates (2021) 2.26
- Wave-u-mamba: An End-to-end Framework For High-quality And Efficient Speech Super Resolution (2024) 3.58
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) 14.80
- Hifi-sr: A Unified Generative Transformer-convolutional Adversarial Network For High-fidelity Speech Super-resolution (2025) 10.81
- Super Denoise Net: Speech Super Resolution With Noise Cancellation In Low Sampling Rate Noisy Environments (2023) 0.00
- La-voce: Low-snr Audio-visual Speech Enhancement Using Neural Vocoders (2022) 0.00
- Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis (2024) 5.24
- Universr: Unified And Versatile Audio Super-resolution Via Vocoder-free Flow Matching (2025) 0.00