Audio Super-resolution With Latent Bridge Models
2025 · Chang Li, Zehua Chen, Liyuan Wang, et al.
Abstract
Audio super-resolution (SR), i.e., upsampling the low-resolution (LR) waveform to the high-resolution (HR) version, has recently been explored with diffusion and bridge models, while previous methods often suffer from sub-optimal upsampling quality due to their uninformative generation prior. Towards high-quality audio super-resolution, we present a new system with latent bridge models (LBMs), where we compress the audio waveform into a continuous latent space and design an LBM to enable a latent-to-latent generation process that naturally matches the LR-to-HR upsampling process, thereby fully exploiting the instructive prior information contained in the LR waveform. To further enhance the training results despite the limited availability of HR samples, we introduce frequency-aware LBMs, where the prior and target frequency are taken as model input, enabling LBMs to explicitly learn an any-to-any upsampling process at the training stage. Furthermore, we design cascaded LBMs and present
Related papers
- Voicebridge: General Speech Restoration With One-step Latent Bridge Models (2025) 0.00
- Audio Super Resolution Using Neural Networks (2017) 0.00
- Flashsr: One-step Versatile Audio Super-resolution Via Diffusion Distillation (2025) 4.52
- STSR: High-fidelity Speech Super-resolution Via Spectral-transient Context Modeling (2025) 0.00
- Flowhigh: Towards Efficient And High-quality Audio Super-resolution With Single-step Flow Matching (2025) 5.84
- Inspiremusic: Integrating Super Resolution And Large Language Model For High-fidelity Long-form Music Generation (2025) 6.26
- Wave-u-mamba: An End-to-end Framework For High-quality And Efficient Speech Super Resolution (2024) 3.58
- Nu-wave 2: A General Neural Audio Upsampling Model For Various Sampling Rates (2022) 12.17