FlashSR: One-step Versatile Audio Super-resolution Via Diffusion Distillation
2025 · Jaekwon Im, Juhan Nam
Abstract
Versatile audio super-resolution (SR) is the challenging task of restoring high-frequency components from low-resolution audio with sampling rates between 4kHz and 32kHz in various domains such as music, speech, and sound effects. Previous diffusion-based SR methods suffer from slow inference due to the need for a large number of sampling steps. In this paper, we introduce FlashSR, a single-step diffusion model for versatile audio super-resolution aimed at producing 48kHz audio. FlashSR achieves fast inference by utilizing diffusion distillation with three objectives: distillation loss, adversarial loss, and distribution-matching distillation loss. We further enhance performance by proposing the SR Vocoder, which is specifically designed for SR models operating on mel-spectrograms. FlashSR demonstrates competitive performance with the current state-of-the-art model in both objective and subjective evaluations while being approximately 22 times faster.
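As a rough illustration of what "restoring high-frequency components" involves, the minimal sketch below computes the frequency band a super-resolution model would need to fill in for a given input sampling rate. It assumes the low-resolution input is band-limited to its Nyquist frequency (half the sampling rate); the function name `missing_band_hz` and the constants are hypothetical and are not taken from the paper.

```python
# Hypothetical helper, not part of FlashSR: it only applies the Nyquist limit
# (a signal sampled at rate sr can represent frequencies up to sr / 2).

TARGET_SR_HZ = 48_000                 # output rate stated in the abstract
INPUT_SR_RANGE_HZ = (4_000, 32_000)   # input range stated in the abstract


def missing_band_hz(input_sr_hz, target_sr_hz=TARGET_SR_HZ):
    """Return (low, high) in Hz: the band absent from the input that a
    super-resolution model would have to generate at the target rate."""
    if not 0 < input_sr_hz <= target_sr_hz:
        raise ValueError("input rate must be positive and not exceed the target rate")
    return (input_sr_hz / 2, target_sr_hz / 2)


# Bands to reconstruct at the two ends of the supported input range.
bands = {sr: missing_band_hz(sr) for sr in INPUT_SR_RANGE_HZ}
```

Under these assumptions, a 4 kHz input leaves the band from 2 kHz to 24 kHz to be generated, while a 32 kHz input leaves only 16 kHz to 24 kHz, which suggests why a single model covering the whole range is described as versatile.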
Related papers
- FlowHigh: Towards Efficient And High-quality Audio Super-resolution With Single-step Flow Matching (2025) · 5.84
- FlashAudio: Rectified Flows For Fast And High-fidelity Text-to-audio Generation (2024) · 5.13
- STSR: High-fidelity Speech Super-resolution Via Spectral-transient Context Modeling (2025) · 0.00
- UniverSR: Unified And Versatile Audio Super-resolution Via Vocoder-free Flow Matching (2025) · 0.00
- Neural Vocoder Is All You Need For Speech Super-resolution (2022) · 12.25
- Audio Super-resolution With Latent Bridge Models (2025) · 0.00
- EDMSound: Spectrogram Based Diffusion Models For Efficient And High-quality Audio Synthesis (2023) · 0.00
- Score Distillation Sampling For Audio: Source Separation, Synthesis, And Beyond (2025) · 0.00