Solving Audio Inverse Problems With A Diffusion Model
2022 · Eloi Moliner, Jaakko Lehtinen, Vesa Välimäki
Abstract
This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can be used to solve a variety of audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model whose architecture is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated with objective and subjective metrics on three distinct tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.
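The abstract's architectural argument rests on one geometric fact. On a logarithmically spaced frequency axis, transposing a note moves all of its partials by the same number of bins. Convolutions along that axis can therefore treat pitch shifts as plain translations. The sketch below illustrates only that property and uses the Python standard library. It is not a CQT implementation and not the authors' code. The parameter values (`F_MIN`, `BINS_PER_OCTAVE`, `DELTA_F`) and helper names (`cqt_center_freqs`, `log_bin`, `linear_bin`, `transpose`) are hypothetical choices for illustration, not the paper's settings.

```python
import math

# Illustrative values only (assumptions, not the paper's CQT configuration).
F_MIN = 32.70          # Hz, lowest CQT bin centre (about C1)
BINS_PER_OCTAVE = 12   # B: one bin per semitone
N_BINS = 84            # seven octaves
DELTA_F = 10.0         # Hz, bin spacing of a hypothetical linear (STFT-like) axis


def cqt_center_freqs(f_min=F_MIN, b=BINS_PER_OCTAVE, n_bins=N_BINS):
    """Geometrically spaced centre frequencies f_k = f_min * 2**(k / b)."""
    return [f_min * 2.0 ** (k / b) for k in range(n_bins)]


def log_bin(f, f_min=F_MIN, b=BINS_PER_OCTAVE):
    """Fractional position of frequency f on the log-frequency (CQT-style) axis."""
    return b * math.log2(f / f_min)


def linear_bin(f, delta_f=DELTA_F):
    """Fractional position of frequency f on a linear-frequency axis."""
    return f / delta_f


def transpose(f, semitones):
    """Pitch-shift in 12-tone equal temperament: scale f by 2**(semitones / 12)."""
    return f * 2.0 ** (semitones / 12)


# Adjacent CQT centres share one constant ratio, so the axis is uniform in log-frequency.
centres = cqt_center_freqs()
ratios = {round(centres[k + 1] / centres[k], 9) for k in range(len(centres) - 1)}
print("distinct adjacent-bin ratios:", ratios)

# First five harmonics of a note at 110 Hz (A2), transposed up 5 semitones.
partials = [110.0 * h for h in range(1, 6)]
shifted = [transpose(f, 5) for f in partials]

# Displacement of each partial along each frequency axis, in bins.
log_shift = [log_bin(g) - log_bin(f) for f, g in zip(partials, shifted)]
lin_shift = [linear_bin(g) - linear_bin(f) for f, g in zip(partials, shifted)]

print("log-axis shift (bins):   ", [round(s, 3) for s in log_shift])
print("linear-axis shift (bins):", [round(s, 3) for s in lin_shift])
```

When the script runs, every partial moves by the same five bins on the log axis. On the linear axis, by contrast, the displacement grows in proportion to each partial's frequency, so no single translation maps the original spectrum onto the transposed one. With 12 bins per octave a semitone is exactly one bin. A finer resolution keeps the same property, with a semitone spanning B/12 bins.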
Related papers
- UnDiff: Unsupervised Voice Restoration With Unconditional Diffusion Model (2023)
- PTQ4ADM: Post-training Quantization For Efficient Text Conditional Audio Diffusion Models (2024)
- ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model (2024)
- EDMSound: Spectrogram Based Diffusion Models For Efficient And High-quality Audio Synthesis (2023)
- Audio Generation Through Score-based Generative Modeling: Design Principles And Implementation (2025)
- Token-based Audio Inpainting Via Discrete Diffusion (2025)
- DiffWave: A Versatile Diffusion Model For Audio Synthesis (2020)
- Estimation And Restoration Of Unknown Nonlinear Distortion Using Diffusion (2025)