Token-based Audio Inpainting Via Discrete Diffusion
2025 · Tali Dror, Iftach Shoham, Moshe Buchris, et al.
Abstract
Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods perform poorly when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training techniques: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps up to 750 ms show that our approach consistently outperforms strong baselines for gaps of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Visit our project page for examples and code.
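To make the two training ideas named in the abstract more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so `MASK_ID`, `span_absorb`, `temporal_derivative_penalty`, the span-sampling scheme, and the mean-squared first-difference penalty are all illustrative assumptions. The first function corrupts a token sequence by absorbing contiguous spans into a mask state, and the second penalizes abrupt frame-to-frame changes in a continuous feature sequence.

```python
import numpy as np

MASK_ID = -1  # hypothetical id for the absorbing [MASK] state (assumption)


def span_absorb(tokens, mask_ratio, span_len, rng):
    """Absorb contiguous spans of tokens into MASK_ID.

    Spans of length `span_len` are placed at random start positions until
    at least round(mask_ratio * len(tokens)) positions are masked. This is
    an assumed stand-in for a span-based absorbing transition.
    """
    tokens = np.asarray(tokens)
    n = len(tokens)
    target = int(round(mask_ratio * n))
    masked = np.zeros(n, dtype=bool)
    while masked.sum() < target:
        start = rng.integers(0, n)            # random span start
        masked[start:start + span_len] = True  # slice is clipped at the end
    out = tokens.copy()
    out[masked] = MASK_ID
    return out, masked


def temporal_derivative_penalty(features):
    """Mean squared first difference along the time axis (axis 0).

    An assumed form of a derivative-based smoothness regularizer, applied
    to continuous per-frame features rather than to raw token ids.
    """
    diffs = np.diff(np.asarray(features, dtype=float), axis=0)
    return float(np.mean(diffs ** 2))


rng = np.random.default_rng(0)
tokens = np.arange(100)
corrupted, masked = span_absorb(tokens, mask_ratio=0.3, span_len=5, rng=rng)
print(int(masked.sum()), "of", len(tokens), "tokens absorbed")
```

Masking whole spans rather than independent positions makes the training corruption resemble the contiguous gaps the model has to fill at inference time, which is one plausible reading of "structured corruption" in the abstract.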
Related papers
- AudioToken: Adaptation Of Text-conditioned Diffusion Models For Audio-to-image Generation (2023)
- VRDMG: Vocal Restoration Via Diffusion Posterior Sampling With Multiple Guidance (2023)
- Solving Audio Inverse Problems With A Diffusion Model (2022)
- AADiff: Audio-aligned Video Synthesis With Text-to-image Diffusion (2023)
- UnDiff: Unsupervised Voice Restoration With Unconditional Diffusion Model (2023)
- Latent Diffusion Bridges For Unsupervised Musical Audio Timbre Transfer (2024)
- RFM-Editing: Rectified Flow Matching For Text-guided Audio Editing (2025)
- ControlAudio: Tackling Text-guided, Timing-indicated And Intelligible Audio Generation Via Progressive Diffusion Modeling (2025)