Score Distillation Sampling For Audio: Source Separation, Synthesis, And Beyond
2025 · Jessie Richter-Powell, Antonio Torralba, Jonathan Lorraine
Abstract
We introduce Audio-SDS, a generalization of Score Distillation Sampling (SDS) to text-conditioned audio diffusion models. While SDS was initially designed for text-to-3D generation using image diffusion, its core idea of distilling a powerful generative prior into a separate parametric representation extends to the audio domain. Leveraging a single pretrained model, Audio-SDS enables a broad range of tasks without requiring specialized datasets. In particular, we demonstrate how Audio-SDS can guide physically informed impact sound simulations, calibrate FM-synthesis parameters, and perform prompt-specified source separation. Our findings illustrate the versatility of distillation-based methods across modalities and establish a robust foundation for future work using generative priors in audio tasks.
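To make the idea of distilling a diffusion prior into a separate parametric representation concrete, here is a minimal sketch of a generic SDS-style update loop. It is not the paper's implementation. The "denoiser" is a hypothetical analytic stand-in (the exact noise prediction for a prior concentrated on one target waveform) rather than a pretrained text-conditioned audio model. The one-parameter sine "synthesizer", the cosine/sine noise schedule, the weighting w(t) = sigma, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def render(amplitude, basis):
    # Toy parametric audio source: a fixed sine carrier scaled by one parameter.
    return amplitude * basis

def toy_denoiser(x_t, alpha, sigma, target):
    # Hypothetical stand-in for a pretrained diffusion model's noise prediction.
    # For a prior that is a point mass at `target`, x_t = alpha*target + sigma*eps,
    # so the exact noise estimate is (x_t - alpha*target) / sigma.
    return (x_t - alpha * target) / sigma

def sds_fit(target_amplitude=0.8, steps=300, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = np.arange(1600)                                # 0.1 s at 16 kHz
    basis = np.sin(2.0 * np.pi * 440.0 * n / 16000.0)  # 440 Hz carrier
    target = render(target_amplitude, basis)           # what the toy prior favors
    amplitude = 0.1                                    # parameter being optimized
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)                    # random diffusion time
        alpha = np.cos(0.5 * np.pi * t)                # assumed signal scale
        sigma = np.sin(0.5 * np.pi * t)                # assumed noise scale
        eps = rng.standard_normal(basis.shape)
        x_t = alpha * render(amplitude, basis) + sigma * eps
        eps_hat = toy_denoiser(x_t, alpha, sigma, target)
        # SDS-style gradient: w(t) * (eps_hat - eps) * d(render)/d(amplitude),
        # with w(t) = sigma and d(render)/d(amplitude) = basis, averaged over samples.
        grad = np.mean(sigma * (eps_hat - eps) * basis)
        amplitude -= lr * grad
    return amplitude

if __name__ == "__main__":
    print(sds_fit())
```

In this toy setting the residual eps_hat - eps reduces to alpha * (render(amplitude) - target) / sigma, so each step pulls the parameter toward the amplitude the stand-in prior prefers. With a real pretrained model the residual would instead come from a network conditioned on a text prompt, and the renderer's gradient would typically be obtained by automatic differentiation.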
Related papers
- Audio Generation Through Score-based Generative Modeling: Design Principles And Implementation (2025)
- Edmsound: Spectrogram Based Diffusion Models For Efficient And High-quality Audio Synthesis (2023)
- Fast Text-to-audio Generation With One-step Sampling Via Energy-scoring And Auxiliary Contextual Representation Distillation (2026)
- Diff-sage: End-to-end Spatial Audio Generation Using Diffusion Models (2024)
- Soloaudio: Target Sound Extraction With Language-oriented Audio Diffusion Transformer (2024)
- S-SONDO: Self-supervised Knowledge Distillation For General Audio Foundation Models (2026)
- Generalized Multi-source Inference For Text Conditioned Music Diffusion Models (2024)
- Audiotoken: Adaptation Of Text-conditioned Diffusion Models For Audio-to-image Generation (2023)