PTQ4ADM: Post-training Quantization For Efficient Text Conditional Audio Diffusion Models
2024 · Jayneel Vora, Aditya Krishnan, Nader Bouacida, et al.
Abstract
Denoising diffusion models have emerged as state-of-the-art in generative tasks across image, audio, and video domains, producing high-quality, diverse, and contextually relevant data. However, their broader adoption is limited by high computational costs and large memory footprints. Post-training quantization (PTQ) offers a promising approach to mitigate these challenges by reducing model complexity through low-bandwidth parameters. Yet, direct application of PTQ to diffusion models can degrade synthesis quality due to accumulated quantization noise across multiple denoising steps, particularly in conditional tasks like text-to-audio synthesis. This work introduces PTQ4ADM, a novel framework for quantizing audio diffusion models (ADMs). Our key contributions include (1) a coverage-driven prompt augmentation method and (2) an activation-aware calibration set generation algorithm for text-conditional ADMs. These techniques ensure comprehensive coverage of audio aspects and modalities whi
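To make the PTQ idea in the abstract concrete, here is a minimal sketch of symmetric uniform quantization with a scale calibrated from sample data. It is not the PTQ4ADM method: the function names (`calibrate_scale`, `fake_quantize`, `mean_abs_error`) are hypothetical, the "activations" are synthetic Gaussian values standing in for real calibration data, and the max-magnitude calibration rule is just one common choice assumed for illustration.

```python
import random


def calibrate_scale(calibration_values, num_bits=8):
    """Pick a symmetric per-tensor scale from the largest calibration magnitude."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def fake_quantize(values, scale, num_bits=8):
    """Round each value to the integer grid, clamp it, and map it back to a float."""
    qmax = 2 ** (num_bits - 1) - 1
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        out.append(q * scale)
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for calibration activations (an assumption, not real model data).
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(1000)]

errors = {}
for bits in (8, 4):
    scale = calibrate_scale(activations, bits)
    errors[bits] = mean_abs_error(activations, fake_quantize(activations, scale, bits))
    print(f"{bits}-bit: scale={scale:.4f}, mean abs error={errors[bits]:.4f}")
```

The per-value error is bounded by half the scale, and the scale grows as the bit width shrinks. In a diffusion model this error is injected at every denoising step, which is the accumulation the abstract refers to, and it is why the choice of calibration data matters.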
Related papers
- ProDiff: Progressive Fast Diffusion Model For High-quality Text-to-speech (2022) 0.00
- FastDiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022) 14.35
- AudioToken: Adaptation Of Text-conditioned Diffusion Models For Audio-to-image Generation (2023) 9.76
- ControlAudio: Tackling Text-guided, Timing-indicated And Intelligible Audio Generation Via Progressive Diffusion Modeling (2025) 0.00
- Solving Audio Inverse Problems With A Diffusion Model (2022) 0.00
- PriorGrad: Improving Conditional Denoising Diffusion Models With Data-dependent Adaptive Prior (2021) 0.00
- ResGrad: Residual Denoising Diffusion Probabilistic Models For Text To Speech (2022) 0.00
- BDDM: Bilateral Denoising Diffusion Models For Fast And High-quality Speech Synthesis (2022) 4.76