AudioMoG: Guiding Audio Generation With Mixture-of-Guidance
2025 · Junyou Wang, Zehua Chen, Binjie Yuan, et al.
Abstract
The design of diffusion-based audio generation systems has been investigated from diverse perspectives, such as data space, network architecture, and conditioning techniques, while most of these innovations require model re-training. In sampling, classifier-free guidance (CFG) has been uniformly adopted to enhance generation quality by strengthening condition alignment. However, CFG often compromises diversity, resulting in suboptimal performance. Although the recent autoguidance (AG) method proposes another direction of guidance that maintains diversity, its direct application in audio generation has so far underperformed CFG. In this work, we introduce AudioMoG, an improved sampling method that enhances text-to-audio (T2A) and video-to-audio (V2A) generation quality without requiring extensive training resources. We start with an analysis of both CFG and AG, examining their respective advantages and limitations for guiding diffusion models. Building upon our insights, we introduce a
Related papers
- Mitigating Data Replication In Text-to-audio Generative Diffusion Models Through Anti-memorization Guidance (2025) · 2.26
- AudioGen: Textually Guided Audio Generation (2022) · 0.00
- Audio Generation Through Score-based Generative Modeling: Design Principles And Implementation (2025) · 1.91
- Auffusion: Leveraging The Power Of Diffusion And Large Language Models For Text-to-audio Generation (2024) · 11.19
- ControlAudio: Tackling Text-guided, Timing-indicated And Intelligible Audio Generation Via Progressive Diffusion Modeling (2025) · 0.00
- MMDisCo: Multi-modal Discriminator-guided Cooperative Diffusion For Joint Audio And Video Generation (2024) · 1.91
- Enhance Generation Quality Of Flow Matching V2A Model Via Multi-step CoT-like Guidance And Combined Preference Optimization (2025) · 0.00
- Guided-TTS: A Diffusion Model For Text-to-speech Via Classifier Guidance (2021) · 0.00