Diffusion Models Generate Images Like Painters: An Analytical Theory Of Outline First, Details Later
2023 · Binxu Wang, John J. Vastola
Abstract
How do diffusion generative models convert pure noise into meaningful images? In a variety of pretrained diffusion models (including conditional latent space models like Stable Diffusion), we observe that the reverse diffusion process that underlies image generation has the following properties: (i) individual trajectories tend to be low-dimensional and resemble 2D "rotations"; (ii) high-variance scene features like layout tend to emerge earlier, while low-variance details tend to emerge later; and (iii) early perturbations tend to have a greater impact on image content than later perturbations. To understand these phenomena, we derive and study a closed-form solution to the probability flow ODE for a Gaussian distribution, which shows that the reverse diffusion state rotates towards a gradually-specified target on the image manifold. It also shows that generation involves first committing to an outline, and then to finer and finer details. We find that this solution accurately describ
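The Gaussian case mentioned in the abstract can be illustrated with a small numerical sketch. This is not the authors' code. It assumes a forward process x_t = alpha_t x_0 + sigma_t eps with a hypothetical cosine schedule, data distributed as N(mu, Sigma), and Sigma = U diag(lam) U^T. Under those assumptions the probability flow ODE is linear, and each centered eigen-coordinate of the state scales with the standard deviation of the noised marginal, which gives a closed form. The names `schedule`, `gaussian_pf_ode_state`, and `denoised_estimate` are illustrative choices, not taken from the paper.

```python
import numpy as np

def schedule(t):
    """Assumed variance-preserving schedule (alpha_t^2 + sigma_t^2 = 1).
    t = 1 is pure noise, t = 0 is data. This choice is hypothetical."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def gaussian_pf_ode_state(x_T, t, mu, U, lam, T=1.0):
    """State x_t on the probability flow ODE trajectory starting at x_T,
    for data ~ N(mu, U diag(lam) U^T)."""
    a_t, s_t = schedule(t)
    a_T, s_T = schedule(T)
    var_t = a_t**2 * lam + s_t**2   # marginal variance per eigen-direction at t
    var_T = a_T**2 * lam + s_T**2   # same at the starting time T
    c_T = U.T @ (x_T - a_T * mu)    # centered coordinates in the eigenbasis
    c_t = c_T * np.sqrt(var_t / var_T)
    return a_t * mu + U @ c_t

def denoised_estimate(x_t, t, mu, U, lam):
    """Posterior mean E[x_0 | x_t] for Gaussian data (the ideal denoiser)."""
    a_t, s_t = schedule(t)
    c_t = U.T @ (x_t - a_t * mu)
    gain = a_t * lam / (a_t**2 * lam + s_t**2)
    return mu + U @ (gain * c_t)

# Toy data: three features with high, medium, and low variance.
lam = np.array([100.0, 1.0, 0.01])
U = np.eye(3)
mu = np.zeros(3)
x_T = np.ones(3)

x_0 = gaussian_pf_ode_state(x_T, 0.0, mu, U, lam)    # final sample
x_mid = gaussian_pf_ode_state(x_T, 0.5, mu, U, lam)  # halfway state

# Fraction of each feature's final value already present in the
# denoised estimate at t = 0.5.
progress = (U.T @ (denoised_estimate(x_mid, 0.5, mu, U, lam) - mu)) / (U.T @ (x_0 - mu))
print(progress)
```

In this toy setting the high-variance direction is almost fully determined halfway through generation, while the low-variance direction has barely begun to appear, which is the "outline first, details later" ordering the abstract describes.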
Related papers
- Diffusion Art Or Digital Forgery? Investigating Data Replication In Diffusion Models (2022)
- Text-guided Synthesis Of Artistic Images With Retrieval-augmented Diffusion Models (2022)
- Image Retrieval Outperforms Diffusion Models On Data Augmentation (2023)
- Text-to-image Diffusion Models Are Great Sketch-photo Matchmakers (2024)
- Renderers Are Good Zero-shot Representation Learners: Exploring Diffusion Latents For Metric Learning (2023)
- Diff-sbsr: Learning Multimodal Feature-enhanced Diffusion Models For Zero-shot Sketch-based 3D Shape Retrieval (2026)
- Semi-parametric Neural Image Synthesis (2022)
- Imagerag: Dynamic Image Retrieval For Reference-guided Image Generation (2025)