Text-guided Synthesis Of Artistic Images With Retrieval-augmented Diffusion Models
2022 · Robin Rombach, Andreas Blattmann, Björn Ommer
Abstract
Novel architectures have recently improved generative image synthesis, leading to excellent visual quality in various tasks. Of particular note is the field of "AI-Art", which has seen unprecedented growth with the emergence of powerful multimodal models such as CLIP. By combining speech and image synthesis models, so-called "prompt-engineering" has become established, in which carefully selected and composed sentences are used to achieve a certain visual style in the synthesized image. In this note, we present an alternative approach based on retrieval-augmented diffusion models (RDMs). In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples. During inference (sampling), we replace the retrieval database with a more specialized database that contains, for example, only images of a particular visual style. This provides a novel way to prompt a general tr
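The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are random stand-ins (a real system would presumably use features from a model such as CLIP), cosine similarity is an assumed choice of metric, and the names `retrieve_neighbors`, `train_db`, and `style_db` are hypothetical. The sketch only shows how the same query yields different conditioning sets once the database is swapped for a style-specific one.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_neighbors(query, database, k):
    """Return indices and embeddings of the k rows of `database` most similar to `query`."""
    sims = normalize(database) @ normalize(query)  # shape (N,)
    top = np.argsort(-sims)[:k]                    # highest similarity first
    return top, database[top]


rng = np.random.default_rng(0)
dim = 8

# Assumption: random vectors stand in for image embeddings.
train_db = rng.normal(size=(100, dim))       # general database used during training
style_db = rng.normal(size=(20, dim)) + 3.0  # hypothetical style-specific database

query = rng.normal(size=dim)                 # embedding of the instance or prompt

# Training: neighbors come from the general database.
train_idx, train_cond = retrieve_neighbors(query, train_db, k=4)

# Inference: the database is swapped, the retrieval code is unchanged.
style_idx, style_cond = retrieve_neighbors(query, style_db, k=4)

# In an RDM, `train_cond` / `style_cond` would be passed to the diffusion
# model as conditioning; that model is outside the scope of this sketch.
```

Because the model only ever sees the retrieved neighbors, changing the database changes the conditioning signal without retraining, which is the mechanism the abstract relies on for controlling visual style.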
Related papers
- ImageRAG: Dynamic Image Retrieval For Reference-guided Image Generation (2025)
- Eliminating Hallucination In Diffusion-augmented Interactive Text-to-image Retrieval (2026)
- Semi-parametric Neural Image Synthesis (2022)
- AR-RAG: Autoregressive Retrieval Augmentation For Image Generation (2025)
- RealRAG: Retrieval-augmented Realistic Image Generation Via Self-reflective Contrastive Learning (2025)
- Text-to-image Diffusion Models Are Great Sketch-photo Matchmakers (2024)
- MV-RAG: Retrieval Augmented Multiview Diffusion (2025)
- Adafuse: Adaptive Diffusion-generated Image And Text Fusion For Interactive Text-to-image Retrieval (2026)