AR-RAG: Autoregressive Retrieval Augmentation For Image Generation
2025 · Jingyuan Qi, Zhiyang Xu, Qifan Wang, et al.
Abstract
We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest-neighbor retrievals at the patch level. Unlike prior methods that perform a single, static retrieval before generation and condition the entire generation on fixed reference images, AR-RAG performs context-aware retrievals at each generation step. It uses previously generated patches as queries to retrieve and incorporate the most relevant patch-level visual references. This lets the model respond to evolving generation needs while avoiding limitations prevalent in existing methods, such as over-copying and stylistic bias. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free plug-and-use decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches, and (2) Feature-Augmentation in Decoding (FAiD), a parameter-
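The abstract says only that DAiD "directly merges" the model's patch distribution with the distribution of retrieved patches. The sketch below illustrates one decoding step under several assumptions that are not from the source: patches are discrete codebook token ids, retrieval keys are embeddings summarizing the previously generated neighboring patches, neighbors are weighted by a softmax over negative squared L2 distance, and merging is a fixed linear interpolation with a hypothetical weight `lam` (similar in spirit to kNN-LM interpolation). All function names and the toy database are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical retrieval entry: (context embedding of already-generated
# neighboring patches, token id of the patch that followed that context).
Entry = Tuple[Sequence[float], int]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_knn(query: Sequence[float], db: List[Entry], k: int) -> List[Tuple[float, int]]:
    """Brute-force k-nearest-neighbor search; returns (distance, token) pairs."""
    scored = [(squared_l2(query, emb), tok) for emb, tok in db]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def retrieval_distribution(neighbors: List[Tuple[float, int]], vocab_size: int,
                           temperature: float = 1.0) -> List[float]:
    """Softmax over negative distances, accumulated per retrieved token id."""
    d_min = min(d for d, _ in neighbors)
    # Subtracting the smallest distance keeps exp() from underflowing to zero.
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in neighbors]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, tok) in zip(weights, neighbors):
        probs[tok] += w / total
    return probs


def merge_distributions(p_model: Sequence[float], p_retr: Sequence[float],
                        lam: float) -> List[float]:
    """Assumed merging rule: (1 - lam) * p_model + lam * p_retr."""
    return [(1.0 - lam) * pm + lam * pr for pm, pr in zip(p_model, p_retr)]


if __name__ == "__main__":
    # Toy database over a 3-token codebook (hypothetical values).
    db = [([0.0, 0.0], 2), ([0.1, 0.0], 2), ([5.0, 5.0], 0)]
    p_model = [0.5, 0.3, 0.2]  # stand-in for the generator's next-patch distribution
    neighbors = retrieve_knn([0.0, 0.0], db, k=2)
    p_retr = retrieval_distribution(neighbors, vocab_size=3)
    p_next = merge_distributions(p_model, p_retr, lam=0.5)
    next_token = max(range(len(p_next)), key=p_next.__getitem__)
    print(p_next, next_token)
```

Here both retrieved neighbors point to token 2, so the merged distribution shifts mass toward token 2 (about 0.6) and greedy decoding picks it, even though the model alone preferred token 0. A linear interpolation keeps the result a valid distribution whenever both inputs are; the actual DAiD merging rule and weighting may differ.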
Related papers
- ImageRAG: Dynamic Image Retrieval For Reference-guided Image Generation (2025)
- Cross-modal RAG: Sub-dimensional Text-to-image Retrieval-augmented Generation (2025)
- RealRAG: Retrieval-augmented Realistic Image Generation Via Self-reflective Contrastive Learning (2025)
- RegionRAG: Region-level Retrieval-augmented Generation For Visual Document Understanding (2025)
- RAVID: Retrieval-augmented Visual Detection: A Knowledge-driven Approach For AI-generated Image Identification (2025)
- SAR-RAG: ATR Visual Question Answering By Semantic Search, Retrieval, And MLLM Generation (2026)
- Retrieval-augmented Perception: High-resolution Image Perception Meets Visual RAG (2025)
- Visual-RAG: Benchmarking Text-to-image Retrieval Augmented Generation For Visual Knowledge Intensive Queries (2025)