Enhancing Image Quality Assessment Ability Of LMMs Via Retrieval-augmented Generation
2026 · Kang Fu, Huiyu Duan, Zicheng Zhang, et al.
Abstract
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks, particularly Image Quality Assessment (IQA), where they demonstrate strong zero-shot capability. However, achieving state-of-the-art performance often requires computationally expensive fine-tuning, which aims to align the distribution of quality-related tokens in the output with image quality levels. Inspired by recent training-free work on LMMs, we introduce IQARAG, a novel, training-free framework that enhances LMMs' IQA ability. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve semantically similar but quality-variant reference images, together with their Mean Opinion Scores (MOSs), for an input image. The retrieved images and the input image are integrated into a specific prompt, in which the retrieved images provide the LMM with a visual perception anchor for the IQA task. IQARAG contains three key phases: Retrieval Feature Extraction, Image Retrieval, and Integration & Quality Sc
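The retrieval and prompt-integration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes image features are already available as plain vectors, ranks references by cosine similarity only (the "quality-variant" selection criterion is not modeled), and uses a hypothetical prompt template with a made-up `<image>` placeholder. The function names `retrieve_references` and `build_prompt` are illustrative.

```python
import numpy as np


def retrieve_references(query_feat, ref_feats, ref_mos, k=3):
    """Return (index, similarity, MOS) for the k references most similar to the query.

    Similarity is cosine similarity between feature vectors; this is an
    assumed choice, not one stated in the abstract.
    """
    q = query_feat / np.linalg.norm(query_feat)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sims = r @ q                          # one cosine similarity per reference
    top = np.argsort(-sims)[:k]           # indices sorted by descending similarity
    return [(int(i), float(sims[i]), float(ref_mos[i])) for i in top]


def build_prompt(retrieved):
    """Assemble a hypothetical prompt pairing each reference image with its MOS."""
    lines = [
        f"Reference image {n}: <image> Its quality score is {mos:.2f}."
        for n, (_, _, mos) in enumerate(retrieved, start=1)
    ]
    lines.append("Target image: <image> Rate the quality of the target image.")
    return "\n".join(lines)


# Toy data: four reference features with made-up MOS labels.
ref_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
ref_mos = np.array([4.2, 2.1, 3.3, 1.0])
query_feat = np.array([1.0, 0.0])

retrieved = retrieve_references(query_feat, ref_feats, ref_mos, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

In this toy setup the two references closest to the query are selected and their scores are placed ahead of the target image, so the model sees scored examples before it is asked to rate the input.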
Related papers
- Retrieval-augmented Perception: High-resolution Image Perception Meets Visual RAG (2025) 0.00
- Visual-RAG: Benchmarking Text-to-image Retrieval Augmented Generation For Visual Knowledge Intensive Queries (2025) 0.00
- MuRAG: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022) 14.66
- MLLM Is A Strong Reranker: Advancing Multimodal Retrieval-augmented Generation Via Knowledge-enhanced Reranking And Noise-injected Training (2024) 9.18
- LamRA: Large Multimodal Model As Your Advanced Retrieval Assistant (2024) 7.50
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025) 0.00
- A Fine-tuning Enhanced RAG System With Quantized Influence Measure As AI Judge (2024) 11.19
- Pixel-grounded Retrieval For Knowledgeable Large Multimodal Models (2026) 0.00