Alleviating Hallucination In Large Vision-language Models With Active Retrieval Augmentation
2024 · Xiaoye Qu, Qiyuan Chen, Wei Wei, et al.
Abstract
Despite the remarkable ability of large vision-language models (LVLMs) in image comprehension, these models frequently generate plausible yet factually incorrect responses, a phenomenon known as hallucination. Recently, augmenting large language models (LLMs) by retrieving information from external knowledge resources has proven to be a promising way to mitigate hallucinations. However, retrieval augmentation for LVLMs lags significantly behind their widespread application. Moreover, when retrieval augmentation is transferred to LVLMs, it sometimes even exacerbates the model's hallucination. Motivated by this research gap and counter-intuitive phenomenon, we introduce a novel framework, the Active Retrieval-Augmented large vision-language model (ARA), specifically designed to address hallucinations by incorporating three critical dimensions: (i) dissecting the retrieval targets based on the inherent hierarchical structures of images; (ii) pinpointing the most effectiv
Related papers
- Leveraging Retrieval-augmented Tags For Large Vision-language Understanding In Complex Scenes (2024) 0.00
- Benchmarking Deflection And Hallucination In Large Vision-language Models (2026) 0.00
- RAVEN: Multitask Retrieval Augmented Vision-language Learning (2024) 0.00
- Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models And Vision Language Models (2024) 8.82
- LamRA: Large Multimodal Model As Your Advanced Retrieval Assistant (2024) 7.50
- Learning By Hallucinating: Vision-language Pre-training With Weak Supervision (2022) 4.52
- LVLM-aware Multimodal Retrieval For RAG-based Medical Diagnosis With General-purpose Models (2025) 0.00
- Aligning Vision Models With Human Aesthetics In Retrieval: Benchmarks And Algorithms (2024) 0.00