RAVEN: Multitask Retrieval Augmented Vision-language Learning
2024 · Varun Nagaraj Rao, Siddharth Choudhary, Aditya Deshpande, et al.
Abstract
Scaling large language models to encode all the world's knowledge in model parameters is unsustainable and has exacerbated resource barriers. Retrieval-Augmented Generation (RAG) presents a potential solution, yet its application to vision-language models (VLMs) is underexplored. Existing methods focus on models designed for single tasks. Furthermore, they are limited by the need for resource-intensive pretraining, additional parameter requirements, unaddressed modality prioritization, and a lack of clear benefit over non-retrieval baselines. This paper introduces RAVEN, a multitask retrieval-augmented VLM framework that enhances base VLMs through efficient, task-specific fine-tuning. By integrating retrieval-augmented samples without the need for additional retrieval-specific parameters, we show that the model acquires retrieval properties that are effective across multiple tasks. Our results and extensive ablations across retrieved modalities for the image captioning and VQA tas
Related papers
- RAVENEA: A Benchmark For Multimodal Retrieval-augmented Visual Culture Understanding (2025)
- RAVEN: In-context Learning With Retrieval-augmented Encoder-decoder Language Models (2023)
- Leveraging Retrieval-augmented Tags For Large Vision-language Understanding In Complex Scenes (2024)
- M4-RAG: A Massive-scale Multilingual Multi-cultural Multimodal RAG (2025)
- Remote Sensing Retrieval-augmented Generation: Bridging Remote Sensing Imagery And Comprehensive Knowledge With A Multi-modal Dataset And Retrieval-augmented Generation Model (2025)
- Understanding Retrieval-augmented Task Adaptation For Vision-language Models (2024)
- Retrieval-augmented Perception: High-resolution Image Perception Meets Visual RAG (2025)
- RAVID: Retrieval-augmented Visual Detection: A Knowledge-driven Approach For AI-generated Image Identification (2025)