visual question answering (VQA) datasets
Emerging
13 papers using it
First seen: 2022
Visual question answering (VQA) datasets contain images paired with questions and answers. They are used to evaluate how well models understand and reason about visual content in response to textual queries.
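The image, question, and answer pairing described above can be illustrated with a minimal Python sketch. The `VQAExample` record, its field names, and the `exact_match_accuracy` helper are hypothetical and chosen only for illustration; they do not reflect the schema or the official metric of any particular VQA dataset, and the scoring here is a simplified exact-match check.

```python
from dataclasses import dataclass


@dataclass
class VQAExample:
    """One hypothetical VQA record: an image, a question, and reference answers."""
    image_path: str      # assumed field: location of the image file
    question: str        # natural-language question about the image
    answers: list[str]   # one or more accepted reference answers


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(examples: list[VQAExample], predictions: list[str]) -> float:
    """Fraction of predictions that match any reference answer after normalization."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(
        normalize(pred) in {normalize(ans) for ans in ex.answers}
        for ex, pred in zip(examples, predictions)
    )
    return correct / len(examples)


if __name__ == "__main__":
    data = [
        VQAExample("kitchen.jpg", "What color is the kettle?", ["red"]),
        VQAExample("street.jpg", "How many bicycles are visible?", ["2", "two"]),
    ]
    model_outputs = ["Red", "three"]  # stand-in for a model's answers
    print(exact_match_accuracy(data, model_outputs))
```

Real benchmarks often use softer scoring (for example, crediting an answer by how many annotators gave it), so this strict check should be read as a lower bound on what such metrics would report.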
Papers using visual question answering (VQA) datasets (13)
- Occ-VLM: Occupancy Grounded Vision Language Model for Indoor Scene Understanding
- Cross-Modal Attention Guided Unlearning in Vision-Language Models
- Do LVLMs Know What They Know? A Systematic Study of Knowledge Boundary Perception in LVLMs
- Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
- Towards Resource-efficient Multimodal Intelligence: Learned Routing Among Specialized Expert Models
- Do Large Vision-language Models Distinguish Between The Actual And Apparent Features Of Illusions?
- Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
- CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
- Large Language Models are Visual Reasoning Coordinators
- Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
- Uncertainty-Aware Evaluation for Vision-Language Models
- CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
- Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM