A-OKVQA
Emerging
16 papers using it
First seen: 2023
A-OKVQA is a dataset that contains visual question-answering tasks designed to evaluate the compositional visual reasoning abilities of vision-language models.
Papers using A-OKVQA (16)
- Self-Questioning Vision-Language Models: Reinforcement Learning for Compositional Visual Reasoning
- Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
- From Hindsight to Foresight: Self-Encouraged Hindsight Distillation for Knowledge-based Visual Question Answering
- Context-Aware Multi-Turn Visual-Textual Reasoning in LVLMs via Dynamic Memory and Adaptive Visual Guidance
- MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
- Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models
- Hierarchical Contextual Grounding LVLM: Enhancing Fine-Grained Visual-Language Understanding with Robust Grounding
- See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
- NLKI: A Lightweight Natural Language Knowledge Integration Framework For Improving Small Vlms In Commonsense VQA Tasks
- Believing Without Seeing: Quality Scores For Contextualizing Vision-language Model Explanations
- Cross Domain Evaluation Of Multimodal Chain-of-thought Reasoning Of Different Datasets Into The Amazon Cot Framework
- Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios
- Knowledge Condensation and Reasoning for Knowledge-based VQA
- II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
- A Simple Baseline for Knowledge-Based Visual Question Answering
- Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning