AOKVQA
Emerging
19 papers using it
First seen: 2022
AOKVQA (also written A-OKVQA) is a benchmark dataset for evaluating commonsense visual question answering. Its questions require external world knowledge that is not present in the image or in the question itself.
Papers using AOKVQA (19)
- Self-Questioning Vision-Language Models: Reinforcement Learning for Compositional Visual Reasoning
- Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
- From Hindsight to Foresight: Self-Encouraged Hindsight Distillation for Knowledge-based Visual Question Answering
- Context-Aware Multi-Turn Visual-Textual Reasoning in LVLMs via Dynamic Memory and Adaptive Visual Guidance
- MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
- Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models
- Hierarchical Contextual Grounding LVLM: Enhancing Fine-Grained Visual-Language Understanding with Robust Grounding
- See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
- NLKI: A Lightweight Natural Language Knowledge Integration Framework For Improving Small VLMs In Commonsense VQA Tasks
- Believing Without Seeing: Quality Scores For Contextualizing Vision-language Model Explanations
- Cross Domain Evaluation Of Multimodal Chain-of-thought Reasoning Of Different Datasets Into The Amazon CoT Framework
- Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios
- Multimodal Chain-of-Thought Reasoning in Language Models
- Knowledge Condensation and Reasoning for Knowledge-based VQA
- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
- Zero-shot Visual Question Answering with Language Model Feedback
- II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
- A Simple Baseline for Knowledge-Based Visual Question Answering
- Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning