OK-VQA
Canonical
22 papers using it
First seen: 2023
Papers using OK-VQA (22)
- Hyper-ICL: Attention Calibration with Hyperbolic Anchor Distillation for Multimodal In-Context Learning
- Hierarchical Pre-Training of Vision Encoders with Large Language Models
- When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs
- CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
- From Hindsight to Foresight: Self-Encouraged Hindsight Distillation for Knowledge-based Visual Question Answering
- Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
- When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
- MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
- See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
- Explanation-driven Counterfactual Testing For Faithfulness In Vision-language Model Explanations
- Cross Domain Evaluation Of Multimodal Chain-of-thought Reasoning Of Different Datasets Into The Amazon Cot Framework
- FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
- FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
- Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
- How to Configure Good In-Context Sequence for Visual Question Answering
- Knowledge Condensation and Reasoning for Knowledge-based VQA
- Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering
- VLIS: Unimodal Language Models Guide Multimodal Language Generation
- A Simple Baseline for Knowledge-Based Visual Question Answering
- Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
- Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering
- Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks