DocVQA

Emerging

13 papers using it

First seen in 2020

DocVQA is a benchmark of document images paired with questions, used to evaluate how well models understand visual documents and extract information from them.


Papers using DocVQA (11)

Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring · 2026

Simple Vision-language Math Reasoning Via Rendered Text · 2025

Qianfan-VL: Domain-enhanced Universal Vision-language Models · 2025

Describe Anything Model for Visual Question Answering on Text-rich Images · 2025

Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering · 2025

Constructive Distortion: Improving MLLMs With Attention-guided Image Warping · 2025

Interpret, Prune And Distill Donut: Towards Lightweight VLMs For VQA On Document · 2025

MGA-VQA: Secure And Interpretable Graph-augmented Visual Question Answering With Memory-guided Protection Against Unauthorized Knowledge Use · 2025

WildDoc: How Far Are We From Achieving Comprehensive And Robust Document Understanding In The Wild? · 2025

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding · 2020 · 59 cites

LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer · 2024