DocVQA
Emerging
12 papers using it
First seen: 2024
Papers using DocVQA (12)
- Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring
- Describe Anything Model for Visual Question Answering on Text-rich Images
- Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering
- Input-adaptive Visual Preprocessing For Efficient Fast Vision-language Model Inference
- Constructive Distortion: Improving MLLMs With Attention-guided Image Warping
- Qianfan-VL: Domain-enhanced Universal Vision-language Models
- Interpret, Prune And Distill Donut: Towards Lightweight VLMs For VQA On Document
- MGA-VQA: Secure And Interpretable Graph-augmented Visual Question Answering With Memory-guided Protection Against Unauthorized Knowledge Use
- Simple Vision-language Math Reasoning Via Rendered Text
- Text-guided Semantic Image Encoder
- WildDoc: How Far Are We From Achieving Comprehensive And Robust Document Understanding In The Wild?
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer