TextVQA
Canonical, 12 papers using it
First seen: 2023
Papers using TextVQA (12)
- Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles
- VOILA: Value-of-Information Guided Fidelity Selection for Cost-Aware Multimodal Question Answering
- LinMU: Multimodal Understanding Made Linear
- Text-VQA Aug: Pipelined Harnessing of Large Multimodal Models for Automated Synthesis
- Fusion to Enhance: Fusion Visual Encoder to Enhance Multimodal Language Model
- When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
- Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
- ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
- Constructive Distortion: Improving MLLMs With Attention-guided Image Warping
- Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
- Towards a Unified Multimodal Reasoning Framework
- Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy