XT-VQA
Emerging · 4 papers using it
First seen: 2022
XT-VQA (Cross-Lingual Text-Rich Visual Question Answering) is a benchmark that evaluates how large vision-language models handle a mismatch between the language of the text in an image and the language of the question. It integrates multiple text-rich VQA datasets with a newly collected dataset, XPaperQA.
Papers using XT-VQA (4)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
- Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks
- From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
- Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective