XT-VQA

Emerging

4 papers using it

2022 first seen

XT-VQA (Cross-Lingual Text-Rich Visual Question Answering) is a benchmark that evaluates how large vision-language models handle a mismatch between the language of the text in an image and the language of the question. It combines several existing text-rich VQA datasets with a newly collected dataset, XPaperQA.


Papers using XT-VQA (4)

On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization (2022)

Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks (2023)

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation (2023)

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective (2024)