
XT-VQA

Emerging
4 papers using it
2022 first seen

XT-VQA (Cross-Lingual Text-Rich Visual Question Answering) is a benchmark that evaluates how large vision-language models handle a mismatch between the language of the text in an image and the language of the question. It combines multiple existing text-rich VQA datasets with a newly collected dataset, XPaperQA.

Papers using XT-VQA (4)
