STE-VQA

Emerging

6 papers using it

First seen: 2022

STE-VQA is a benchmark dataset for evaluating models on Document Visual Question Answering. It tests how well a model understands textual semantics, spatial layout, and visual features.

Papers using STE-VQA (6)

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs · 2025

MGA-VQA: Secure And Interpretable Graph-augmented Visual Question Answering With Memory-guided Protection Against Unauthorized Knowledge Use · 2025

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering · 2022 · 8 cites

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation · 2022 · 7 cites

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering · 2022 · 1 cite

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA · 2023 · 1 cite