3D Visual Question Answering (VQA)

Emerging

6 papers using it

First seen: 2022

3D Visual Question Answering (VQA) is a benchmark that evaluates how well models understand and reason about 3D scenes. Models answer questions about visual inputs, and these questions typically involve spatial and contextual information.

Papers using 3D Visual Question Answering (VQA) (6)

Occ-VLM: Occupancy Grounded Vision Language Model for Indoor Scene Understanding (2026)

Do Large Vision-language Models Distinguish Between The Actual And Apparent Features Of Illusions? (2025)

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks (2022) · 19 citations

Large Language Models are Visual Reasoning Coordinators (2023) · 14 citations

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks (2022) · 8 citations

CAVL: Learning Contrastive and Adaptive Representations of Vision and Language (2023) · 1 citation