3D Visual Question Answering (VQA)
Status: Emerging (6 papers using it)
First seen: 2022
3D Visual Question Answering (VQA) is a benchmark that evaluates how well models understand and reason about 3D scenes. Models answer questions about visual inputs, and the questions typically require spatial and contextual reasoning.
Papers using 3D Visual Question Answering (VQA) (6)
- Occ-VLM: Occupancy Grounded Vision Language Model for Indoor Scene Understanding
- Do Large Vision-language Models Distinguish Between The Actual And Apparent Features Of Illusions?
- CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
- Large Language Models are Visual Reasoning Coordinators
- Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
- CAVL: Learning Contrastive and Adaptive Representations of Vision and Language