Visual Commonsense Reasoning (VCR)
Emerging · 5 papers using it · first seen 2022
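Visual Commonsense Reasoning (VCR) is a multiple-choice question-answering task that evaluates a model's ability to understand and reason about visual content in images: given an image and a question, the model must select the correct answer and a rationale that justifies it.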
Papers using Visual Commonsense Reasoning (VCR) (5)
- CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
- Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
- VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
- CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
- MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound