NLVR2
Canonical
9 papers using it
31 HF downloads
2 HF likes
2022 first seen
The Natural Language for Visual Reasoning corpora are two language grounding datasets containing natural language sentences grounded in images. The task is to determine whether a sentence is true about a visual input. The data was collected through crowdsourcing, and solving the task requires reasoning about sets of o
Hugging Face · cc-by-4.0
Papers using NLVR2 (9)
- Quizzard@inova Challenge 2025 -- Track A: Plug-and-play Technique In Interleaved Multi-image Model
- Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
- Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
- MixGen: A New Multi-Modal Data Augmentation
- Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
- GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
- Training Vision-Language Models with Less Bimodal Supervision
- Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering