R-2R
Emerging. 24 papers using it.
First seen: 2023.
The R2R (Room-to-Room) dataset, listed here as R-2R, is a benchmark for evaluating Vision-and-Language Navigation (VLN) systems. Its navigation tasks require an agent to follow natural-language instructions through real-world indoor environments.
Papers using R-2R (24)
- TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation
- ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation
- Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos
- Trajectory-Diversity-Driven Robust Vision-and-Language Navigation
- Beyond Textual Knowledge-Leveraging Multimodal Knowledge Bases for Enhancing Vision-and-Language Navigation
- When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
- pFedNavi: Structure-Aware Personalized Federated Vision-Language Navigation for Embodied AI
- Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos
- Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
- Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
- Fine-Grained Instruction-Guided Graph Reasoning for Vision-and-Language Navigation
- PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation
- General Scene Adaptation for Vision-and-Language Navigation
- VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
- Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation
- DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
- NavHint: Vision and Language Navigation Agent with a Hint Generator
- Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation
- DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
- Vision-and-Language Navigation via Causal Learning
- Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
- Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations
- NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation
- MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation