Open-world 3D Scene Graph Generation For Retrieval-augmented Reasoning
2025 · Fei Yu, Quan Deng, Shengeng Tang, et al.
Abstract
Understanding 3D scenes in open-world settings poses fundamental challenges for vision and robotics, particularly due to the limitations of closed-vocabulary supervision and static annotations. To address this, we propose a unified framework for Open-World 3D Scene Graph Generation with Retrieval-Augmented Reasoning, which enables generalizable and interactive 3D scene understanding. Our method integrates Vision-Language Models (VLMs) with retrieval-based reasoning to support multimodal exploration and language-guided interaction. The framework comprises two key components: (1) a dynamic scene graph generation module that detects objects and infers semantic relationships without fixed label sets, and (2) a retrieval-augmented reasoning pipeline that encodes scene graphs into a vector database to support text- and image-conditioned queries. We evaluate our method on the 3DSSG and Replica benchmarks across four tasks: scene question answering, visual grounding, instance retrieval, and task planning
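The second component, encoding a scene graph into a vector store and querying it with text, can be illustrated with a minimal sketch. This is not the authors' implementation: the triples, the `embed` function (a toy bag-of-words stand-in for a VLM or text encoder), and the in-memory list used in place of a vector database are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy scene graph: (subject, relation, object) triples.
TRIPLES = [
    ("mug", "on top of", "desk"),
    ("lamp", "next to", "sofa"),
    ("book", "inside", "shelf"),
]

def embed(text):
    """Toy bag-of-words embedding (assumption); a real system would use a learned encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(triples):
    """Store each triple alongside its embedding; stands in for a vector database."""
    return [(t, embed(" ".join(t))) for t in triples]

def retrieve(index, query, k=1):
    """Return the k triples whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [t for t, _ in ranked[:k]]

index = build_index(TRIPLES)
top = retrieve(index, "what is on the desk")
```

In a retrieval-augmented setup, the retrieved triples would then be passed to a language model as context for answering the query; that step is omitted here.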
Related papers
- Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions (2020) 17.18
- R4: Retrieval-augmented Reasoning For Vision-language Models In 4D Spatio-temporal Space (2025) 0.00
- "Where Am I?" Scene Retrieval With Language (2024) 7.50
- Remote Sensing Retrieval-augmented Generation: Bridging Remote Sensing Imagery And Comprehensive Knowledge With A Multi-modal Dataset And Retrieval-augmented Generation Model (2025) 2.26
- Leveraging Retrieval-augmented Tags For Large Vision-language Understanding In Complex Scenes (2024) 0.00
- SCENIR: Visual Semantic Clarity Through Unsupervised Scene Graph Retrieval (2025) 0.00
- VISOR: Agentic Visual Retrieval-augmented Generation Via Iterative Search And Over-horizon Reasoning (2026) 0.00
- Scene Graph Based Image Retrieval -- A Case Study On The CLEVR Dataset (2019) 0.00