SHREC 2025: Retrieval Of Optimal Objects For Multi-modal Enhanced Language And Spatial Assistance (ROOMELSA)
2025 · Trong-Thuan Nguyen, Viet-Tham Huynh, Quang-Thuc Nguyen, et al.
Abstract
Recent 3D retrieval systems are typically designed for simple, controlled scenarios, such as identifying an object from a cropped image or a brief description. However, real-world scenarios are more complex, often requiring the recognition of an object in a cluttered scene based on a vague, free-form description. To this end, we present ROOMELSA, a new benchmark designed to evaluate a system's ability to interpret natural language. Specifically, ROOMELSA attends to a specific region within a panoramic room image and accurately retrieves the corresponding 3D model from a large database. In addition, ROOMELSA includes over 1,600 apartment scenes, nearly 5,200 rooms, and more than 44,000 targeted queries. Empirically, while coarse object retrieval is largely solved, only one top-performing model consistently ranked the correct match first across nearly all test cases. Notably, a lightweight CLIP-based model also performed well, although it struggled with subtle variations in materials, pa
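The abstract frames the task as ranking database models against a query and checking whether the correct match comes first. A minimal sketch of that kind of rank-based evaluation is shown here, assuming queries and 3D models have already been embedded into a shared vector space. The function names, the toy embeddings, and the choice of top-1 accuracy and mean reciprocal rank are illustrative assumptions, not the benchmark's official protocol.

```python
import numpy as np

def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return np.argsort(-(c @ q), kind="stable")

def top1_and_mrr(query_embs, candidate_embs, target_ids):
    """Compute top-1 accuracy and mean reciprocal rank over all queries."""
    hits, reciprocal_sum = 0, 0.0
    for q, target in zip(query_embs, target_ids):
        order = rank_candidates(q, candidate_embs)
        rank = int(np.where(order == target)[0][0]) + 1  # 1-based rank of the true model
        hits += int(rank == 1)
        reciprocal_sum += 1.0 / rank
    n = len(target_ids)
    return hits / n, reciprocal_sum / n

# Toy data (hypothetical): three candidate 3D-model embeddings, two query embeddings.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[0.9, 0.1], [0.6, 0.8]])
targets = [0, 1]  # index of the correct candidate for each query

top1, mrr = top1_and_mrr(queries, candidates, targets)
print(f"top-1 accuracy: {top1:.2f}, MRR: {mrr:.2f}")
```

In this toy case the first query ranks its target first and the second ranks its target second, so a reciprocal-rank metric still gives partial credit where top-1 accuracy gives none.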
Related papers
- SAMURAI: Shape-aware Multimodal Retrieval For 3D Object Identification (2025)
- OSCAR: Open-set CAD Retrieval From A Language Prompt And A Single Image (2026)
- HotelMatch-LLM: Joint Multi-task Training Of Small And Large Language Models For Efficient Multimodal Hotel Retrieval (2025)
- Describe, Adapt And Combine: Empowering CLIP Encoders For Open-set 3D Object Retrieval (2025)
- Spatialmem: Metric-aligned Long-horizon Video Memory For Language Grounding And QA (2026)
- MRSE: An Efficient Multi-modality Retrieval System For Large Scale E-commerce (2024)
- Retrieval And Localization With Observation Constraints (2021)
- SORCE: Small Object Retrieval In Complex Environments (2025)