OSCAR: Open-set CAD Retrieval From A Language Prompt And A Single Image
2026 · Tessa Pulli, Jean-Baptiste Weibel, Peter Hönig, et al.
Abstract
6D object pose estimation plays a crucial role in scene understanding for applications such as robotics and augmented reality. To support ever-changing object sets in such contexts, modern zero-shot object pose estimators were developed to rely only on CAD models rather than on object-specific training. Such models are hard to obtain once a system is deployed, and a continuously changing and growing set of objects makes it harder to reliably identify the instance model of interest. To address this challenge, we introduce Open-Set CAD Retrieval from a Language Prompt and a Single Image (OSCAR), a novel training-free method that retrieves a matching object model from an unlabeled 3D object database. During onboarding, OSCAR generates multi-view renderings of database models and annotates them with descriptive captions using an image captioning model. At inference, GroundedSAM detects the queried object in the input image, and multi-modal embeddings are computed for both the Region-of
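As an illustration only: the abstract says embeddings are computed at inference for the detected object and that each database model is onboarded as several rendered views. The sketch below assumes, without confirmation from the text, that retrieval ranks models by cosine similarity and keeps the best-scoring view per model. The names `cosine`, `rank_models`, `toy_database`, and `toy_query` are hypothetical, and the vectors are toy values rather than outputs of any real embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_models(
    query: Sequence[float],
    database: Dict[str, List[Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Rank models by their best-matching view (assumed aggregation: max over views)."""
    scored = []
    for model_id, view_embeddings in database.items():
        if not view_embeddings:
            continue  # skip models that have no onboarded views
        best = max(cosine(query, view) for view in view_embeddings)
        scored.append((model_id, best))
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Toy data: two models, two rendered views each (hypothetical 3-D embeddings).
toy_database = {
    "mug": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "drill": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.6]],
}
toy_query = [0.95, 0.05, 0.0]  # stand-in for the embedding of the detected object

ranking = rank_models(toy_query, toy_database)
print(ranking[0][0])
```

Taking the maximum over views is one plausible choice because a single image shows the object from one viewpoint; averaging over views would instead favour models that look similar from every angle.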