ReSeDis: A Dataset For Referring-based Object Search Across Large-scale Image Collections
2025 · Ziling Huang, Yidan Zhang, Shin'ichi Satoh
Abstract
Large-scale visual search engines are expected to solve a dual problem at once: (i) locate every image that truly contains the object described by a sentence and (ii) identify the object's bounding box or exact pixels within each hit. Existing techniques address only one side of this challenge. Visual grounding yields tight boxes and masks but rests on the unrealistic assumption that the object is present in every test image, producing a flood of false alarms when applied to web-scale collections. Text-to-image retrieval excels at sifting through massive databases to rank relevant images, yet it stops at whole-image matches and offers no fine-grained localization. We introduce Referring Search and Discovery (ReSeDis), the first task that unifies corpus-level retrieval with pixel-level grounding. Given a free-form description, a ReSeDis model must decide whether the queried object appears in each image and, if so, where it is, returning bounding boxes or segmentation masks. To enable ri
Related papers
- Referring Expression Instance Retrieval And A Strong End-to-end Baseline (2025) 0.00
- Detect-to-retrieve: Efficient Regional Aggregation For Image Search (2018) 24.71
- LRVS-Fashion: Extending Visual Search With Referring Instructions (2023) 0.00
- DeepImageSearch: Benchmarking Multimodal Agents For Context-aware Image Retrieval In Visual Histories (2026) 0.00
- SeeSaw: Interactive Ad-hoc Search Over Image Databases (2022) 5.24
- Revisit Anything: Visual Place Recognition Via Image Segment Retrieval (2024) 13.05
- RASR: Retrieval-augmented Super Resolution For Practical Reference-based Image Restoration (2025) 0.00
- REJEPA: A Novel Joint-embedding Predictive Architecture For Efficient Remote Sensing Image Retrieval (2025) 2.26