Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions
2025 · Prajwal Gatti, Kshitij Parikh, Dhriti Prasanna Paul, et al.
Abstract
Non-native speakers with limited vocabulary often struggle to name specific objects despite being able to visualize them, e.g., people outside Australia searching for numbats. Further, users may want to search for such elusive objects with difficult-to-sketch interactions, e.g., numbat digging in the ground. In such common but complex situations, users desire a search interface that accepts composite multimodal queries comprising hand-drawn sketches of difficult-to-name but easy-to-draw objects and text describing difficult-to-sketch but easy-to-verbalize object attributes or interaction with the scene. This novel problem statement distinctly differs from the previously well-researched TBIR (text-based image retrieval) and SBIR (sketch-based image retrieval) problems. To study this under-explored task, we curate a dataset, CSTBIR (Composite Sketch+Text Based Image Retrieval), consisting of approx. 2M queries and 108K natural scene images. Further, as a solution to this problem, we prop
Related papers
- Sketch And Text Synergy: Fusing Structural Contours And Descriptive Attributes For Fine-grained Image Retrieval (2026) 0.00
- A Sketch Is Worth A Thousand Words: Image Retrieval With Text And Sketch (2022) 10.35
- Back To The Drawing Board: Rethinking Scene-level Sketch-based Image Retrieval (2025) 0.00
- Query By Semantic Sketch (2019) 0.00
- A Sketch+text Composed Image Retrieval Dataset For Thangka (2026) 0.78
- You'll Never Walk Alone: A Sketch And Text Duet For Fine-grained Image Retrieval (2024) 9.41
- FS-COCO: Towards Understanding Of Freehand Sketches Of Common Objects In Context (2022) 11.93
- Sketch Less For More: On-the-fly Fine-grained Sketch Based Image Retrieval (2020) 15.28