Semantic Image Retrieval Via Active Grounding Of Visual Situations
2017 · Max H. Quinn, Erik Conser, Jordan M. Witte, et al.
Abstract
We describe a novel architecture for semantic image retrieval---in particular, retrieval of instances of visual situations. Visual situations are concepts such as "a boxing match," "walking the dog," "a crowd waiting for a bus," or "a game of ping-pong," whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similarity. Given a query situation description, our architecture---called Situate---learns models capturing the visual features of expected objects as well as the expected spatial configuration of relationships among objects. Given a new image, Situate uses these models in an attempt to ground (i.e., to create a bounding box locating) each expected component of the situation in the image via an active search procedure. Situate uses the resulting grounding to compute a score indicating the degree to which the new image is judged to contain an instance of the situation. Such scores can be used to rank images in a collectio
Related papers
- Bridging The Gap Between Local Semantic Concepts And Bag Of Visual Words For Natural Scene Image Retrieval (2022) 2.26
- Beyond Semantic Search: Towards Referential Anchoring In Composed Image Retrieval (2026) 0.00
- Cross-modal Semantic Enhanced Interaction For Image-sentence Retrieval (2022) 12.33
- Back To The Drawing Board: Rethinking Scene-level Sketch-based Image Retrieval (2025) 0.00
- SCENIR: Visual Semantic Clarity Through Unsupervised Scene Graph Retrieval (2025) 0.00
- Semantic Image Retrieval By Uniting Deep Neural Networks And Cognitive Architectures (2018) 7.16
- Deepimagesearch: Benchmarking Multimodal Agents For Context-aware Image Retrieval In Visual Histories (2026) 0.00
- Dynamic Spatial Verification For Large-scale Object-level Image Retrieval (2019) 0.00