InfoCIR: Multimedia Analysis for Composed Image Retrieval
2026 · Ioannis Dravilas, Ioannis Kapetangeorgis, Anastasios Latsoudis, et al.
Abstract
Composed Image Retrieval (CIR) allows users to search for images by combining a reference image with a text prompt that describes desired modifications. While vision-language models like CLIP have popularized this task by embedding multiple modalities into a joint space, developers still lack tools that reveal how these multimodal prompts interact with embedding spaces and why small wording changes can dramatically alter the results. We present InfoCIR, a visual analytics system that closes this gap by coupling retrieval, explainability, and prompt engineering in a single, interactive dashboard. InfoCIR integrates a state-of-the-art CIR back-end (SEARLE, arXiv:2303.15247) with a six-panel interface that (i) lets users compose image + text queries, (ii) projects the top-k results into a low-dimensional space using Uniform Manifold Approximation and Projection (UMAP) for spatial reasoning, (iii) overlays similarity-based saliency maps and gradient-derived token-attribution bars for local
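To make the joint-embedding idea in the abstract concrete, here is a minimal sketch of composed retrieval by cosine similarity. It is an illustration under stated assumptions, not the InfoCIR or SEARLE pipeline: SEARLE is cited as the back-end, but its method is not reproduced here. Instead, a simple weighted blend of the image and text embeddings stands in for the composed query. The function names (`normalize`, `compose_query`, `top_k`), the `alpha` weight, and the 3-dimensional toy vectors are all hypothetical stand-ins for real CLIP embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def compose_query(image_emb, text_emb, alpha=0.5):
    """Blend image and text embeddings (assumed baseline, not SEARLE's method)."""
    img, txt = normalize(image_emb), normalize(text_emb)
    mixed = [alpha * i + (1.0 - alpha) * t for i, t in zip(img, txt)]
    return normalize(mixed)

def top_k(query, gallery, k=3):
    """Return (name, cosine similarity) pairs for the k best gallery items."""
    scored = [
        (name, sum(q * g for q, g in zip(query, normalize(emb))))
        for name, emb in gallery.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-d vectors standing in for embeddings in a joint image-text space.
reference_image = [1.0, 0.0, 0.0]   # hypothetical reference-image embedding
modifier_text = [0.0, 1.0, 0.0]     # hypothetical embedding of the text prompt
gallery = {
    "unchanged": [1.0, 0.0, 0.0],   # matches the reference image only
    "modified": [1.0, 1.0, 0.0],    # matches both image and text
    "unrelated": [0.0, 0.0, 1.0],   # matches neither
}

results = top_k(compose_query(reference_image, modifier_text), gallery, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In this toy setup the gallery item that reflects both the reference image and the text modification ranks first, ahead of the item that matches the reference image alone. Changing `alpha` shifts the balance between the two modalities, which is one simple way to see how the composition step affects the ranking.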
Related papers
- HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network (2026)
- TMCIR: Token Merge Benefits Composed Image Retrieval (2025)
- CSMCIR: CoT-enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval (2026)
- A Sanity Check on Composed Image Retrieval (2026)
- Instance-level Composed Image Retrieval (2025)
- Context-CIR: Learning from Concepts in Text for Composed Image Retrieval (2025)
- Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-only Data (2025)
- CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-end Chain-of-thought Reasoning (2025)