Candidate Set Re-ranking For Composed Image Retrieval With Dual Multi-modal Encoder
2023 · Zheyuan Liu, Weixuan Sun, Damien Teney, et al.
Abstract
Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage mode
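The two schemes contrasted in the abstract can be combined in a generic retrieve-then-rerank pattern, sketched below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract is cut off before it describes the model, so `filter_candidates`, `rerank`, `two_stage_retrieve`, the shortlist size `k`, and the toy `joint_scores` table are all hypothetical names and values. Cosine similarity stands in for the "fast vector distances", and a lookup table stands in for an expensive scorer that would inspect each reference-text-candidate triplet.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_candidates(query_emb: Vector, corpus_embs: List[Vector], k: int) -> List[int]:
    """Stage 1 (hypothetical): cheap filtering over pre-computed embeddings.

    Returns the indices of the k candidates most similar to the query embedding.
    """
    order = sorted(
        range(len(corpus_embs)),
        key=lambda i: cosine(query_emb, corpus_embs[i]),
        reverse=True,
    )
    return order[:k]


def rerank(candidate_ids: List[int], joint_scorer: Callable[[int], float]) -> List[int]:
    """Stage 2 (hypothetical): re-order only the shortlist with a costly scorer.

    joint_scorer is assumed to look at the query and one candidate together.
    """
    return sorted(candidate_ids, key=joint_scorer, reverse=True)


def two_stage_retrieve(
    query_emb: Vector,
    corpus_embs: List[Vector],
    joint_scorer: Callable[[int], float],
    k: int,
) -> List[int]:
    """Run the cheap filter, then apply the expensive scorer to k candidates only."""
    shortlist = filter_candidates(query_emb, corpus_embs, k)
    return rerank(shortlist, joint_scorer)


# Toy data (assumed values, for illustration only).
corpus_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_emb = [1.0, 0.2]  # stands in for the text-modified reference embedding

# Stand-in for an expensive joint model; a real one would encode each triplet.
joint_scores = {0: 0.2, 1: 0.5, 2: 0.99, 3: 0.9}

shortlist = filter_candidates(query_emb, corpus_embs, k=3)
ranking = two_stage_retrieve(query_emb, corpus_embs, joint_scores.get, k=3)
print(shortlist)
print(ranking)
```

In this toy setup the expensive scorer runs on three candidates instead of four, and the saving grows with corpus size. The trade-off is also visible: candidate 2 has the highest joint score but never reaches the second stage, because the first-stage filter discarded it, so the shortlist size bounds the recall of the whole pipeline.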
Related papers
- Probabilistic Compositional Embeddings For Multimodal Image Retrieval (2022) · 13.80
- Bi-directional Training For Composed Image Retrieval Via Text Prompt Learning (2023) · 15.63
- MCoT-RE: Multi-faceted Chain-of-thought And Re-ranking For Training-free Zero-shot Composed Image Retrieval (2025) · 0.00
- From Mapping To Composing: A Two-stage Framework For Zero-shot Composed Image Retrieval (2025) · 0.00
- Comparing Neighbors Together Makes It Easy: Jointly Comparing Multiple Candidates For Efficient And Effective Retrieval (2024) · 4.52
- Retrieve Fast, Rerank Smart: Cooperative And Joint Approaches For Improved Cross-modal Retrieval (2021) · 10.97
- Unicvr: From Alignment To Reranking For Unified Zero-shot Composed Visual Retrieval (2026) · 0.00
- Compositional Learning Of Image-text Query For Image Retrieval (2020) · 17.87