Zero-shot Everything Sketch-based Image Retrieval, And In Explainable Style
2023 · Fengyin Lin, Mingkang Li, Da Li, et al.
Abstract
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), however with two significant differentiators to prior art: (i) we tackle all variants (inter-category, intra-category, and cross-dataset) of ZS-SBIR with just one network ("everything"), and (ii) we would really like to understand how this sketch-photo matching operates ("explainable"). Our key innovation lies in the realization that such a cross-modal matching problem can be reduced to comparisons of groups of key local patches, akin to the seasoned "bag-of-words" paradigm. With this change alone, we are able to achieve both of the aforementioned goals, with the added benefit of no longer requiring external semantic knowledge. Technically, ours is a transformer-based cross-modal network, with three novel components: (i) a self-attention module with a learnable tokenizer to produce visual tokens that correspond to the most informative local regions, (ii) a cross-attention module to compute lo
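To make the idea of comparing groups of local patches across modalities more concrete, here is a minimal NumPy sketch of scaled dot-product cross-attention between a set of sketch tokens and a set of photo tokens. It is an illustration under stated assumptions, not the authors' network: the function names `cross_attention` and `patch_match_score`, the random token features, and the cosine-based score are all hypothetical, and the learnable tokenizer and any trained projection weights are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention of one token set over another.

    queries:      (n_q, d) tokens from one modality (e.g. sketch patches)
    keys_values:  (n_kv, d) tokens from the other modality (e.g. photo patches)
    Returns the attended features (n_q, d) and the attention weights (n_q, n_kv).
    """
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values, weights


def patch_match_score(sketch_tokens, photo_tokens):
    """Hypothetical matching score (an assumption, not taken from the paper):
    mean cosine similarity between each sketch token and the photo content
    it attends to."""
    attended, weights = cross_attention(sketch_tokens, photo_tokens)
    a = sketch_tokens / np.linalg.norm(sketch_tokens, axis=1, keepdims=True)
    b = attended / np.linalg.norm(attended, axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean()), weights


# Random features stand in for tokens that a trained tokenizer would produce.
rng = np.random.default_rng(0)
sketch = rng.normal(size=(4, 8))  # 4 sketch tokens, 8-dim each
photo = rng.normal(size=(6, 8))   # 6 photo tokens, 8-dim each
score, weights = patch_match_score(sketch, photo)
```

Each row of `weights` shows which photo tokens a given sketch token attends to, which is the kind of patch-level correspondence that an explainable matching scheme could visualize.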
Related papers
- An Efficient Framework For Zero-shot Sketch-based Image Retrieval (2021) · 13.65
- Semantic Adversarial Network For Zero-shot Sketch-based Image Retrieval (2019) · 10.74
- Stacked Semantic-guided Network For Zero-shot Sketch-based Image Retrieval (2019) · 0.00
- CLIP For All Things Zero-shot Sketch-based Image Retrieval, Fine-grained Or Not (2023) · 15.54
- Domain-smoothing Network For Zero-shot Sketch-based Image Retrieval (2021) · 13.92
- Doodle To Search: Practical Zero-shot Sketch-based Image Retrieval (2019) · 16.75
- Adapt And Align To Improve Zero-shot Sketch-based Image Retrieval (2023) · 0.00
- Cross-modal Attention Alignment Network With Auxiliary Text Description For Zero-shot Sketch-based Image Retrieval (2024) · 4.52