Learning With Multi-modal Gradient Attention For Explainable Composed Image Retrieval
2023 · Prateksha Udhayanan, Srikrishna Karanam, Balaji Vasan Srinivasan
Abstract
We consider the problem of composed image retrieval: given a query consisting of an image and a modification text describing the desired changes to that image, retrieve images that reflect those changes. Current state-of-the-art techniques for this problem retrieve using global features, which leads to incorrect localization of the regions of interest to be modified, especially for real-world, in-the-wild images. Since modifier texts usually correspond to specific local changes in an image, models must learn local features in order to both localize and retrieve better. To this end, our key novelty is a new gradient-attention-based learning objective that explicitly forces the model to focus on the local regions of interest being modified at each retrieval step. We achieve this by first proposing a new visual attention computation technique, which we call multi-modal gradient attention …
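The abstract is cut off before it defines multi-modal gradient attention, so the snippet below is not the authors' method. It is a minimal Grad-CAM-style sketch of the general idea of a gradient-based attention map plus a localization objective, under these assumptions: image features are an H x W x C grid, the retrieval score is the dot product between a query vector (for example, a hypothetical text-conditioned embedding) and the mean-pooled image features, and a binary mask marks the region the modifier text refers to. The names `grad_attention_map` and `localization_loss` are hypothetical.

```python
from typing import List

Grid = List[List[float]]              # H x W map
FeatureMap = List[List[List[float]]]  # H x W x C spatial features


def grad_attention_map(feature_map: FeatureMap, query: List[float]) -> Grid:
    """Grad-CAM-style attention for the score s = <query, mean_pool(feature_map)>.

    Assumption: s is linear in the features, so ds/dF[h][w][c] = query[c] / (H * W)
    at every position, and the spatially averaged gradient (the channel weight)
    is alpha_c = query[c] / (H * W).
    """
    height, width = len(feature_map), len(feature_map[0])
    n = height * width
    alphas = [q / n for q in query]
    # Gradient-weighted sum over channels followed by ReLU, as in Grad-CAM.
    cam = [
        [max(0.0, sum(a * f for a, f in zip(alphas, feature_map[h][w])))
         for w in range(width)]
        for h in range(height)
    ]
    total = sum(sum(row) for row in cam)
    if total > 0:
        # Normalize so the attention map sums to 1 over all positions.
        cam = [[v / total for v in row] for row in cam]
    return cam


def localization_loss(attention: Grid, region_mask: Grid) -> float:
    """Hypothetical objective: 1 minus the attention mass inside the edited region."""
    inside = sum(
        a * m
        for a_row, m_row in zip(attention, region_mask)
        for a, m in zip(a_row, m_row)
    )
    return 1.0 - inside


# Toy example: 2x2 grid, 2 channels; the hypothetical query weights only channel 0.
fm = [[[1.0, 0.0], [0.0, 1.0]],
      [[0.0, 0.0], [2.0, 0.0]]]
attn = grad_attention_map(fm, [1.0, 0.0])
mask = [[0.0, 0.0], [0.0, 1.0]]  # region the modifier text is assumed to refer to
print(attn, localization_loss(attn, mask))
```

Minimizing such a loss pushes attention mass into the masked region. In a trained model the gradients would come from an autodiff framework rather than the closed form used here, which only holds because the toy score is linear in the features.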
Related papers
- Bi-directional Training For Composed Image Retrieval Via Text Prompt Learning (2023)
- Compositional Learning Of Image-text Query For Image Retrieval (2020)
- Composed Image Retrieval With Text Feedback Via Multi-grained Uncertainty Regularization (2022)
- Composed Image Retrieval Using Contrastive Learning And Task-oriented Clip-based Features (2023)
- Image Search With Text Feedback By Additive Attention Compositional Learning (2022)
- Visual Similarity Attention (2019)
- NEUCORE: Neural Concept Reasoning For Composed Image Retrieval (2023)
- Compositional Image Retrieval Via Instruction-aware Contrastive Learning (2024)