Image Search With Text Feedback By Additive Attention Compositional Learning
2022 · Yuxin Tian, Shawn Newsam, Kofi Boakye
Abstract
Effective image retrieval with text feedback stands to impact a range of real-world applications, such as e-commerce. Given a source image and text feedback that describes the desired modifications to that image, the goal is to retrieve the target images that resemble the source yet satisfy the given modifications by composing a multi-modal (image-text) query. We propose a novel solution to this problem, Additive Attention Compositional Learning (AACL), that uses a multi-modal transformer-based architecture and effectively models the image-text contexts. Specifically, we propose a novel image-text composition module based on additive attention that can be seamlessly plugged into deep neural networks. We also introduce a new challenging benchmark derived from the Shopping100k dataset. AACL is evaluated on three large-scale datasets (FashionIQ, Fashion200k, and Shopping100k), each with strong baselines. Extensive experiments show that AACL achieves new state-of-the-art results on all three datasets.
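The abstract does not spell out the internals of the AACL composition module, so the following is only a generic illustration of additive (Bahdanau-style) attention used for image-text composition, not the authors' implementation. It assumes a text-feedback embedding acts as the query and attends over a set of image region features, producing a text-conditioned image vector. The helper names (`matvec`, `softmax`, `additive_attention`), the toy vectors, the identity projection matrices, and the choice to reuse the keys as values are all hypothetical and chosen for readability.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    query: Vector, keys: List[Vector], w_q: Matrix, w_k: Matrix, v: Vector
) -> Tuple[Vector, Vector]:
    """Hypothetical sketch of additive attention (not the AACL module).

    score_i = v . tanh(W_q q + W_k k_i), weights = softmax(scores),
    output = sum_i weights_i * k_i  (keys are reused as values here).
    """
    projected_q = matvec(w_q, query)
    scores = []
    for k in keys:
        projected_k = matvec(w_k, k)
        # Additive scoring: sum the projections, squash, then project to a scalar.
        hidden = [math.tanh(a + b) for a, b in zip(projected_q, projected_k)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(keys[0])
    # Attention-weighted sum of region features = text-conditioned image vector.
    output = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
    return output, weights


# Toy usage with made-up 2-d features (illustrative only).
text_feature = [1.0, 0.0]                                 # text-feedback embedding
region_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # image region features
identity = [[1.0, 0.0], [0.0, 1.0]]                       # assumed W_q and W_k
v = [1.0, 1.0]                                            # assumed scoring vector
composed, weights = additive_attention(text_feature, region_features, identity, identity, v)
```

Because the weights come from a softmax, the composed vector is a convex combination of the region features. In a trained model the projections `W_q`, `W_k`, and `v` would be learned, which is what lets the text feedback decide which parts of the source image to emphasize.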
Related papers
- Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020)
- Compositional Learning Of Image-text Query For Image Retrieval (2020)
- SAC: Semantic Attention Composition For Text-conditioned Image Retrieval (2020)
- Composing Text And Image For Image Retrieval - An Empirical Odyssey (2018)
- Zero-shot Composed Text-image Retrieval (2023)
- Training And Challenging Models For Text-guided Fashion Image Retrieval (2022)
- Benchmarking Robustness Of Text-image Composed Retrieval (2023)
- Bi-directional Training For Composed Image Retrieval Via Text Prompt Learning (2023)