IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval
2020 · Hui Chen, Guiguang Ding, Xudong Liu, et al.
Abstract
Enabling bi-directional retrieval of images and texts is important for understanding the correspondence between vision and language. Existing methods leverage the attention mechanism to explore such correspondence in a fine-grained manner. However, most of them consider all semantics equally and thus align them uniformly, regardless of their diverse complexities. In fact, semantics are diverse (i.e. involving different kinds of semantic concepts), and humans usually follow a latent structure to combine them into understandable languages. It may be difficult to optimally capture such sophisticated correspondences in existing methods. In this paper, to address such a deficiency, we propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences between images and texts are captured with multiple steps of alignments. Specifically, we introduce an iterative matching scheme to explore such fine-grained correspondence progressively. A memory distillatio
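To make the idea of matching over multiple alignment steps concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract is truncated before the memory component is described, so the blending update, the `temperature` value, the number of `steps`, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def l2norm(x, eps=1e-8):
    """Scale each row vector to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def attend(query, context, temperature=9.0):
    """Cross-modal attention: every query row gathers a softmax-weighted
    sum of context rows, weighted by cosine similarity.
    The temperature value is an illustrative assumption."""
    sim = l2norm(query) @ l2norm(context).T        # (m, n) cosine similarities
    weights = np.exp(temperature * sim)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over context rows
    return weights @ context                       # (m, d) attended features


def iterative_match(words, regions, steps=3):
    """Accumulate a matching score over several alignment steps.

    words:   (m, d) word features of one sentence
    regions: (n, d) region features of one image
    """
    query = words
    total = 0.0
    for _ in range(steps):
        attended = attend(query, regions)
        # Per-step score: mean cosine similarity between each query
        # vector and the image content it attended to.
        total += float(np.mean(np.sum(l2norm(query) * l2norm(attended), axis=1)))
        # Hypothetical stand-in for the memory update: blend the previous
        # query with the attended features before the next step.
        query = 0.5 * query + 0.5 * attended
    return total


# Usage with random stand-in features (6 words, 10 regions, 16 dimensions).
rng = np.random.default_rng(0)
words = rng.normal(size=(6, 16))
regions = rng.normal(size=(10, 16))
score = iterative_match(words, regions, steps=3)
```

The score is a sum of per-step cosine similarities, so it always lies between `-steps` and `steps`. In a retrieval setting, a score of this kind would be computed for each candidate image-text pair and used for ranking.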
Related papers
- Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) 18.15
- ARTEMIS: Attention-based Retrieval With Text-explicit Matching And Implicit Similarity (2022) 0.00
- CAMP: Cross-modal Adaptive Message Passing For Text-image Retrieval (2019) 18.38
- Improving Image Recognition By Retrieving From Web-scale Image-text Data (2023) 9.41
- Multilingual Text-to-image Person Retrieval Via Bidirectional Relation Reasoning And Aligning (2025) 2.35
- Revisiting Cross Modal Retrieval (2018) 0.00
- Cross-modal Coherence For Text-to-image Retrieval (2021) 6.77
- Embedding Arithmetic Of Multimodal Queries For Image Retrieval (2021) 9.03