Expressing Objects Just Like Words: Recurrent Visual Embedding For Image-text Matching
2020 · Tianlang Chen, Jiebo Luo
Abstract
Existing image-text matching approaches typically infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image. However, they ignore the connections between objects that are semantically related. These objects may collectively determine whether the image corresponds to a text or not. To address this problem, we propose a Dual Path Recurrent Neural Network (DP-RNN), which processes images and sentences symmetrically with recurrent neural networks (RNNs). In particular, given an input image-text pair, our model reorders the image objects based on the positions of their most related words in the text. In the same way as extracting hidden features from word embeddings, the model uses an RNN to extract high-level object features from the reordered object inputs. We validate that the high-level object features contain useful joint information of semantically related objects, which benefit the retrieval
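The reordering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity affinity, the plain tanh recurrence (standing in for whatever recurrent cell the paper uses), and the names `reorder_objects` and `simple_rnn` are all assumptions made for illustration.

```python
import numpy as np


def reorder_objects(obj_feats, word_feats):
    """Sort objects by the sentence position of their most similar word.

    obj_feats:  (n_objects, d) array of object features
    word_feats: (n_words, d) array of word features in sentence order
    Assumption: cosine similarity is used as the object-word affinity.
    """
    o = obj_feats / np.linalg.norm(obj_feats, axis=1, keepdims=True)
    w = word_feats / np.linalg.norm(word_feats, axis=1, keepdims=True)
    sim = o @ w.T                    # (n_objects, n_words) cosine similarities
    best_word = sim.argmax(axis=1)   # position of the most related word per object
    # A stable sort keeps the original order of objects that share a word.
    order = np.argsort(best_word, kind="stable")
    return obj_feats[order], order


def simple_rnn(inputs, W_x, W_h, b):
    """Plain tanh RNN (an assumed stand-in cell); returns one hidden state per step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return np.stack(states)


# Toy usage: three words, and three objects that match words 2, 0 and 1.
words = np.eye(3)
objects = words[[2, 0, 1]]
reordered, order = reorder_objects(objects, words)

rng = np.random.default_rng(0)
hidden = simple_rnn(reordered,
                    rng.normal(size=(5, 3)),   # input-to-hidden weights
                    rng.normal(size=(5, 5)),   # hidden-to-hidden weights
                    np.zeros(5))               # bias
```

In this toy case the objects come out in the same order as their matching words, so the recurrence sees them in sentence order, just as it would see word embeddings.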
Related papers
- Deep Multimodal Image-text Embeddings For Automatic Cross-media Retrieval (2020) 0.00
- Visual Semantic Reasoning For Image-text Matching (2019) 25.23
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018) 13.17
- Learning To Embed Semantic Similarity For Joint Image-text Retrieval (2022) 7.50
- Transformer Reasoning Network For Image-text Matching And Retrieval (2020) 16.15
- Deep Boosting Learning: A Brand-new Cooperative Approach For Image-text Matching (2024) 9.73
- IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) 19.22
- Modeling Text With Graph Convolutional Network For Cross-modal Information Retrieval (2018) 11.85