A Deep Local And Global Scene-graph Matching For Image-text Retrieval
2021 · Manh-Duy Nguyen, Binh T. Nguyen, Cathal Gurrin
Abstract
Conventional approaches to image-text retrieval mainly focus on indexing the visual objects that appear in pictures but ignore the interactions between these objects. Object occurrences and interactions are equally useful and important in this field, as both are usually mentioned in the text. Scene graph representation is a suitable method for the image-text matching challenge and has obtained good results due to its ability to capture inter-relationship information. Both images and text are represented as scene graphs, and the retrieval challenge is formulated as a scene graph matching challenge. In this paper, we introduce the Local and Global Scene Graph Matching (LGSGM) model, which enhances the state-of-the-art method by integrating an extra graph convolution network to capture the general information of a graph. Specifically, for a pair of scene graphs of an image and its caption, two separate models are used to learn the features of each graph's nodes and edges. Then a Siamese-
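To make the idea of a graph-level ("global") representation concrete, the following is a minimal sketch and not the LGSGM model itself. It assumes toy scene graphs written as hypothetical (subject, predicate, object) triples, uses one-hot node features, applies a single unweighted neighbor-averaging step in place of a trained graph convolution network, and ignores predicate labels entirely. All names (`build_graph`, `graph_conv`, `global_embedding`) are illustrative, and the Siamese component mentioned in the truncated abstract is not modeled.

```python
import math

# Hypothetical toy scene graphs as (subject, predicate, object) triples.
image_triples = [("man", "riding", "horse"), ("horse", "on", "beach")]
caption_triples = [("man", "riding", "horse")]


def build_graph(triples):
    """Return sorted node labels and an undirected adjacency map with self-loops."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    adj = {n: {n} for n in nodes}  # self-loop keeps each node's own feature
    for s, _, o in triples:        # assumption: predicate labels are ignored
        adj[s].add(o)
        adj[o].add(s)
    return nodes, adj


def one_hot_features(nodes, vocab):
    """Assumed stand-in for learned node embeddings."""
    return {n: [1.0 if n == v else 0.0 for v in vocab] for n in nodes}


def graph_conv(features, adj):
    """One mean-aggregation step over neighbors, with no learned weights."""
    out = {}
    for node, neighbors in adj.items():
        dim = len(features[node])
        out[node] = [
            sum(features[m][i] for m in neighbors) / len(neighbors)
            for i in range(dim)
        ]
    return out


def global_embedding(triples, vocab):
    """Mean-pool the convolved node features into one graph-level vector."""
    nodes, adj = build_graph(triples)
    hidden = graph_conv(one_hot_features(nodes, vocab), adj)
    return [sum(hidden[n][i] for n in nodes) / len(nodes) for i in range(len(vocab))]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vocab = ["beach", "horse", "man"]
image_vec = global_embedding(image_triples, vocab)
caption_vec = global_embedding(caption_triples, vocab)
similarity = cosine(image_vec, caption_vec)
print(f"global graph similarity: {similarity:.3f}")
```

In a trained system the one-hot features and the averaging step would be replaced by learned embeddings and weighted layers, and edge (predicate) features would also contribute to the match.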
Related papers
- Scene Graph Based Fusion Network For Image-text Retrieval (2023) 4.52
- Visual Semantic Reasoning For Image-text Matching (2019) 25.23
- Multi-modal Reasoning Graph For Scene-text Based Fine-grained Image Classification And Retrieval (2020) 11.29
- Scene Text Retrieval Via Joint Text Detection And Similarity Learning (2021) 16.16
- StacMR: Scene-text Aware Cross-modal Retrieval (2020) 10.48
- Image-to-image Retrieval By Learning Similarity Between Scene Graphs (2020) 12.02
- Remote Sensing Cross-modal Text-image Retrieval Based On Global And Local Information (2022) 19.48
- Modeling Text With Graph Convolutional Network For Cross-modal Information Retrieval (2018) 11.85