Multi-modal Retrieval Using Graph Neural Networks
2020 · Aashish Kumar Misraa, Ajinkya Kale, Pranav Aggarwal, et al.
Abstract
Most real-world applications of image retrieval, such as Adobe Stock (a marketplace for stock photography and illustrations), need a way for users to find images that are both visually (i.e. aesthetically) and conceptually (i.e. containing the same salient objects) similar to a query image. Learning visual-semantic representations from images is a well-studied problem in image retrieval. Filtering on image concepts or attributes is traditionally achieved with index-based filtering (e.g. on textual tags) or by re-ranking after an initial retrieval based on visual embeddings. In this paper, we learn a joint vision and concept embedding in the same high-dimensional space. This joint model gives the user fine-grained control over the semantics of the result set, allowing them to explore the catalog of images more rapidly. We model the visual and concept relationships as a graph structure, which captures rich information through node neighborhoods. This graph structure helps us learn
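The idea of placing images and concepts in one embedding space through a graph can be illustrated with a toy example. The following is a minimal sketch and not the authors' model: it assumes hand-written 3-dimensional features, a hypothetical image-concept graph, and a single round of plain mean neighborhood aggregation in place of a trained graph neural network. All node names and helper functions are made up for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def aggregate(features, neighbors, self_weight=0.5):
    """One round of mean neighborhood aggregation over the input features.

    Each node's embedding mixes its own feature with the mean of its
    neighbors' features, so images and concepts end up in the same space.
    """
    out = {}
    for node, x in features.items():
        nbrs = [features[u] for u in neighbors.get(node, [])]
        if nbrs:
            m = mean(nbrs)
            mixed = [self_weight * a + (1 - self_weight) * b for a, b in zip(x, m)]
        else:
            mixed = x
        out[node] = normalize(mixed)
    return out

def retrieve(query_vec, embeddings, candidates, k=1):
    """Rank candidate nodes by cosine similarity to the query vector."""
    q = normalize(query_vec)
    def score(name):
        # Embeddings are unit length, so the dot product is the cosine.
        return sum(a * b for a, b in zip(q, embeddings[name]))
    return sorted(candidates, key=score, reverse=True)[:k]

# Hypothetical toy data: the first two axes stand in for visual style,
# the third axis for the salient object.
features = {
    "img_beach_dog": [1.0, 0.0, 0.2],
    "img_beach_cat": [0.9, 0.1, 0.0],
    "img_park_dog": [0.0, 1.0, 0.2],
    "concept_dog": [0.0, 0.0, 1.0],
    "concept_cat": [0.0, 0.0, -1.0],
}
neighbors = {
    "img_beach_dog": ["concept_dog"],
    "img_beach_cat": ["concept_cat"],
    "img_park_dog": ["concept_dog"],
    "concept_dog": ["img_beach_dog", "img_park_dog"],
    "concept_cat": ["img_beach_cat"],
}

emb = aggregate(features, neighbors)
images = [name for name in emb if name.startswith("img_")]

# Visual-only query: the beach-cat image itself.
visual_only = retrieve(emb["img_beach_cat"], emb, images)

# Steered query: same visual starting point, nudged toward the "dog" concept.
steered_query = [a + b for a, b in zip(emb["img_beach_cat"], emb["concept_dog"])]
steered = retrieve(steered_query, emb, images)
```

Because images and concepts share one space, a query can be built by adding a concept vector to an image vector. In this toy data the visual-only query returns `img_beach_cat`, while the steered query ranks `img_beach_dog` first, which is the kind of semantic control over the result set that the abstract describes.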
Related papers
- Modeling Text With Graph Convolutional Network For Cross-modal Information Retrieval (2018)
- Multi-modal Image Retrieval With Random Walk On Multi-layer Graphs (2016)
- Query By Semantic Sketch (2019)
- Multi-modal Reasoning Graph For Scene-text Based Fine-grained Image Classification And Retrieval (2020)
- InvGC: Robust Cross-modal Retrieval By Inverse Graph Convolution (2023)
- Visual Model Checking: Graph-based Inference Of Visual Routines For Image Retrieval (2026)
- Image-to-image Retrieval By Learning Similarity Between Scene Graphs (2020)
- SCENIR: Visual Semantic Clarity Through Unsupervised Scene Graph Retrieval (2025)