Graph Neural Networks In Vision-language Image Understanding: A Survey
2023 · Henry Senior, Gregory Slabaugh, Shanxin Yuan, et al.
Abstract
2D image understanding is a complex problem within computer vision, but it holds the key to providing human-level scene comprehension. It goes beyond identifying the objects in an image and instead attempts to understand the scene. Solutions to this problem underpin a range of tasks, including image captioning, visual question answering (VQA), and image retrieval. Graphs provide a natural way to represent the relational arrangement between objects in an image, and thus, in recent years, graph neural networks (GNNs) have become a standard component of many 2D image understanding pipelines and a core architectural component, especially in the VQA group of tasks. In this survey, we review this rapidly evolving field and provide a taxonomy of graph types used in 2D image understanding approaches, a comprehensive list of the GNN models used in this domain, and a roadmap of future potential developments. To the best of our knowledge, this is the first compr
Related papers
- Hypergraph Vision Transformers: Images Are More Than Nodes, More Than Edges (2025)
- Multi-modal Retrieval Using Graph Neural Networks (2020)
- Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions (2020)
- Visualsem: A High-quality Knowledge Graph For Vision And Language (2020)
- Visual Model Checking: Graph-based Inference Of Visual Routines For Image Retrieval (2026)
- Answering Visual-relational Queries In Web-extracted Knowledge Graphs (2017)
- Deep Learning For Fine-grained Image Analysis: A Survey (2019)
- SCENIR: Visual Semantic Clarity Through Unsupervised Scene Graph Retrieval (2025)