SCENIR: Visual Semantic Clarity Through Unsupervised Scene Graph Retrieval
2025 · Nikolaos Chaidos, Angeliki Dimitriou, Maria Lymperaiou, et al.
Abstract
Despite the dominance of convolutional and transformer-based architectures in image-to-image retrieval, these models are prone to biases arising from low-level visual features, such as color. Recognizing the lack of semantic understanding as a key limitation, we propose a novel scene graph-based retrieval framework that emphasizes semantic content over superficial image characteristics. Prior approaches to scene graph retrieval predominantly rely on supervised Graph Neural Networks (GNNs), which require ground truth graph pairs derived from image captions. However, the inconsistency of caption-based supervision, which stems from variable text encodings, undermines retrieval reliability. To address these issues, we present SCENIR, a Graph Autoencoder-based unsupervised retrieval framework, which eliminates the dependence on labeled training data. Our model demonstrates superior performance across metrics and runtime efficiency, outperforming existing vision-based, multimodal, and supervised GNN appro
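To make the idea of unsupervised, graph-autoencoder-based retrieval more concrete, the following is a minimal NumPy sketch of the general pattern: encode each scene graph with a GCN-style layer, score the encoding with an inner-product reconstruction objective, pool node embeddings into one vector per graph, and rank a gallery by cosine similarity. It is not SCENIR's actual architecture. The function names, the single-layer encoder, the random node features, and the untrained random weight matrix are all assumptions made for illustration, and the training loop that would minimize the reconstruction loss is omitted.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def encode(adj, feats, weight):
    """One GCN-style layer: node embeddings Z = tanh(A_norm @ X @ W)."""
    return np.tanh(normalize_adjacency(adj) @ feats @ weight)

def reconstruction_loss(adj, z):
    """Binary cross-entropy between A and the inner-product decoder sigmoid(Z Z^T).

    A trained autoencoder would minimize this; no labels or captions are needed.
    """
    probs = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    eps = 1e-9
    return float(-np.mean(adj * np.log(probs + eps)
                          + (1 - adj) * np.log(1 - probs + eps)))

def graph_embedding(adj, feats, weight):
    """Mean-pool node embeddings into a single vector per scene graph."""
    return encode(adj, feats, weight).mean(axis=0)

def rank_by_similarity(query_vec, gallery_vecs):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Toy usage with hypothetical data: random undirected graphs and random features.
rng = np.random.default_rng(0)

def random_graph(n_nodes, n_feats=6):
    upper = np.triu(rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1)
    adj = (upper + upper.T).astype(float)
    return adj, rng.normal(size=(n_nodes, n_feats))

weight = rng.normal(size=(6, 8))  # untrained weights (assumption for the sketch)
gallery = [random_graph(n) for n in (4, 5, 6)]
gallery_vecs = np.stack([graph_embedding(a, x, weight) for a, x in gallery])

# Query with a copy of gallery item 1 and rank the gallery against it.
query_adj, query_feats = gallery[1]
query_vec = graph_embedding(query_adj, query_feats, weight)
ranking = rank_by_similarity(query_vec, gallery_vecs)
loss = reconstruction_loss(query_adj, encode(query_adj, query_feats, weight))
```

In a real system the node features would come from object and relation labels in the scene graph rather than random vectors, and the encoder weights would be fitted by minimizing the reconstruction loss over an unlabeled graph collection before any ranking is done.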
Related papers
- Image-to-image Retrieval By Learning Similarity Between Scene Graphs (2020) 12.02
- Back To The Drawing Board: Rethinking Scene-level Sketch-based Image Retrieval (2025) 0.00
- Scene Graph Based Image Retrieval -- A Case Study On The CLEVR Dataset (2019) 0.00
- Scene Graph Embeddings Using Relative Similarity Supervision (2021) 7.50
- Through The Prism: Importance-aware Scene Graphs For Image Retrieval (2025) 0.00
- Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions (2020) 17.18
- Beyond Visual Semantics: Exploring The Role Of Scene Text In Image Understanding (2019) 9.59
- StacMR: Scene-text Aware Cross-modal Retrieval (2020) 10.48