InvGC: Robust Cross-modal Retrieval By Inverse Graph Convolution
2023 · Xiangru Jian, Yimu Wang
Abstract
Over recent decades, significant advancements in cross-modal retrieval have been driven mainly by breakthroughs in visual and linguistic modeling. However, a recent study shows that multi-modal data representations tend to cluster within a limited convex cone (the representation degeneration problem), which hinders retrieval performance because the representations become hard to separate. In our study, we first empirically validate the presence of the representation degeneration problem across multiple cross-modal benchmarks and methods. Next, to address it, we introduce a novel method called InvGC, a post-processing technique inspired by graph convolution and average pooling. Specifically, InvGC defines the graph topology within the datasets and then applies graph convolution in a subtractive manner. This method effectively separates representations by increasing the distances between data points. To improve the efficiency and effectiveness of InvGC, we propose an advanced graph topology, Lo
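The subtractive graph convolution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax-normalised cosine adjacency, the function names (`l2_normalize`, `mean_pairwise_cosine`, `inverse_graph_convolution`), the strength parameter `r`, and the synthetic data are all assumptions made for illustration. The idea shown is only that subtracting a similarity-weighted neighbourhood average from each embedding removes part of the shared direction, so points packed into a narrow cone spread apart.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows."""
    x = l2_normalize(x)
    sim = x @ x.T
    off_diagonal = ~np.eye(x.shape[0], dtype=bool)
    return float(sim[off_diagonal].mean())


def inverse_graph_convolution(queries, gallery, r=0.5):
    """Subtract a similarity-weighted average of gallery rows from each query.

    Assumptions (not taken from the paper): the graph adjacency is a row-wise
    softmax over cosine similarities, and r is a hypothetical strength factor.
    """
    q = l2_normalize(queries)
    g = l2_normalize(gallery)
    sim = q @ g.T
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    weights = np.exp(sim - sim.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Ordinary graph convolution would add the aggregated neighbours;
    # the "inverse" variant subtracts them instead.
    return q - r * (weights @ g)


# Synthetic embeddings packed into a narrow cone around the all-ones direction.
rng = np.random.default_rng(0)
points = np.ones((50, 16)) + 0.1 * rng.normal(size=(50, 16))
separated = inverse_graph_convolution(points, points, r=0.5)

print("mean cosine before:", mean_pairwise_cosine(points))
print("mean cosine after: ", mean_pairwise_cosine(separated))
```

A lower mean pairwise cosine similarity after the update indicates that the embeddings occupy a wider cone. In a retrieval setting the same update would presumably be applied to query and gallery embeddings before computing similarity scores, with `r` tuned on validation data.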
Related papers
- Modeling Text With Graph Convolutional Network For Cross-modal Information Retrieval (2018) 11.85
- Graph Convolution Based Efficient Re-ranking For Visual Retrieval (2023) 9.92
- Multi-modal Retrieval Using Graph Neural Networks (2020) 0.00
- Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) 18.52
- Retrieval-guided Cross-view Image Synthesis (2024) 0.00
- Cross-modality Sub-image Retrieval Using Contrastive Multimodal Image Representations (2022) 6.32
- Revisiting Cross Modal Retrieval (2018) 0.00
- Image Retrieval For Structure-from-motion Via Graph Convolutional Network (2020) 9.59