HADA: A Graph-based Amalgamation Framework In Image-text Retrieval
2023 · Manh-Duy Nguyen, Binh T. Nguyen, Cathal Gurrin
Abstract
Many models have been proposed for vision-and-language tasks, especially image-text retrieval. All state-of-the-art (SOTA) models for this task contained hundreds of millions of parameters. They were also pretrained on large external datasets, which has been shown to substantially improve overall performance. It is not easy to propose a model with a novel architecture and train it intensively on a massive dataset with many GPUs in order to surpass the many SOTA models that are already available online. In this paper, we proposed a compact graph-based framework, named HADA, which combines pretrained models to produce a better result rather than building a model from scratch. First, we created a graph structure in which the nodes were the features extracted from the pretrained models and the edges connected them. The graph structure was employed to capture and fuse the information from all of the pretrained models. Then a graph neural network was applied
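As a rough illustration of the graph construction described in the abstract, the sketch below treats the features from each pretrained model as one node, connects every pair of nodes, and runs a single round of message passing before reading out one fused vector. The abstract does not specify the graph neural network, so the mean aggregation here is a stand-in rather than HADA's actual update rule. The model names, feature values, and function names are hypothetical, and the features are assumed to be already projected to a common dimension.

```python
from itertools import permutations


def build_graph(features):
    """Fully connect the nodes; each node holds one model's feature vector."""
    nodes = dict(features)
    edges = list(permutations(nodes, 2))  # one directed edge per ordered pair
    return nodes, edges


def message_pass(nodes, edges):
    """One round of mean aggregation (a stand-in for the unspecified GNN).

    Each node is replaced by the average of itself and its in-neighbours.
    """
    updated = {}
    for name, vec in nodes.items():
        incoming = [nodes[src] for src, dst in edges if dst == name]
        group = [vec] + incoming
        updated[name] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


def fuse(nodes):
    """Read out a single fused vector as the mean of all node states."""
    vecs = list(nodes.values())
    return [sum(vals) / len(vecs) for vals in zip(*vecs)]


# Hypothetical features for one image from two pretrained models.
features = {"model_a": [1.0, 0.0, 2.0], "model_b": [3.0, 4.0, 0.0]}
nodes, edges = build_graph(features)
fused = fuse(message_pass(nodes, edges))
```

A learned GNN would replace the plain average with trainable weights; the point here is only that the fused vector depends on every model's features through the edges.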