Image-text Retrieval: A Survey On Recent Research And Development
2022 · Min Cao, Shiping Li, Juntao Li, et al.
Abstract
In the past few years, cross-modal image-text retrieval (ITR) has attracted growing interest in the research community because of its research value and broad real-world applications. ITR targets scenarios in which the queries come from one modality and the retrieval gallery from another. This paper presents a comprehensive and up-to-date survey of ITR approaches from four perspectives. By dissecting an ITR system into two processes, feature extraction and feature alignment, we summarize recent advances in ITR approaches from these two perspectives. On top of this, efficiency-focused studies of ITR systems are introduced as the third perspective. To keep pace with the times, we also provide a pioneering overview of cross-modal pre-training ITR approaches as the fourth perspective. Finally, we outline the common benchmark datasets and evaluation metrics for ITR, and compare the accuracy of representative ITR approaches. Some
Related papers
- FiCo-ITR: Bridging Fine-grained And Coarse-grained Image-text Retrieval For Comparative Performance Analysis (2024)
- Anatomy-aware Conditional Image-text Retrieval (2025)
- Self-supervised Cross-modal Text-image Time Series Retrieval In Remote Sensing (2025)
- LexLIP: Lexicon-bottlenecked Language-image Pre-training For Large-scale Image-text Retrieval (2023)
- Benchmark Granularity And Model Robustness For Image-text Retrieval (2024)
- Rethinking Benchmarks For Cross-modal Image-text Retrieval (2023)
- Integrating Listwise Ranking Into Pairwise-based Image-text Retrieval (2023)
- HGAN: Hierarchical Graph Alignment Network For Image-text Retrieval (2022)