Learnable Pillar-based Re-ranking For Image-text Retrieval
2023 · Leigang Qu, Meng Liu, Wenjie Wang, et al.
Abstract
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities. Prior work usually focuses on the pairwise relations (i.e., whether a data sample matches another) but ignores the higher-order neighbor relations (i.e., a matching structure among multiple data samples). Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks. However, it is ineffective to directly extend existing re-ranking algorithms to image-text retrieval. In this paper, we analyze the reason from four perspectives, i.e., generalization, flexibility, sparsity, and asymmetry, and propose a novel learnable pillar-based re-ranking paradigm. Concretely, we first select top-ranked intra- and inter-modal neighbors as pillars, and then reconstruct data samples with the neighbor relations between them and the pillars. In this way, each sample can be mapped into a multimodal pillar space
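The pillar idea described in the abstract can be illustrated with a small sketch. This is not the authors' learnable model. It is a fixed, non-learned simplification that assumes precomputed similarity scores, picks the top-k inter-modal (text) and intra-modal (image) neighbors of an image query as pillars, describes the query and every candidate text by their similarities to those pillars, and compares them by cosine similarity. The function names (`top_k`, `cosine`, `pillar_rerank`), the toy scores, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math


def top_k(scores, k):
    """Indices of the k largest scores, best first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pillar_rerank(q2t, q2i, t2t, i2t, k=2):
    """Re-rank candidate texts for one image query (hypothetical helper).

    q2t[j]    : similarity of the query image to text j (inter-modal)
    q2i[m]    : similarity of the query image to gallery image m (intra-modal)
    t2t[j][n] : similarity of text j to text n
    i2t[m][j] : similarity of gallery image m to text j
    """
    text_pillars = top_k(q2t, k)    # inter-modal pillars
    image_pillars = top_k(q2i, k)   # intra-modal pillars

    # Query coordinates in the pillar space: its similarity to each pillar.
    q_vec = [q2t[p] for p in text_pillars] + [q2i[p] for p in image_pillars]

    scores = []
    for j in range(len(q2t)):
        # Candidate text j expressed against the same pillars.
        t_vec = ([t2t[j][p] for p in text_pillars]
                 + [i2t[p][j] for p in image_pillars])
        scores.append(cosine(q_vec, t_vec))
    return top_k(scores, len(scores)), scores


# Toy scores (assumed, not taken from the paper): 3 candidate texts, 3 gallery images.
q2t = [0.50, 0.55, 0.10]
q2i = [0.9, 0.8, 0.1]
t2t = [[1.0, 0.3, 0.1],
       [0.3, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
i2t = [[0.9, 0.2, 0.0],
       [0.8, 0.3, 0.0],
       [0.1, 0.2, 0.9]]

initial = top_k(q2t, len(q2t))                 # [1, 0, 2]
ranking, scores = pillar_rerank(q2t, q2i, t2t, i2t, k=2)
print(initial, ranking)                        # [1, 0, 2] [0, 1, 2]
```

In this toy case the pairwise scores put text 1 slightly ahead of text 0, but text 0 agrees much better with the query's nearest images, so comparing the two in the pillar space moves it to the top. The paper's method learns how to use these neighbor relations rather than applying a fixed cosine comparison.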
Related papers
- Chain-of-thought Re-ranking For Image Retrieval Tasks (2025) · 1.81
- When Vision Meets Texts In Listwise Reranking (2026) · 0.00
- Rethinking Benchmarks For Cross-modal Image-text Retrieval (2023) · 13.11
- Retrieve Fast, Rerank Smart: Cooperative And Joint Approaches For Improved Cross-modal Retrieval (2021) · 10.97
- Contextual Similarity Aggregation With Self-attention For Visual Re-ranking (2021) · 0.00
- Discriminative Multi-view Privileged Information Learning For Image Re-ranking (2018) · 8.60
- Integrating Listwise Ranking Into Pairwise-based Image-text Retrieval (2023) · 9.16
- Matching Images And Text With Multi-modal Tensor Fusion And Re-ranking (2019) · 19.77