FiCo-ITR: Bridging Fine-grained And Coarse-grained Image-text Retrieval For Comparative Performance Analysis
2024 · Mikel Williams-Lekuona, Georgina Cosma
Abstract
In the field of Image-Text Retrieval (ITR), recent advancements have leveraged large-scale Vision-Language Pretraining (VLP) for Fine-Grained (FG) instance-level retrieval, achieving high accuracy at the cost of increased computational complexity. For Coarse-Grained (CG) category-level retrieval, prominent approaches employ Cross-Modal Hashing (CMH) to prioritise efficiency, albeit at the cost of retrieval performance. Due to differences in methodologies, FG and CG models are rarely compared directly within evaluations in the literature, resulting in a lack of empirical data quantifying the retrieval performance-efficiency tradeoffs between the two. This paper addresses this gap by introducing the FiCo-ITR library, which standardises evaluation methodologies for both FG and CG models, facilitating direct comparisons. We conduct empirical evaluations of representative models from both subfields, analysing precision, recall, and computational complexity across varying data scales. Our fi
Related papers
- Benchmark Granularity And Model Robustness For Image-text Retrieval (2024) 0.00
- HiVLP: Hierarchical Vision-language Pre-training For Fast Image-text Retrieval (2022) 0.00
- Image-text Retrieval: A Survey On Recent Research And Development (2022) 13.93
- CFIR: Fast And Effective Long-text To Image Retrieval For Large Corpora (2024) 7.16
- DVF: Advancing Robust And Accurate Fine-grained Image Retrieval With Retrieval Guidelines (2024) 9.03
- FIGROTD: A Friendly-to-handle Dataset For Image Guided Retrieval With Optional Text (2025) 0.00
- LexLIP: Lexicon-bottlenecked Language-image Pre-training For Large-scale Image-text Retrieval (2023) 10.85
- Dynamic Contrastive Distillation For Image-text Retrieval (2022) 11.76