Flickr30K-CFQ: A Compact And Fragmented Query Dataset For Text-image Retrieval
2024 · Haoyu Liu, Yaoxian Song, Xuwu Wang, et al.
Abstract
With the explosive growth of multi-modal information on the Internet, unimodal search can no longer satisfy the requirements of Internet applications. Text-image retrieval research is needed to realize high-quality and efficient retrieval across modalities. Existing text-image retrieval research is mostly based on general vision-language datasets (e.g., MS-COCO, Flickr30K), in which the query utterances are rigid and unnatural (i.e., verbose and formal). To overcome this shortcoming, we construct a new Compact and Fragmented Query challenge dataset (named Flickr30K-CFQ) that models the text-image retrieval task across multiple query contents and styles, including a compact and fine-grained entity-relation corpus. We propose a novel query-enhanced text-image retrieval method using LLM-based prompt engineering. Experiments show that our proposed Flickr30K-CFQ reveals the insufficiency of existing vision-language datasets in realistic text-image tasks. Our LLM-based Query-enhanced method a
Related papers
- CFIR: Fast And Effective Long-text To Image Retrieval For Large Corpora (2024) 7.16
- Training And Challenging Models For Text-guided Fashion Image Retrieval (2022) 0.00
- Recqr: Incorporating Conversational Query Rewriting To Improve Multimodal Image Retrieval (2026) 0.00
- Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models And Vision Language Models (2024) 8.82
- Few Shots Text To Image Retrieval: New Benchmarking Dataset And Optimization Methods (2026) 0.00
- Rethinking Benchmarks For Cross-modal Image-text Retrieval (2023) 13.11
- Benchmark Granularity And Model Robustness For Image-text Retrieval (2024) 0.00
- Data Roaming And Quality Assessment For Composed Image Retrieval (2023) 11.39