EDIS: Entity-driven Image Search Over Multimodal Web Content
2023 · Siqi Liu, Weixi Feng, Tsu-Jui Fu, et al.
Abstract
Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion. In this work, we introduce Entity-Driven Image Search (EDIS), a challenging dataset for cross-modal image search in the news domain. EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description. Unlike datasets that assume a small set of single-modality candidates, EDIS reflects real-world web image search scenarios by including a million multimodal image-text pairs as candidates. EDIS encourages the development of retrieval models that simultaneously address cross-modal information fusion and matching. To achieve accurate ranking results, a model must: 1) understand named entities and events from text queries, 2) ground entities onto images or text descriptions, and 3) effectively
Related papers
- Entity Image And Mixed-modal Image Retrieval Datasets (2025) 1.56
- Deepimagesearch: Benchmarking Multimodal Agents For Context-aware Image Retrieval In Visual Histories (2026) 0.00
- Unifying Multimodal Retrieval Via Document Screenshot Embedding (2024) 9.41
- Uniecs: Unified Multimodal E-commerce Search Framework With Gated Cross-modal Fusion (2025) 2.60
- IDMR: Towards Instance-driven Precise Visual Correspondence In Multimodal Retrieval (2025) 2.29
- Rethinking Composed Image Retrieval Evaluation: A Fine-grained Benchmark From Image Editing (2026) 0.00
- Transformer-empowered Multi-modal Item Embedding For Enhanced Image Search In E-commerce (2023) 4.52
- Resedis: A Dataset For Referring-based Object Search Across Large-scale Image Collections (2025) 0.00