AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation
2023 · Jheng-Hong Yang, Carlos Lassance, Rafael Sampaio de Rezende, et al.
Abstract
This paper presents the AToMiC (Authoring Tools for Multimedia Content) dataset, designed to advance research in image/text cross-modal retrieval. While vision-language pretrained transformers have led to significant improvements in retrieval effectiveness, existing research has relied on image-caption datasets that feature only simplistic image-text relationships and underspecified user models of retrieval tasks. To address the gap between these oversimplified settings and real-world applications for multimedia content creation, we introduce a new approach for building retrieval test collections. We leverage hierarchical structures and diverse domains of texts, styles, and types of images, as well as large-scale image-document associations embedded in Wikipedia. We formulate two tasks based on a realistic user model and validate our dataset through retrieval experiments using baseline models. AToMiC offers a testbed for scalable, diverse, and reproducible multimedia retrieval research.
Related papers
- Embedding Arithmetic Of Multimodal Queries For Image Retrieval (2021) 9.03
- Tevatron 2.0: Unified Document Retrieval Toolkit Across Scale, Language, And Modality (2025) 3.58
- Entity Image And Mixed-modal Image Retrieval Datasets (2025) 1.56
- SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval (2024) 8.07
- DocMMIR: A Framework For Document Multi-modal Information Retrieval (2025) 3.46
- Rethinking Benchmarks For Cross-modal Image-text Retrieval (2023) 13.11
- Learning Audio-video Modalities From Image Captions (2022) 12.54
- StacMR: Scene-text Aware Cross-modal Retrieval (2020) 10.48