Towards Cross-modal Retrieval In Chinese Cultural Heritage Documents: Dataset And Solution
2025 · Junyi Yuan, Jian Zhang, Fangyu Wu, et al.
Abstract
China has a long and rich history, encompassing a vast cultural heritage that includes diverse multimodal information, such as silk patterns, Dunhuang murals, and their associated historical narratives. Cross-modal retrieval plays a pivotal role in understanding and interpreting Chinese cultural heritage by bridging visual and textual modalities to enable accurate text-to-image and image-to-text retrieval. However, despite the growing interest in multimodal research, there is a lack of specialized datasets dedicated to Chinese cultural heritage, limiting the development and evaluation of cross-modal learning models in this domain. To address this gap, we propose a multimodal dataset named CulTi, which contains 5,726 image-text pairs extracted from two series of professional documents, respectively related to ancient Chinese silk and Dunhuang murals. Compared to existing general-domain multimodal datasets, CulTi presents a challenge for cross-modal retrieval: the difficulty of local ali
Related papers
- Evaluating Perspectival Biases In Cross-modal Retrieval (2025)
- Multilingual Diversity Improves Vision-language Representations (2024)
- M3DR: Towards Universal Multilingual Multimodal Document Retrieval (2025)
- Revisiting Cross Modal Retrieval (2018)
- Qilin: A Multimodal Information Retrieval Dataset With App-level User Sessions (2025)
- DocMMIR: A Framework For Document Multi-modal Information Retrieval (2025)
- Image Search Using Multilingual Texts: A Cross-modal Learning Approach Between Image And Text (2019)
- Cross-modal Retrieval: A Systematic Review Of Methods And Future Directions (2023)