Cross-modal Coherence For Text-to-image Retrieval
2021 · Malihe Alikhani, Fangda Han, Hareesh Ravi, et al.
Abstract
Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling these relations could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for the text-to-image retrieval task. Our analysis shows that models trained with image-text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. Our findings provide insights into the ways that different modalities communicate and the role of coherence relations in capturing commonsense inferences in text and imagery.
Related papers
- Revisiting Cross Modal Retrieval (2018) 0.00
- Intra-modal Constraint Loss For Image-text Retrieval (2022) 8.33
- Image Search Using Multilingual Texts: A Cross-modal Learning Approach Between Image And Text (2019) 0.00
- Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) 18.52
- TSVC: Tripartite Learning With Semantic Variation Consistency For Robust Image-text Retrieval (2025) 3.58
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020) 10.07
- Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) 18.15
- CAMP: Cross-modal Adaptive Message Passing For Text-image Retrieval (2019) 18.38