Understanding, Categorizing And Predicting Semantic Image-text Relations
2019 · Christian Otto, Matthias Springstein, Avishek Anand, et al.
Abstract
Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text and associated images as well as their interplay has a great potential for enhanced multimodal web search and recommender systems. However, automatic understanding of multimodal information is still an unsolved research problem. Recent approaches such as image captioning focus on precisely describing visual content and translating it to text, but typically address neither semantic interpretations nor the specific role or purpose of an image-text constellation. In this paper, we go beyond previous work and investigate, inspired by research in visual communication, useful semantic image-text relations for multimodal information retrieval. We derive a categorization of eight semantic image-text classes (e.g., "illustration" or "anchorage") and show how
Related papers
- Beyond Visual Semantics: Exploring The Role Of Scene Text In Image Understanding (2019) · 9.59
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020) · 10.07
- Cross-modal Semantic Enhanced Interaction For Image-sentence Retrieval (2022) · 12.33
- Image Understanding And The Web (2020) · 8.09
- Cross-modal Coherence For Text-to-image Retrieval (2021) · 6.77
- Image-text Retrieval Via Preserving Main Semantics Of Vision (2023) · 10.22
- How To Read Paintings: Semantic Art Understanding With Multi-modal Retrieval (2018) · 13.93
- Visual Semantic Reasoning For Image-text Matching (2019) · 25.23