Image-text Retrieval With Binary And Continuous Label Supervision
2022 · Zheng Li, Caili Guo, Zerun Feng, et al.
Abstract
Most image-text retrieval work adopts binary labels indicating whether a pair of image and text matches or not. Such a binary indicator covers only a limited subset of image-text semantic relations, which is insufficient to represent relevance degrees between images and texts described by continuous labels such as image captions. The visual-semantic embedding space obtained by learning binary labels is incoherent and cannot fully characterize the relevance degrees. In addition to the use of binary labels, this paper further incorporates continuous pseudo labels (generally approximated by text similarity between captions) to indicate the relevance degrees. To learn a coherent embedding space, we propose an image-text retrieval framework with Binary and Continuous Label Supervision (BCLS), where binary labels are used to guide the retrieval model to learn limited binary correlations, and continuous labels are complementary to the learning of image-text semantic relations. For the learnin
Related papers
- Image-text Retrieval Via Preserving Main Semantics Of Vision (2023) 10.22
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018) 13.17
- Revising Image-text Retrieval Via Multi-modal Entailment (2022) 0.00
- Text-video Retrieval With Global-local Semantic Consistent Learning (2024) 8.75
- Constructing Phrase-level Semantic Labels To Form Multi-grained Supervision For Image-text Retrieval (2021) 8.09
- Beyond Visual Semantics: Exploring The Role Of Scene Text In Image Understanding (2019) 9.59
- Using Text To Teach Image Retrieval (2020) 5.24
- TSVC: Tripartite Learning With Semantic Variation Consistency For Robust Image-text Retrieval (2025) 3.58