Integrating Information Theory And Adversarial Learning For Cross-modal Retrieval
2021 · Wei Chen, Yu Liu, Erwin M. Bakker, et al.
Abstract
Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. To narrow the heterogeneity gap, we integrate modality classification and information entropy maximization adversarially. For this purpose, a modality classifier (as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute Shannon information entropy, which measures the uncertainty of the modality classification it performs. Moreover, feature encoders (as a generator) project uni-modal features into a commonly shared space and attempt to fool the discriminator by maximizing its output information entropy. Thus, maximizing information entropy gradually reduces the distribution discrep
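The entropy signal described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the logits, the two-class `[image, text]` layout, and the names `softmax`, `shannon_entropy`, and `encoder_loss` are assumptions made for illustration, and the paper's exact objective is not given in this excerpt. The sketch only shows that the entropy of a modality classifier's output is lowest when the classifier is confident and highest (log 2 for two classes) when it cannot tell the modalities apart, which is the state the encoders push toward.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def shannon_entropy(probs, eps=1e-12):
    """Shannon entropy in nats for each row of a probability matrix."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

# Hypothetical discriminator logits over [image, text] for three features:
# a confident prediction, a mildly confident one, and a fully uncertain one.
logits = np.array([[4.0, -4.0],
                   [0.5, -0.5],
                   [0.0,  0.0]])

probs = softmax(logits)
entropy = shannon_entropy(probs)

# Assumed encoder-side objective: maximizing mean entropy is the same as
# minimizing its negative.
encoder_loss = -entropy.mean()

print(entropy)
print(encoder_loss)
```

In this sketch the third row reaches the two-class maximum of log 2, which corresponds to a shared space where the discriminator can no longer separate image features from text features.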
Related papers
- Maximal Matching Matters: Preventing Representation Collapse For Robust Cross-modal Retrieval (2025) 2.26
- Cross-modal Image Retrieval With Deep Mutual Information Maximization (2021) 9.59
- Multimodal Representation Alignment For Cross-modal Information Retrieval (2025) 0.00
- Simple To Complex Cross-modal Learning To Rank (2017) 13.84
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019) 8.60
- Joint Fusion And Encoding: Advancing Multimodal Retrieval From The Ground Up (2025) 0.00
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020) 10.07
- Cross-modal Search Method Of Technology Video Based On Adversarial Learning And Feature Fusion (2022) 0.00