Cross-modal Image Retrieval With Deep Mutual Information Maximization
2021 · Chunbin Gu, Jiajun Bu, Xixi Zhou, et al.
Abstract
In this paper, we study cross-modal image retrieval, where the inputs contain a source image plus some text that describes certain modifications to this image and the desired image. Prior work usually tackles this task with a three-stage strategy: 1) extract the features of the inputs; 2) fuse the features of the source image and its modified text to obtain a fused feature; 3) learn a similarity metric between the desired image and the source image + modified text using deep metric learning. Since classical image/text encoders can learn useful representations and the common pair-based loss functions of distance metric learning are sufficient for cross-modal retrieval, researchers usually improve retrieval accuracy by designing new fusion networks. However, these methods do not successfully handle the modality gap caused by the inconsistent distributions and representations of features from different modalities, which greatly affects feature fusion and similarity learning. To allevi
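The three-stage strategy described in the abstract (feature extraction, fusion, metric learning) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the hand-written vectors stand in for encoder outputs, additive fusion stands in for a learned fusion network, and a batch softmax cross-entropy over cosine similarities stands in for "a pair-based loss". None of it is taken from the paper's actual method.

```python
import math

def fuse(image_feat, text_feat):
    """Stage 2 (toy stand-in): fuse source-image and text features by addition."""
    return [a + b for a, b in zip(image_feat, text_feat)]

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def batch_softmax_loss(queries, targets):
    """Stage 3 (toy stand-in): pair-based loss where queries[i] should match targets[i]."""
    q = [l2_normalize(x) for x in queries]
    t = [l2_normalize(x) for x in targets]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, tj) for tj in t]  # cosine similarity to every target
        m = max(logits)                     # subtract max for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denom - logits[i]      # negative log-softmax of the true pair
    return total / len(q)

def retrieve(query, gallery):
    """Return the index of the gallery feature most similar to the query."""
    qn = l2_normalize(query)
    scores = [dot(qn, l2_normalize(g)) for g in gallery]
    return max(range(len(gallery)), key=scores.__getitem__)

# Stage 1 stand-in: hypothetical, hand-written "encoder outputs".
source_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_mods = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
target_images = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]

queries = [fuse(s, t) for s, t in zip(source_images, text_mods)]
loss = batch_softmax_loss(queries, target_images)
ranks = [retrieve(q, target_images) for q in queries]
```

In this toy setup each fused query coincides with its target, so retrieval picks the matching index and the loss falls below the chance level of `log(2)` for a batch of two. In a real system the encoders and fusion module would be trained networks, and the loss would drive their parameters.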
Related papers
- Multi-modal Mutual Information Maximization: A Novel Approach For Unsupervised Deep Cross-modal Hashing (2021) 12.02
- Deep Multimodal Image-text Embeddings For Automatic Cross-media Retrieval (2020) 0.00
- Revisiting Cross Modal Retrieval (2018) 0.00
- Intra-modal Constraint Loss For Image-text Retrieval (2022) 8.33
- Maximal Matching Matters: Preventing Representation Collapse For Robust Cross-modal Retrieval (2025) 2.26
- Joint Fusion And Encoding: Advancing Multimodal Retrieval From The Ground Up (2025) 0.00
- Integrating Information Theory And Adversarial Learning For Cross-modal Retrieval (2021) 10.97
- Mire: Enhancing Multimodal Queries Representation Via Fusion-free Modality Interaction For Multimodal Retrieval (2024) 3.81