COBRA: Contrastive Bi-modal Representation Algorithm
2020 · Vishaal Udandarao, Abhishek Maiti, Deepak Srivatsav, et al.
Abstract
There is a wide range of applications that involve multi-modal data, such as cross-modal retrieval, visual question-answering, and image captioning. Such applications depend primarily on aligned distributions of the different constituent modalities. Existing approaches generate latent embeddings for each modality in a joint fashion by representing them in a common manifold. However, these joint embedding spaces fail to sufficiently reduce the modality gap, which affects the performance in downstream tasks. We hypothesize that these embeddings retain the intra-class relationships but are unable to preserve the inter-class dynamics. In this paper, we present a novel framework, COBRA, that aims to train two modalities (image and text) in a joint fashion, inspired by the Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms, which preserve both inter- and intra-class relationships. We empirically show that this framework reduces the modality gap significant
Related papers
- Probabilistic Embeddings For Cross-modal Retrieval (2021) · 21.70
- COBRA: Combinatorial Retrieval Augmentation For Few-shot Adaptation (2024) · 2.26
- Cross-modality Sub-image Retrieval Using Contrastive Multimodal Image Representations (2022) · 6.32
- Multimodal Contrastive Training For Visual Representation Learning (2021) · 16.32
- Cobit: A Contrastive Bi-directional Image-text Generation Model (2023) · 0.00
- Multi-modal Alignment Using Representation Codebook (2022) · 12.74
- CODER: Coupled Diversity-sensitive Momentum Contrastive Learning For Image-text Retrieval (2022) · 13.72
- Cross-modal Coherence For Text-to-image Retrieval (2021) · 6.77