MedCLIP: Contrastive Learning From Unpaired Medical Images And Text
2022 · Zifeng Wang, Zhenbang Wu, Dinesh Agarwal, et al.
Abstract
Existing vision-text contrastive learning methods such as CLIP aim to match paired image and caption embeddings while pushing unpaired ones apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude smaller than the general image-caption collections available on the internet. Moreover, previous methods encounter many false negatives: images and reports from different patients may carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning, thereby scaling the usable training data combinatorially at low cost. We also propose replacing the InfoNCE loss with a semantic matching loss based on medical knowledge to eliminate false negatives in contrastive learning. We demonstrate that MedCLIP is a simple yet effective framework: it outperforms state-of-the-art methods on zero-shot prediction, supervised classification, and image-text retrieval.
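The semantic matching idea can be sketched in plain Python. This is a minimal, hypothetical illustration, not MedCLIP's released implementation. It assumes each image and each report has already been mapped to a multi-hot vector of medical findings (the extraction step is not shown), uses a row-wise softmax over the cosine similarity of those label vectors as soft targets, and scores them against temperature-scaled embedding similarities with cross-entropy. The function names and the temperature `tau=0.07` are assumptions. Because the targets come from labels rather than from which image was paired with which report, the image and text batches need not be paired or even the same size, which mirrors the decoupling idea. `infonce_loss` is included for comparison and treats every off-diagonal pair as a negative.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(targets, logits):
    """Mean over rows of -sum_j targets[i][j] * log(softmax(logits[i])[j])."""
    loss = 0.0
    for t_row, l_row in zip(targets, logits):
        p_row = softmax(l_row)
        # Zero-weight targets contribute nothing, so skip them (avoids log(0)).
        loss -= sum(t * math.log(p) for t, p in zip(t_row, p_row) if t > 0)
    return loss / len(targets)

def soft_targets(query_labels, key_labels):
    """Hypothetical soft targets: row i is a softmax over keys of label-vector cosine similarity."""
    return [softmax([cosine(a, b) for b in key_labels]) for a in query_labels]

def semantic_matching_loss(img_emb, txt_emb, img_labels, txt_labels, tau=0.07):
    """Symmetric soft-target contrastive loss; N images and M texts may differ in count."""
    # Predicted logits: embedding cosine similarity divided by the temperature tau.
    logits_i2t = [[cosine(v, t) / tau for t in txt_emb] for v in img_emb]
    logits_t2i = [list(col) for col in zip(*logits_i2t)]  # transpose: texts as rows
    loss_i2t = soft_cross_entropy(soft_targets(img_labels, txt_labels), logits_i2t)
    loss_t2i = soft_cross_entropy(soft_targets(txt_labels, img_labels), logits_t2i)
    return 0.5 * (loss_i2t + loss_t2i)

def infonce_loss(img_emb, txt_emb, tau=0.07):
    """CLIP-style InfoNCE for a square batch: image i matches only text i."""
    n = len(img_emb)
    logits_i2t = [[cosine(v, t) / tau for t in txt_emb] for v in img_emb]
    logits_t2i = [list(col) for col in zip(*logits_i2t)]
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return 0.5 * (soft_cross_entropy(eye, logits_i2t) + soft_cross_entropy(eye, logits_t2i))
```

For example, if image 0 and texts 0 and 1 all carry the same single finding, `soft_targets` gives texts 0 and 1 equal target weight for image 0, whereas `infonce_loss` would push text 1 away as a negative even though it describes the same finding.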
Related papers
- ContextCLIP: Contextual Alignment Of Image-text Pairs On CLIP Visual Representations (2022)
- Multi-task Cross-modal Learning For Chest X-ray Image Retrieval (2026)
- Efficient Medical Vision-language Alignment Through Adapting Masked Vision Models (2025)
- The Effect Of Negation On CLIP In Medical Imaging: Limitations Of Contrastive Language-image Pretraining (2025)
- Semantic Compositions Enhance Vision-language Contrastive Learning (2024)
- Advancing Myopia To Holism: Fully Contrastive Language-image Pre-training (2024)
- C-CLIP: Contrastive Image-text Encoders To Close The Descriptive-commentative Gap (2023)
- Optimizing CLIP Models For Image Retrieval With Maintained Joint-embedding Alignment (2024)