The Effect Of Negation On CLIP In Medical Imaging: Limitations Of Contrastive Language-image Pretraining
2025 · Jasmine Vu, Shivanand Sheshappanavar
Abstract
Large vision-language models like CLIP are increasingly used in medical imaging tasks because they can align images and text without extensive labeled data. This makes them particularly useful for image retrieval, report generation, and classification in clinical settings. A potential issue with this approach is that CLIP-based models often underperform when interpreting negated phrases, which is especially problematic in medical diagnosis. In this study, we evaluate the Stanford AIMI CheXagent model on its ability to correctly retrieve chest X-ray images using prompts with and without negation. The goal of this project is to understand where this model fails and then use it as a base model to improve its retrieval accuracy through fine-tuning methods outlined in previous work. The results show improved handling of negation in the CLIP model, with a slight decrease in accuracy on positive prompts. Alongside retr
Related papers
- Contrastive Vision-language Learning With Paraphrasing And Negation (2025) 0.00
- MedCLIP: Contrastive Learning From Unpaired Medical Images And Text (2022) 26.02
- TNG-CLIP: Training-time Negation Data Generation For Negation Awareness Of CLIP (2025) 0.00
- Multi-task Cross-modal Learning For Chest X-ray Image Retrieval (2026) 0.00
- TripletCLIP: Improving Compositional Reasoning Of CLIP Via Synthetic Vision-language Negatives (2024) 4.52
- Efficient Medical Vision-language Alignment Through Adapting Masked Vision Models (2025) 5.74
- Contrasting Intra-modal And Ranking Cross-modal Hard Negatives To Enhance Visio-linguistic Compositional Understanding (2023) 12.11
- SpaceVLM: Sub-space Modeling Of Negation In Vision-language Models (2025) 0.00