MedProbCLIP: Probabilistic Adaptation Of Vision-language Foundation Model For Reliable Radiograph-report Retrieval
2026 · Ahmad Elallaf, Yu Zhang, Yuktha Priya Masupalli, et al.
Abstract
Vision-language foundation models have emerged as powerful general-purpose representation learners with strong potential for multimodal understanding, but their deterministic embeddings often fail to provide the reliability required for high-stakes biomedical applications. This work introduces MedProbCLIP, a probabilistic vision-language learning framework for chest X-ray and radiology report representation learning and bidirectional retrieval. MedProbCLIP models image and text representations as Gaussian embeddings through a probabilistic contrastive objective that explicitly captures uncertainty and many-to-many correspondences between radiographs and clinical narratives. A variational information bottleneck mitigates overconfident predictions, while MedProbCLIP employs multi-view radiograph encoding and multi-section report encoding during training to provide fine-grained supervision for clinically aligned correspondence, yet requires only a single radiograph and a single report at
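As a rough illustration of the Gaussian embeddings and the variational information bottleneck mentioned in the abstract, the sketch below represents each embedding as a diagonal Gaussian (a mean vector and a log-variance vector). The abstract does not give the exact loss, so everything here is an assumption: the function names are hypothetical, the KL term toward a standard normal is a common choice of bottleneck regularizer, and the closed-form expected squared distance is one common way to compare two Gaussian embeddings, not necessarily the one MedProbCLIP uses.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian.

    Assumption: a KL penalty of this form is a typical variational
    information bottleneck regularizer; the paper's exact term may differ.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def expected_sq_distance(mu_a, logvar_a, mu_b, logvar_b):
    """E||z_a - z_b||^2 for independent diagonal Gaussians z_a and z_b.

    Closed form: ||mu_a - mu_b||^2 + sum(sigma_a^2 + sigma_b^2).
    Assumption: used here only as an illustrative image-text dissimilarity.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    var_term = sum(math.exp(la) + math.exp(lb) for la, lb in zip(logvar_a, logvar_b))
    return mean_term + var_term

# Hypothetical 2-D embeddings for one radiograph and one report.
image_mu, image_logvar = [0.0, 0.0], [0.0, 0.0]
report_mu, report_logvar = [1.0, 0.0], [0.0, 0.0]

distance = expected_sq_distance(image_mu, image_logvar, report_mu, report_logvar)
penalty = kl_to_standard_normal(image_mu, image_logvar)
```

In this sketch a larger predicted variance increases the expected distance to every other embedding, which is one way an uncertain radiograph or report can be kept from matching any single counterpart too confidently.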
Related papers
- Multi-task Cross-modal Learning For Chest X-ray Image Retrieval (2026) 0.00
- ProbVLM: Probabilistic Adapter For Frozen Vision-language Models (2023) 13.41
- Exploring The Capabilities Of LLM Encoders For Image-text Retrieval In Chest X-rays (2025) 0.00
- Prototype-enhanced Confidence Modeling For Cross-modal Medical Image-report Retrieval (2025) 0.00
- Vision-language Modelling For Radiological Imaging And Reports In The Low Data Regime (2023) 0.00
- Benchmarking Robustness Of Contrastive Learning Models For Medical Image-report Retrieval (2025) 0.00
- PriorCLIP: Visual Prior Guided Vision-language Model For Remote Sensing Image-text Retrieval (2024) 0.00
- X-TRA: Improving Chest X-ray Tasks With Cross-modal Retrieval Augmentation (2023) 8.09