Masked Contrastive Reconstruction For Cross-modal Medical Image-report Retrieval
2023 · Zeqiang Wei, Kai Jin, Xiuzhuang Zhou
Abstract
The cross-modal medical image-report retrieval task plays a significant role in clinical diagnosis and in various medical generative tasks. The key challenge of this task is eliminating heterogeneity between modalities to enhance semantic consistency. Current Vision-Language Pretraining (VLP) models, which use cross-modal contrastive learning and masked reconstruction as joint training tasks, can effectively improve cross-modal retrieval performance. This framework typically employs dual-stream inputs, using unmasked data for cross-modal contrastive learning and masked data for reconstruction. However, the inputs of the two proxy tasks differ significantly, which causes task competition and information interference and limits the effectiveness of representation learning for both intra-modal and cross-modal features. In this paper, we propose an efficient VLP framework named Masked Contrastive and Reconstruction (MCR), which takes masked data as the sole input for bot
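The single-input idea in the abstract can be illustrated with a small sketch in which one masked input feeds both a contrastive term and a reconstruction term. This is not the authors' implementation: the encoders and decoder are omitted, the symmetric InfoNCE loss stands in as a common choice of contrastive objective, and every name here (`info_nce`, `random_mask`, `masked_mse`, `joint_loss`, `recon_weight`, the temperature of 0.07) is a hypothetical chosen for illustration.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/report embeddings.

    Assumption: pair i in image_embs matches pair i in text_embs.
    """
    img = [l2_normalize(v) for v in image_embs]
    txt = [l2_normalize(v) for v in text_embs]
    n = len(img)
    # sim[i][j]: temperature-scaled cosine similarity of image i and report j.
    sim = [
        [sum(a * b for a, b in zip(img[i], txt[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(rows):
        # Cross-entropy with the diagonal entry as the positive, via log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]
        return total / len(rows)

    sim_t = [list(col) for col in zip(*sim)]  # report-to-image direction
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(sim_t))


def random_mask(n_tokens, mask_ratio, rng):
    """Return a boolean list; True marks a masked patch or token."""
    n_masked = int(round(n_tokens * mask_ratio))
    masked = set(rng.sample(range(n_tokens), n_masked))
    return [i in masked for i in range(n_tokens)]


def masked_mse(pred, target, mask):
    """Mean squared error computed on masked positions only."""
    errs = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def joint_loss(image_embs, text_embs, recon_pred, recon_target, mask,
               recon_weight=1.0):
    """Contrastive term plus weighted reconstruction term.

    Assumption: the embeddings and recon_pred both come from the same
    masked input, so no second unmasked stream is needed.
    """
    contrastive = info_nce(image_embs, text_embs)
    reconstruction = masked_mse(recon_pred, recon_target, mask)
    return contrastive + recon_weight * reconstruction
```

In a dual-stream setup, the contrastive term would instead be computed from embeddings of a separate unmasked input; here both terms share the masked one, which is the contrast the abstract draws.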
Related papers
- Efficient Medical Vision-language Alignment Through Adapting Masked Vision Models (2025)
- Benchmarking Robustness Of Contrastive Learning Models For Medical Image-report Retrieval (2025)
- Multi-task Cross-modal Learning For Chest X-ray Image Retrieval (2026)
- Benchmarking Vision-language Contrastive Methods For Medical Representation Learning (2024)
- MoRE: Multi-modal Contrastive Pre-training With Transformers On X-rays, ECGs, And Diagnostic Report (2024)
- Vision-language Modelling For Radiological Imaging And Reports In The Low Data Regime (2023)
- Prototype-enhanced Confidence Modeling For Cross-modal Medical Image-report Retrieval (2025)
- Masked Contrastive Pre-training For Efficient Video-text Retrieval (2022)