MoRE: Multi-Modal Contrastive Pre-training With Transformers On X-rays, ECGs, And Diagnostic Report
2024 · Samrajya Thapa, Koushik Howlader, Subhankar Bhattacharjee, et al.
Abstract
In this paper, we introduce a novel Multi-Modal Contrastive Pre-training Framework that combines X-rays, electrocardiograms (ECGs), and radiology/cardiology reports. Our approach uses transformers to encode these modalities into a unified representation space, aiming to improve diagnostic accuracy and support comprehensive patient assessment. We use LoRA-PEFT to significantly reduce the number of trainable parameters in the LLM and incorporate a recent linear attention-dropping strategy in the Vision Transformer (ViT) for smoother attention. Furthermore, we provide novel multimodal attention explanations and retrieval for our model. To the best of our knowledge, we are the first to propose an integrated model that combines X-rays, ECGs, and radiology/cardiology reports with this approach. By utilizing a contrastive loss, MoRE aligns modality-specific features into a coherent embedding, which supports various downstream tasks such as zero-shot classification a
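As a rough illustration of how a contrastive loss can pull paired embeddings from different modalities together, the following is a minimal sketch of a symmetric InfoNCE-style objective in plain Python. It is not the authors' implementation: the function names (`info_nce`, `tri_modal_loss`), the temperature of 0.07, and the choice to pair each signal modality (X-ray, ECG) with the report embedding are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    anchors[i] and targets[i] are assumed to come from the same patient;
    every other pairing in the batch is treated as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    n = len(a)

    # logits[i][j] is the temperature-scaled cosine similarity of pair (i, j).
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    # Average the anchor-to-target and target-to-anchor directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))


def tri_modal_loss(xray, ecg, text, temperature=0.07):
    """Hypothetical combination: align X-ray and ECG embeddings with the report."""
    return info_nce(xray, text, temperature) + info_nce(ecg, text, temperature)
```

In this sketch, a batch whose paired embeddings already point in the same direction yields a loss near zero, while a batch whose pairs are swapped yields a large loss. Real systems would compute the same quantity on encoder outputs with a tensor library rather than Python lists.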
Related papers
- Masked Contrastive Reconstruction For Cross-modal Medical Image-report Retrieval (2023) · 0.00
- X-TRA: Improving Chest X-ray Tasks With Cross-modal Retrieval Augmentation (2023) · 8.09
- Selip: Similarity Enhanced Contrastive Language Image Pretraining For Multi-modal Head MRI (2025) · 3.58
- Exploring The Capabilities Of LLM Encoders For Image-text Retrieval In Chest X-rays (2025) · 0.00
- Multi-task Cross-modal Learning For Chest X-ray Image Retrieval (2026) · 0.00
- Unsupervised Multimodal Representation Learning Across Medical Images And Reports (2018) · 0.00
- Multimodal Contrastive Training For Visual Representation Learning (2021) · 16.32
- Decoupling The Role Of Data, Attention, And Losses In Multimodal Transformers (2021) · 13.88