Unsupervised Multimodal Representation Learning Across Medical Images And Reports
2018 · Tzu-Ming Harry Hsu, Wei-Hung Weng, Willie Boag, et al.
Abstract
Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results, measured via both local and global retrieval methods, on the soon-to-be-released MIMIC-CXR dataset, which consists of chest X-ray images and their associated radiology reports. We examine both supervised and unsupervised methods on this task and show that, for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully supervised methods.
Related papers
- Learning Visual-semantic Embeddings For Reporting Abnormal Findings On Chest X-rays (2020) · 9.76
- M3ret: Unleashing Zero-shot Multimodal Medical Image Retrieval Via Self-supervision (2025) · 0.00
- Radir: A Scalable Framework For Multi-grained Medical Image Retrieval Via Radiology Report Mining (2025) · 0.00
- X-TRA: Improving Chest X-ray Tasks With Cross-modal Retrieval Augmentation (2023) · 8.09
- Grounded Multimodal Retrieval-augmented Drafting Of Radiology Impressions Using Case-based Similarity Search (2026) · 0.00
- Cross-modality Sub-image Retrieval Using Contrastive Multimodal Image Representations (2022) · 6.32
- Prototype-enhanced Confidence Modeling For Cross-modal Medical Image-report Retrieval (2025) · 0.00
- MRIS: A Multi-modal Retrieval Approach For Image Synthesis On Diverse Modalities (2023) · 3.58