M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision
2025 · Che Liu, Zheng Jiang, Chengyu Fang, et al.
Abstract
Medical image retrieval is essential for clinical decision-making and translational research, relying on discriminative visual representations. Yet, current methods remain fragmented, relying on separate architectures and training strategies for 2D, 3D, and video-based medical data. This modality-specific design hampers scalability and inhibits the development of unified representations. To enable unified learning, we curate a large-scale hybrid-modality dataset comprising 867,653 medical imaging samples, including 2D X-rays and ultrasounds, RGB endoscopy videos, and 3D CT scans. Leveraging this dataset, we train M3Ret, a unified visual encoder without any modality-specific customization. It successfully learns transferable representations using both generative (MAE) and contrastive (SimDINO) self-supervised learning (SSL) paradigms. Our approach sets a new state-of-the-art in zero-shot image-to-image retrieval across all individual modalities, surpassing strong baselines such as DINOv
Related papers
- Unsupervised Multimodal Representation Learning Across Medical Images and Reports (2018) 0.00
- M3DR: Towards Universal Multilingual Multimodal Document Retrieval (2025) 0.00
- M3Retrieve: Benchmarking Multimodal Retrieval for Medicine (2025) 2.16
- 3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology (2023) 0.00
- RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features (2025) 2.29
- MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging (2024) 0.00
- Universal Model for Multi-domain Medical Image Retrieval (2020) 0.00
- Enhancing Medical Cross-modal Hashing Retrieval Using Dropout-voting Mixture-of-experts Fusion (2025) 0.00