Sparse Autoencoders For Interpretable Medical Image Representation Learning
2026 · Philipp Wesp, Robbie Holland, Vasiliki Sideri-Lampretsa, et al.
Abstract
Vision foundation models (FMs) achieve state-of-the-art performance in medical imaging. However, they encode information in abstract latent representations that clinicians cannot interrogate or verify. This study investigates Sparse Autoencoders (SAEs) as a way to replace opaque FM image representations with human-interpretable, sparse features. We train SAEs on embeddings from BiomedParse (biomedical) and DINOv3 (general-purpose) using 909,873 CT and MRI 2D image slices from the TotalSegmentator dataset. We find that the learned sparse features: (a) reconstruct the original embeddings with high fidelity (R² up to 0.941) and recover up to 87.8% of downstream performance using only 10 features (99.4% dimensionality reduction), (b) preserve semantic fidelity in image retrieval tasks, (c) correspond to specific concepts that can be expressed in language using large language model (LLM)-based auto-interpretation, and (d) bridge clinical language and abstract latent representations in zero-s
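To make the idea of a sparse code over FM embeddings concrete, here is a minimal NumPy sketch of a k-sparse encoding step and an R² reconstruction score. It is an illustration under stated assumptions, not the authors' implementation: the top-k activation rule, the layer sizes, the random stand-in embeddings, and the names `topk_sparse_encode` and `r2_score` are all hypothetical, and the weights are untrained, so the resulting score says nothing about the reported results.

```python
import numpy as np

def topk_sparse_encode(x, W_enc, b_enc, k):
    """Encode embeddings, keeping at most k non-zero features per row."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU activations
    # Column indices of the k largest activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=1), axis=1)
    return z

def r2_score(x, x_hat):
    """Coefficient of determination, pooled over all embedding dimensions."""
    ss_res = np.sum((x - x_hat) ** 2)
    ss_tot = np.sum((x - x.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
d_embed, d_sparse, k = 64, 256, 10   # hypothetical sizes, not from the paper
x = rng.normal(size=(32, d_embed))   # random stand-in for FM embeddings

# Untrained (random) SAE parameters; a real SAE would learn these.
W_enc = rng.normal(scale=0.1, size=(d_embed, d_sparse))
b_enc = np.zeros(d_sparse)
W_dec = rng.normal(scale=0.1, size=(d_sparse, d_embed))
b_dec = np.zeros(d_embed)

z = topk_sparse_encode(x, W_enc, b_enc, k)   # sparse features
x_hat = z @ W_dec + b_dec                    # reconstructed embeddings
active_per_row = (z != 0).sum(axis=1)        # number of active features
fidelity = r2_score(x, x_hat)                # reconstruction R²
```

In this sketch each image is described by at most `k` active features out of `d_sparse`, which is the sense in which a handful of features can stand in for a dense embedding; training the encoder and decoder to minimize reconstruction error is what would push `fidelity` toward 1.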
Related papers
- Learning Retrieval Models With Sparse Autoencoders (2026) 0.00
- Decoding Dense Embeddings: Sparse Autoencoders For Interpreting And Discretizing Dense Retrieval (2025) 0.00
- MedImageInsight: An Open-source Embedding Model For General Domain Medical Imaging (2024) 0.00
- Stacked Autoencoders For Medical Image Search (2016) 10.48
- Learning Autoencoded Radon Projections (2017) 4.52
- Interpret And Control Dense Retrieval With Sparse Latent Features (2024) 2.26
- Visual Words Meet BM25: Sparse Auto-encoder Visual Word Scoring For Image Retrieval (2026) 0.00
- Med3DVLM: An Efficient Vision-language Model For 3D Medical Image Analysis (2025) 12.60