Benchmarking Pretrained Vision Embeddings For Near- And Duplicate Detection In Medical Images
2023 · Tuan Truong, Farnaz Khun Jush, Matthias Lenga
Abstract
Near- and duplicate image detection is a critical concern in the field of medical imaging. Medical datasets often contain similar or duplicate images from various sources. These can cause significant performance issues and evaluation biases, especially in machine learning tasks, because of data leakage between training and testing subsets. In this paper, we present an approach for identifying near- and duplicate 3D medical images that leverages publicly available 2D computer vision embeddings. We assess our approach by comparing embeddings extracted from two state-of-the-art self-supervised pretrained models and two different vector index structures for similarity retrieval. We generate an experimental benchmark based on the publicly available Medical Segmentation Decathlon dataset. The proposed method yields promising results for near- and duplicate image detection, achieving a mean sensitivity of 0.9645 and a mean specificity of 0.8559.
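The retrieval-and-evaluation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each image has already been reduced to one embedding vector, uses synthetic random vectors in place of real model embeddings, swaps the vector index for a brute-force cosine-similarity search in NumPy, and picks a hypothetical decision threshold (`THRESHOLD`). The function names are illustrative only.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def nearest_neighbor_similarity(queries, database):
    """Return, per query, the index and cosine similarity of its best database match.

    Brute-force search; a vector index would replace this step at scale.
    """
    sims = l2_normalize(queries) @ l2_normalize(database).T
    best = sims.argmax(axis=1)
    return best, sims[np.arange(len(queries)), best]


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).

    Assumes both classes are present in `actual` (no zero denominators).
    """
    predicted = np.asarray(predicted, dtype=bool)
    actual = np.asarray(actual, dtype=bool)
    tp = np.sum(predicted & actual)
    fn = np.sum(~predicted & actual)
    tn = np.sum(~predicted & ~actual)
    fp = np.sum(predicted & ~actual)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic stand-ins for embeddings (assumption: 64-dimensional vectors).
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 64))

# Ten queries are lightly perturbed copies of database entries (near-duplicates),
# ten are unrelated random vectors (non-duplicates).
near_duplicates = database[:10] + 0.05 * rng.normal(size=(10, 64))
unrelated = rng.normal(size=(10, 64))
queries = np.vstack([near_duplicates, unrelated])
is_duplicate = np.array([True] * 10 + [False] * 10)

THRESHOLD = 0.9  # hypothetical cut-off, not a value from the paper

best_idx, best_sim = nearest_neighbor_similarity(queries, database)
flagged = best_sim >= THRESHOLD
sens, spec = sensitivity_specificity(flagged, is_duplicate)
print(f"sensitivity={sens:.4f} specificity={spec:.4f}")
```

On real data, the threshold trades sensitivity against specificity, so it would need to be tuned on a held-out benchmark rather than fixed in advance.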
Related papers
- Medical Image Retrieval Using Pretrained Embeddings (2023)
- Benchmarking Unsupervised Near-duplicate Image Detection (2019)
- Content-based Image Retrieval For Multi-class Volumetric Radiology Images: A Benchmark Study (2024)
- MedImageInsight: An Open-source Embedding Model For General Domain Medical Imaging (2024)
- Benchmarking Vision-language Contrastive Methods For Medical Representation Learning (2024)
- CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy With A Novel Metric (2023)
- Geometric Visual Similarity Learning In 3D Medical Image Self-supervised Pre-training (2023)
- Dataset And Case Studies For Visual Near-duplicates Detection In The Context Of Social Media (2022)