Benchmarking Image Embeddings For E-commerce: Evaluating Off-the-Shelf Foundation Models, Fine-tuning Strategies And Practical Trade-offs
2025 · Urszula Czerwinska, Cenk Bircanoglu, Jeremy Chamoux
Abstract
We benchmark image embeddings from foundation models for classification and retrieval in e-Commerce, evaluating their suitability for real-world applications. Our study spans embeddings from pre-trained convolutional and transformer models trained via supervised, self-supervised (SSL), and text-image contrastive learning. We assess full fine-tuning and transfer learning (top-tuning) on six diverse e-Commerce datasets: fashion, consumer goods, cars, food, and retail. Results show that full fine-tuning consistently performs well, while text-image and self-supervised embeddings can match its performance with less training. Supervised embeddings remain stable across architectures, whereas SSL and contrastive embeddings vary significantly and often benefit from top-tuning. Top-tuning emerges as an efficient alternative to full fine-tuning, reducing computational costs. We also explore cross-tuning, noting that its impact depends on dataset characteristics. Our findings offer practical guidelines for embedding sel
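As a rough illustration of the top-tuning idea mentioned in the abstract, the sketch below trains only a small linear head on fixed embeddings and then reuses the same embeddings for nearest-neighbour retrieval. It is not the authors' pipeline: the embeddings are synthetic Gaussian clusters standing in for the output of a frozen backbone, and every function name, the L2 normalisation step, and the hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def l2_normalize(x):
    # Scale each embedding to unit length (an assumed preprocessing choice).
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def train_linear_head(embeddings, labels, num_classes, lr=1.0, epochs=300):
    """Fit a softmax classifier on fixed embeddings; no backbone is updated."""
    n, d = embeddings.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = embeddings @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * (embeddings.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias

def predict(embeddings, weights, bias):
    return np.argmax(embeddings @ weights + bias, axis=1)

def retrieve_top1(queries, gallery):
    """Index of the most similar gallery item (cosine similarity on unit vectors)."""
    return np.argmax(queries @ gallery.T, axis=1)

def make_synthetic_embeddings(rng, centers, per_class):
    # Hypothetical stand-in for embeddings produced by a frozen pre-trained model.
    labels = np.repeat(np.arange(len(centers)), per_class)
    noise = 0.5 * rng.standard_normal((len(labels), centers.shape[1]))
    return l2_normalize(centers[labels] + noise), labels

rng = np.random.default_rng(0)
centers = 3.0 * rng.standard_normal((4, 16))  # 4 classes, 16-dim embeddings
train_x, train_y = make_synthetic_embeddings(rng, centers, per_class=50)
test_x, test_y = make_synthetic_embeddings(rng, centers, per_class=20)

# Top-tuning: only the head's weights and bias are learned.
weights, bias = train_linear_head(train_x, train_y, num_classes=4)
accuracy = float(np.mean(predict(test_x, weights, bias) == test_y))

# Retrieval: a query counts as a hit if its nearest gallery item shares its label.
recall_at_1 = float(np.mean(train_y[retrieve_top1(test_x, train_x)] == test_y))
print(f"top-tuning accuracy: {accuracy:.2f}, retrieval recall@1: {recall_at_1:.2f}")
```

Because the backbone stays fixed, the embeddings can be computed once and cached, which is where the reduced training cost relative to full fine-tuning would come from in a setup like this.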
Related papers
- Visual Product Search Benchmark (2026) 0.00
- Lookbench: A Live And Holistic Open Benchmark For Fashion Image Retrieval (2026) 0.00
- Transformer-empowered Multi-modal Item Embedding For Enhanced Image Search In E-commerce (2023) 4.52
- Mine And Refine: Optimizing Graded Relevance In E-commerce Search Retrieval (2026) 0.00
- Improving Embedding With Contrastive Fine-tuning On Small Datasets With Expert-augmented Scores (2024) 0.00
- Self-enhancement Improves Text-image Retrieval In Foundation Visual-language Models (2023) 1.56
- ACE-BERT: Adversarial Cross-modal Enhanced BERT For E-commerce Retrieval (2021) 0.00
- Optimizing Product Deduplication In E-commerce With Multimodal Embeddings (2025) 0.00