General Image Descriptors For Open World Image Retrieval Using ViT CLIP
2022 · Marcos V. Conde, Ivan Aerlic, Simon Jégou
Abstract
The Google Universal Image Embedding (GUIE) Challenge is one of the first competitions in multi-domain image representations in the wild, covering a wide distribution of objects: landmarks, artwork, food, etc. This is a fundamental computer vision problem with notable applications in image retrieval, search engines and e-commerce. In this work, we explain our 4th place solution to the GUIE Challenge, and our "bag of tricks" to fine-tune zero-shot Vision Transformers (ViT) pre-trained using CLIP.
Related papers
- Efficient And Discriminative Image Feature Extraction For Universal Image Retrieval (2024)
- Unicom: Universal And Compact Representation Learning For Image Retrieval (2023)
- Object-centric Open-vocabulary Image-retrieval With Aggregated Features (2023)
- 3rd Place Solution To "Google Landmark Retrieval 2020" (2020)
- PriorCLIP: Visual Prior Guided Vision-language Model For Remote Sensing Image-text Retrieval (2024)
- Enhancing Image Retrieval: A Comprehensive Study On Photo Search Using The CLIP Mode (2024)
- FIGROTD: A Friendly-to-handle Dataset For Image Guided Retrieval With Optional Text (2025)
- Training Vision Transformers For Image Retrieval (2021)