RetailKLIP: Finetuning OpenCLIP Backbone Using Metric Learning on a Single GPU for Zero-Shot Retail Product Image Classification
2023 · Muktabh Mayank Srivastava
Abstract
Images of retail products, or packaged grocery goods, need to be classified in various computer vision applications such as self-checkout stores, supply chain automation, and retail execution evaluation. Previous works explore ways to finetune deep models for this purpose. However, finetuning a large model, or even a linear layer on top of a pretrained backbone, requires running at least a few epochs of gradient descent every time a new retail product is added to the set of classes, so frequent retraining is needed in real-world scenarios. In this work, we propose finetuning the vision encoder of a CLIP model so that its embeddings can be used directly for nearest-neighbor classification, while achieving accuracy close to or exceeding that of full finetuning. A nearest-neighbor classifier needs no incremental training for new products, which saves compute resources and waiting time.
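The claim that a nearest-neighbor classifier needs no incremental training can be illustrated with a minimal sketch. It assumes the finetuned vision encoder is available as a black box that returns one embedding vector per image; the encoder itself is not shown, and `NearestNeighborCatalog`, `add_product`, `l2_normalize`, and the toy 3-dimensional vectors are hypothetical names and data, not the paper's code. Adding a product only appends its reference embeddings to a gallery, and a query is labeled by its most similar reference under cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class NearestNeighborCatalog:
    """Hypothetical gallery of labeled reference embeddings (illustration only).

    Embeddings are assumed to come from a finetuned image encoder (not shown).
    Classification is 1-nearest-neighbor by cosine similarity.
    """

    def __init__(self):
        self._embeddings = []  # unit-normalized reference vectors
        self._labels = []      # product label for each reference vector

    def add_product(self, label, reference_embeddings):
        # Adding a new product only appends vectors: no gradient steps are run.
        for emb in reference_embeddings:
            self._embeddings.append(l2_normalize(emb))
            self._labels.append(label)

    def classify(self, query_embedding):
        """Return (label, cosine similarity) of the closest reference embedding."""
        if not self._embeddings:
            raise ValueError("catalog is empty")
        q = l2_normalize(query_embedding)
        best_label, best_score = None, -math.inf
        for ref, label in zip(self._embeddings, self._labels):
            score = sum(a * b for a, b in zip(q, ref))  # cosine similarity
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score


if __name__ == "__main__":
    # Toy 3-d vectors stand in for real encoder outputs (assumed data).
    catalog = NearestNeighborCatalog()
    catalog.add_product("cola_330ml", [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
    catalog.add_product("chips_150g", [[0.0, 0.2, 0.9]])
    print(catalog.classify([0.85, 0.15, 0.05])[0])  # cola_330ml

    # A new product is classifiable immediately, without retraining.
    catalog.add_product("juice_1l", [[0.1, 0.9, 0.1]])
    print(catalog.classify([0.1, 0.95, 0.0])[0])  # juice_1l
```

The brute-force loop is linear in the gallery size; for large catalogs an approximate nearest-neighbor index would typically replace it, though that choice is an assumption and not something the abstract specifies.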
Related papers
- RECLIP: Resource-efficient CLIP By Training With Small Images (2023) · 0.00
- Optimizing CLIP Models For Image Retrieval With Maintained Joint-embedding Alignment (2024) · 6.34
- MobileCLIP: Fast Image-text Models Through Multi-modal Reinforced Training (2023) · 18.12
- A Deep Learning Pipeline For Product Recognition On Store Shelves (2018) · 11.85
- Koo-fu CLIP: Closed-form Adaptation Of Vision-language Models Via Fukunaga-koontz Linear Discriminant Analysis (2026) · 0.00
- Liteembed: Adapting CLIP To Rare Classes (2026) · 0.00
- Fine-grained Apparel Classification And Retrieval Without Rich Annotations (2018) · 0.00
- FitCLIP: Refining Large-scale Pretrained Image-text Models For Zero-shot Video Understanding Tasks (2022) · 1.91