Personalization Toolkit: Training Free Personalization Of Large Vision Language Models
2026 · Soroush Seifi, Vaggelis Dorovatas, Matteo Cassinelli, et al.
Abstract
arXiv:2502.02452v4
Personalization of Large Vision-Language Models (LVLMs) involves customizing models to recognize specific users or object instances and to generate contextually tailored responses. Existing approaches rely on time-consuming training for each item, making them impractical for real-world deployment, as reflected in current personalization benchmarks limited to object-centric single-concept evaluations. In this paper, we present a novel training-free approach to LVLM personalization called Personalization Toolkit. We introduce a comprehensive, real-world benchmark designed to rigorously evaluate various aspects of the personalization task. Personalization Toolkit leverages pre-trained vision foundation models to extract distinctive features, applies retrieval-augmented generation (RAG) techniques to identify instances within visual inputs, and employs visual prompting strategies to guide model outputs. Our model-agnostic vision toolkit enables efficient and flexible multi
Related papers
- Meta-personalizing Vision-language Models To Find Named Instances In Video (2023) · 8.60
- "This Is My Unicorn, Fluffy": Personalizing Frozen Vision-language Representations (2022) · 12.81
- Improving Personalized Search With Regularized Low-rank Parameter Updates (2025) · 0.00
- Infusing Fine-grained Visual Knowledge To Vision-language Models (2025) · 0.00
- RAVEN: Multitask Retrieval Augmented Vision-language Learning (2024) · 0.00
- 12-in-1: Multi-task Vision And Language Representation Learning (2019) · 17.85
- VLMAE: Vision-language Masked Autoencoder (2022) · 0.00
- Vision-language Modelling For Radiological Imaging And Reports In The Low Data Regime (2023) · 0.00