Human Re-ID Meets LVLMs: What Can We Expect?
2025 · Kailash Hambarde, Pranita Samale, Hugo Proença
Abstract
Large vision-language models (LVLMs) have been regarded as a breakthrough advance in an astounding variety of tasks, from content generation to virtual assistants and multimodal search or retrieval. However, for many of these applications, the performance of these methods has been widely criticized, particularly when compared with state-of-the-art methods and technologies in each specific domain. In this work, we compare the performance of the leading large vision-language models in the human re-identification task, using as a baseline the performance attained by state-of-the-art AI models specifically designed for this problem. We compare the results of ChatGPT-4o, Gemini-2.0-Flash, Claude 3.5 Sonnet, and Qwen-VL-Max to a baseline ReID PersonViT model, using the well-known Market1501 dataset. Our evaluation pipeline includes dataset curation, prompt engineering, and metric selection to assess the models' performance. Results are analyzed from many different perspectives: simil
Related papers
- Instruct-ReID++: Towards Universal Purpose Instruction-guided Person Re-identification (2024)
- Person Re-identification: Past, Present And Future (2016)
- Toward Automatic Relevance Judgment Using Vision-language Models For Image-text Retrieval Evaluation (2024)
- A Little More Like This: Text-to-image Retrieval With Vision-language Models Using Relevance Feedback (2025)
- RAVEN: Multitask Retrieval Augmented Vision-language Learning (2024)
- Video-based Visible-infrared Person Re-identification With Auxiliary Samples (2023)
- A Convolutional Baseline For Person Re-identification Using Vision And Language Descriptions (2020)
- Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models And Vision Language Models (2024)