SA-Person: Text-based Person Retrieval With Scene-aware Re-ranking
2025 · Yingjia Xu, Jinlin Wu, Daming Gao, et al.
Abstract
Text-based person retrieval aims to identify a target individual from an image gallery using a natural language description. Existing methods focus primarily on appearance-driven cross-modal retrieval, yet they face significant challenges due to the visual complexity of scenes and the inherent ambiguity of textual descriptions. Contextual information, such as landmarks and relational cues, offers complementary signals for retrieval but remains underexploited in current approaches. Motivated by this limitation, we propose a novel paradigm: scene-aware text-based person retrieval, which explicitly integrates both individual appearance and global scene context to improve retrieval accuracy. To support this, we first introduce ScenePerson-13W, a large-scale benchmark dataset comprising over 100,000 real-world scenes with rich annotations covering both pedestrian attributes and scene context. Based on this dataset, we further present SA-Pers
Related papers
- Stacmr: Scene-text Aware Cross-modal Retrieval (2020) 10.48
- Semi-supervised Text-based Person Search (2024) 3.58
- Multi-path Exploration And Feedback Adjustment For Text-to-image Person Retrieval (2024) 0.00
- Text-based Person Search With Limited Data (2021) 15.69
- Person Retrieval In Surveillance Using Textual Query: A Review (2021) 0.00
- Text-based Aerial-ground Person Retrieval (2025) 2.08
- Scene Text Retrieval Via Joint Text Detection And Similarity Learning (2021) 16.16
- CAIBC: Capturing All-round Information Beyond Color For Text-based Person Retrieval (2022) 15.37