Text-guided Image Restoration And Semantic Enhancement For Text-to-image Person Retrieval
2023 · Delong Liu, Haiwen Li, Zhicheng Zhao, et al.
Abstract
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to given textual descriptions. A primary challenge in this task is bridging the substantial representational gap between the visual and textual modalities. Prevailing methods map texts and images into a unified embedding space for matching, but the intricate semantic correspondences between texts and images are still not effectively constructed. To address this issue, we propose a novel TIPR framework that builds fine-grained interactions and alignment between person images and the corresponding texts. Specifically, a visual-textual dual encoder is first constructed by fine-tuning the Contrastive Language-Image Pre-training (CLIP) model, to preliminarily align the image and text features. Second, a Text-guided Image Restoration (TIR) auxiliary task is proposed to map abstract textual entities to specific image regions, improving the alignment between local textual and visual embeddi
Related papers
- Cross-modal Full-mode Fine-grained Alignment For Text-to-image Person Retrieval (2025) 2.23
- Multilingual Text-to-image Person Retrieval Via Bidirectional Relation Reasoning And Aligning (2025) 2.35
- BEAT: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024) 10.85
- Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) 18.15
- UP-Person: Unified Parameter-efficient Transfer Learning For Text-based Person Retrieval (2025) 4.26
- TIPCB: A Simple But Effective Part-based Convolutional Baseline For Text-based Person Search (2021) 20.24
- Enhancing Visual Representation For Text-based Person Searching (2024) 1.69
- Text-based Aerial-ground Person Retrieval (2025) 2.08