Contrastive Transformer Learning With Proximity Data Generation For Text-based Person Search
2023 · Hefeng Wu, Weifeng Chen, Zhibin Liu, et al.
Abstract
Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to a significant modality gap, fine-grained differences, and insufficient annotated data. To better align the two modalities, most existing works focus on introducing sophisticated network structures and auxiliary tasks, which are complex and hard to implement. In this paper, we propose a simple yet effective dual Transformer model for text-based person search. By exploiting a hardness-aware contrastive learning strategy, our model achieves state-of-the-art performance without any special design for local feature alignment or side information. Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training. The PDG module first introduces an automatic generation algorithm based on a text-to-image diffusion model, which generates
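The abstract does not spell out the hardness-aware contrastive loss, so the following is only a minimal sketch of one common formulation: a symmetric, temperature-scaled InfoNCE loss over paired image and text embeddings. In a softmax-based loss of this kind, each negative's gradient is proportional to its softmax probability, so negatives that are more similar to the query are penalized more strongly, and a lower temperature sharpens that effect. The function names, the default `temperature=0.07`, and the toy embeddings are assumptions for illustration and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_info_nce(image_feats, text_feats, temperature=0.07):
    """Illustrative symmetric InfoNCE loss (assumed form, not the paper's exact loss).

    image_feats[i] and text_feats[i] are assumed to describe the same person;
    every other pairing in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)

    # sim[i][j] = cosine(image i, text j) / temperature
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each image should pick out its own text.
    i2t = -sum(log_softmax_row(sim[i])[i] for i in range(n)) / n

    # Text-to-image direction: transpose the similarity matrix and repeat.
    sim_t = [list(col) for col in zip(*sim)]
    t2i = -sum(log_softmax_row(sim_t[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch: two perfectly aligned pairs versus the same pairs with texts swapped.
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

On this toy batch the aligned pairing yields a loss near zero, while the swapped pairing is penalized heavily. In a dual-encoder setup such as the one the abstract describes, the two inputs would be the outputs of the image and text Transformers.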
Related papers
- Text-based Person Search With Limited Data (2021) · 15.69
- Semi-supervised Text-based Person Search (2024) · 3.58
- Boosting Weak Positives For Text Based Person Search (2025) · 0.00
- TIPCB: A Simple But Effective Part-based Convolutional Baseline For Text-based Person Search (2021) · 20.24
- Improving Text-based Person Search Via Part-level Cross-modal Correspondence (2024) · 0.00
- Beat: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024) · 10.85
- Enhancing Visual Representation For Text-based Person Searching (2024) · 1.69
- Up-person: Unified Parameter-efficient Transfer Learning For Text-based Person Retrieval (2025) · 4.26