Multi-path Exploration And Feedback Adjustment For Text-to-image Person Retrieval
2024 · Bin Kang, Bin Chen, Junjie Wang, et al.
Abstract
Text-based person retrieval aims to identify the specific persons using textual descriptions as queries. Existing advanced methods typically depend on vision-language pre-trained (VLP) models to facilitate effective cross-modal alignment. However, the inherent constraints of VLP models, which include the global alignment biases and insufficient self-feedback regulation, impede optimal retrieval performance. In this paper, we propose MeFa, a Multi-Pathway Exploration, Feedback, and Adjustment framework, which deeply explores intrinsic feedback of intra and inter-modal to make targeted adjustment, thereby achieving more precise person-text associations. Specifically, we first design an intra-modal reasoning pathway that generates hard negative samples for cross-modal data, leveraging feedback from these samples to refine intra-modal reasoning, thereby enhancing sensitivity to subtle discrepancies. Subsequently, we introduce a cross-modal refinement pathway that utilizes both global
Related papers
- Cross-modal Full-mode Fine-grained Alignment For Text-to-image Person Retrieval (2025) · 2.23
- Enhancing Visual Representation For Text-based Person Searching (2024) · 1.69
- BEAT: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024) · 10.85
- See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022) · 18.39
- Improving Text-based Person Search Via Part-level Cross-modal Correspondence (2024) · 0.00
- Decoupled Cross-modal Alignment Network For Text-RGBT Person Retrieval And A High-quality Benchmark (2025) · 0.00
- Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) · 18.15
- Look Before You Leap: Improving Text-based Person Retrieval By Learning A Consistent Cross-modal Common Manifold (2022) · 15.34