Multilingual Text-to-image Person Retrieval Via Bidirectional Relation Reasoning And Aligning
2025 · Min Cao, Xinyu Zhou, Ding Jiang, et al.
Abstract
Text-to-image person retrieval (TIPR) aims to identify a target person from a textual description and faces the challenge of modality heterogeneity. Prior works have tried to address this by developing cross-modal global or local alignment strategies. However, global methods typically overlook fine-grained cross-modal differences, whereas local methods require prior information to explore explicit part alignments. Additionally, current methods are English-centric, which restricts their application in multilingual contexts. To alleviate these issues, we pioneer a multilingual TIPR task by developing a multilingual TIPR benchmark, for which we use large language models to produce initial translations and then refine them by integrating domain-specific knowledge. Correspondingly, we propose Bi-IRRA, a Bidirectional Implicit Relation Reasoning and Aligning framework that learns alignment across languages and modalities. Within Bi-IRRA, a bidirectional implicit relation reasoning module enables bidirectional…
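The abstract frames TIPR as retrieving the right person image for a text query and contrasts global with local cross-modal alignment. The sketch below illustrates only that generic global-alignment setting, not Bi-IRRA itself. It assumes text and image embeddings were already produced by some encoder (the vectors are toy values). It then ranks gallery images by cosine similarity, scores Rank-k accuracy (the fraction of queries whose correct identity appears in the top k images), and computes a symmetric text-to-image plus image-to-text InfoNCE-style loss as a stand-in for a global alignment objective. The function names, the 0.07 temperature, and the choice of loss are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_matrix(texts: Sequence[Vector], images: Sequence[Vector]) -> List[List[float]]:
    """Return sim[i][j] = cosine similarity between text query i and gallery image j."""
    t = [l2_normalize(v) for v in texts]
    g = [l2_normalize(v) for v in images]
    return [[sum(a * b for a, b in zip(ti, gj)) for gj in g] for ti in t]


def rank_images(sim_row: Sequence[float]) -> List[int]:
    """Gallery indices ordered from most to least similar to one text query."""
    return sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)


def rank_k_accuracy(sim: Sequence[Sequence[float]], query_ids: Sequence[int],
                    gallery_ids: Sequence[int], k: int) -> float:
    """Fraction of queries whose person identity appears among the top-k images."""
    hits = 0
    for i, row in enumerate(sim):
        top_k = rank_images(row)[:k]
        if any(gallery_ids[j] == query_ids[i] for j in top_k):
            hits += 1
    return hits / len(sim)


def symmetric_infonce(sim: Sequence[Sequence[float]], temperature: float = 0.07) -> float:
    """Average of text-to-image and image-to-text cross-entropy.

    Assumes a square batch where sim[i][i] is the matched text/image pair
    (an illustrative choice of global alignment loss, not Bi-IRRA's).
    """
    n = len(sim)

    def cross_entropy(rows: Sequence[Sequence[float]]) -> float:
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
            total += log_sum_exp - logits[i]
        return total / n

    columns = [[sim[i][j] for i in range(n)] for j in range(n)]  # image-to-text view
    return 0.5 * (cross_entropy(sim) + cross_entropy(columns))
```

For example, with text queries `[[1.0, 0.1], [0.1, 1.0]]` and gallery images `[[0.9, 0.0], [0.0, 0.9], [0.6, 0.6]]`, each query ranks its matching image first, so Rank-1 accuracy is 1.0. A purely global score like this compares whole-sentence and whole-image vectors, which is why, as the abstract notes, it can miss fine-grained differences between description and image.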
Related papers
- Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023) · 18.15
- Cross-modal Full-mode Fine-grained Alignment For Text-to-image Person Retrieval (2025) · 2.23
- Text-guided Image Restoration And Semantic Enhancement For Text-to-image Person Retrieval (2023) · 9.00
- BEAT: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024) · 10.85
- See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022) · 18.39
- Decoupled Cross-modal Alignment Network For Text-RGBT Person Retrieval And A High-quality Benchmark (2025) · 0.00
- IMRAM: Iterative Matching With Recurrent Attention Memory For Cross-modal Image-text Retrieval (2020) · 19.22
- Cross-modal Adaptive Dual Association For Text-to-image Person Retrieval (2023) · 12.02