Decoupled Cross-modal Alignment Network For Text-RGBT Person Retrieval And A High-quality Benchmark
2025 · Yifei Deng, Chenglong Li, Zhenyu Chen, et al.
Abstract
The performance of traditional text-image person retrieval is easily degraded by lighting variations because of the imaging limitations of visible-spectrum sensors. In recent years, cross-modal information fusion has emerged as an effective strategy for improving retrieval robustness. Integrating complementary information from different spectral modalities makes more stable person recognition and matching possible under complex real-world conditions. Motivated by this, we introduce a novel task, Text-RGBT Person Retrieval, which fuses complementary cues from the visible and thermal modalities for robust person retrieval in challenging environments. The key challenge of Text-RGBT person retrieval lies in aligning text with multi-modal visual features. However, the inherent heterogeneity between the visible and thermal modalities may interfere with the alignment between vision and language. To handle this problem, we propose
Related papers
- Cross-modal Implicit Relation Reasoning And Aligning For Text-to-image Person Retrieval (2023), 18.15
- BEAT: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024), 10.85
- See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022), 18.39
- Improving Text-based Person Search Via Part-level Cross-modal Correspondence (2024), 0.00
- Multi-path Exploration And Feedback Adjustment For Text-to-image Person Retrieval (2024), 0.00
- Cross-modal Full-mode Fine-grained Alignment For Text-to-image Person Retrieval (2025), 2.23
- CAIBC: Capturing All-round Information Beyond Color For Text-based Person Retrieval (2022), 15.37
- Bridging The Gap: Multi-level Cross-modality Joint Alignment For Visible-infrared Person Re-identification (2023), 11.29