Text-based Aerial-Ground Person Retrieval
2025 · Xinyu Zhou, Yu Wu, Jiayao Ma, et al.
Abstract
This work introduces Text-based Aerial-Ground Person Retrieval (TAG-PR), which aims to retrieve person images from heterogeneous aerial and ground views using textual descriptions. Unlike traditional Text-based Person Retrieval (T-PR), which focuses solely on ground-view images, TAG-PR has greater practical significance and presents unique challenges due to the large viewpoint discrepancy across images. To support this task, we contribute: (1) the TAG-PEDES dataset, constructed from public benchmarks with automatically generated textual descriptions and enhanced by a diversified text generation paradigm to ensure robustness under view heterogeneity; and (2) TAG-CLIP, a novel retrieval framework that addresses view heterogeneity through a hierarchically-routed mixture-of-experts module that learns view-specific and view-agnostic features, together with a viewpoint decoupling strategy that decouples view-specific features for better cross-modal alignment. We evaluate the effectiveness of TAG-CLIP on both
Related papers
- Text-guided Image Restoration And Semantic Enhancement For Text-to-image Person Retrieval (2023)
- TVPR: Text-to-video Person Retrieval And A New Benchmark (2023)
- SA-Person: Text-based Person Retrieval With Scene-aware Re-ranking (2025)
- UP-Person: Unified Parameter-efficient Transfer Learning For Text-based Person Retrieval (2025)
- BEAT: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024)
- Text-based Person Search With Limited Data (2021)
- Multilingual Text-to-image Person Retrieval Via Bidirectional Relation Reasoning And Aligning (2025)
- Semi-supervised Text-based Person Search (2024)