Camouflage-aware Image-text Retrieval Via Expert Collaboration
2026 · Yao Jiang, Zhongkuan Mao, Xuan Wu, et al.
Abstract
Camouflaged scene understanding (CSU) has attracted significant attention due to its broad practical implications. However, robust image-text cross-modal alignment remains under-explored in this field, hindering deeper understanding of camouflaged scenarios and their related applications. To this end, we focus on the typical image-text retrieval task and formulate a new task dubbed "camouflage-aware image-text retrieval" (CA-ITR). We first construct a dedicated camouflage image-text retrieval dataset (CamoIT), comprising ~10.5K samples with multi-granularity textual annotations. Benchmark results on CamoIT reveal the challenges that CA-ITR poses for existing cutting-edge retrieval techniques, which stem mainly from the objects' camouflage properties and from complex image content. As a solution, we propose a camouflage-expert collaborative network (CECNet), which features a dual-branch visual encoder: one branch captures holistic image representations,
Related papers
- CAMP: Cross-modal Adaptive Message Passing For Text-image Retrieval (2019) 18.38
- StacMR: Scene-text Aware Cross-modal Retrieval (2020) 10.48
- TSVC: Tripartite Learning With Semantic Variation Consistency For Robust Image-text Retrieval (2025) 3.58
- ConText-CIR: Learning From Concepts In Text For Composed Image Retrieval (2025) 4.67
- Cross-modal And Uni-modal Soft-label Alignment For Image-text Retrieval (2024) 15.75
- Robust Remote Sensing Image-text Retrieval With Noisy Correspondence (2026) 1.24
- Collaborative Group: Composed Image Retrieval Via Consensus Learning From Noisy Annotations (2023) 0.00
- Scene Text Retrieval Via Joint Text Detection And Similarity Learning (2021) 16.16