Abstract
Deep hashing has attracted extensive attention in cross-modal retrieval because of its low storage cost and high retrieval efficiency. Despite significant progress, existing deep cross-modal hashing methods still have several limitations: they struggle to establish consistent mappings across modalities and fail to bridge the semantic gap between heterogeneous data effectively, and as a result they lose semantic information and capture semantics only incompletely during cross-modal learning. To address these challenges, this paper proposes Feature Fusion-based Cross-modal Proxy Hashing (FFCPH), a cross-modal retrieval method. FFCPH integrates multi-modal semantic information through a feature fusion module to generate discriminative and robust fused features. In addition, a joint loss function comprising a cross-modal proxy loss, a cross-modal irrelevant loss, and a cross-modal consistency loss is designed to preserve the similarity ranking among samples and to narrow the semantic gap across modalities. Experimental results on three widely used benchmark datasets show that the proposed method significantly outperforms state-of-the-art approaches in retrieval performance.
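
To make the efficiency argument for hashing-based retrieval concrete, the following minimal sketch shows the generic retrieval step that such methods typically rely on: real-valued features are binarized with the sign function, and database items are ranked by Hamming distance to the query code. It is an illustration under stated assumptions, not the FFCPH model: the toy feature vectors stand in for the outputs of learned image and text networks, and the function names `to_hash_code`, `hamming_distance`, and `rank_by_hamming` are hypothetical.

```python
from typing import List, Sequence


def to_hash_code(features: Sequence[float]) -> List[int]:
    """Binarize real-valued features into a {-1, +1} code using the sign function."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code: Sequence[int],
                    db_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming_distance(query_code, db_codes[i]))


# Toy stand-ins for network outputs (assumed values, for illustration only).
image_features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.8, 0.6],
    [0.8, -0.1, -0.3, -0.9],
]
text_query_features = [0.7, -0.4, 0.2, -0.6]

db_codes = [to_hash_code(f) for f in image_features]
query_code = to_hash_code(text_query_features)

distances = [hamming_distance(query_code, c) for c in db_codes]
ranking = rank_by_hamming(query_code, db_codes)
print(distances, ranking)
```

Because the codes are short binary vectors, comparing a query against a database reduces to counting differing bits, which is why hashing is attractive for large-scale text-to-image and image-to-text retrieval.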