Cross-modal And Uni-modal Soft-label Alignment For Image-text Retrieval
2024 · Hailang Huang, Zhijie Nie, Ziqiao Wang, et al.
Abstract
Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Extensive experiments on various image-text retrieval mode
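The soft-label idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the soft labels are a softmax over a teacher's similarity scores and that alignment is measured with a KL divergence, neither of which the abstract states. The function names, the temperature parameter, and the toy similarity matrices are hypothetical.

```python
import math

def softmax(row, temperature=1.0):
    """Turn a row of similarity scores into a probability distribution."""
    scaled = [s / temperature for s in row]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def soft_label_alignment_loss(student_sims, teacher_sims, temperature=1.0):
    """Mean row-wise KL between teacher soft labels and student predictions.

    Assumption: both inputs are square similarity matrices over the same
    batch, where row i holds the scores of query i against every candidate.
    """
    losses = []
    for s_row, t_row in zip(student_sims, teacher_sims):
        target = softmax(t_row, temperature)  # soft labels from the teacher
        pred = softmax(s_row, temperature)    # retrieval model's distribution
        losses.append(kl_divergence(target, pred))
    return sum(losses) / len(losses)

# Toy batch of three pairs (hypothetical numbers). The teacher rates
# candidate 1 as nearly as relevant to query 0 as its paired candidate 0,
# i.e. a likely false negative under hard one-hot labels.
teacher = [[0.9, 0.8, 0.1],
           [0.2, 0.9, 0.1],
           [0.1, 0.2, 0.9]]
student = [[0.9, 0.1, 0.1],
           [0.1, 0.9, 0.1],
           [0.1, 0.1, 0.9]]

loss = soft_label_alignment_loss(student, teacher, temperature=0.1)
```

In this sketch the loss is zero when the student's similarity distribution matches the teacher's and grows as the student pushes a candidate away that the teacher considers relevant. The same function could be applied to image-image or text-text similarity matrices for the uni-modal case, again as an assumption rather than a description of CUSA itself.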
Related papers
- Towards Fast And Accurate Image-text Retrieval With Self-supervised Fine-grained Alignment (2023)
- A New Fine-grained Alignment Method For Image-text Matching (2023)
- Correspondence-free Domain Alignment For Unsupervised Cross-domain Image Retrieval (2023)
- Learning Relation Alignment For Calibrated Cross-modal Retrieval (2021)
- Semi-supervised Cross-modal Retrieval With Label Prediction (2018)
- COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022)
- Multimodal Representation Alignment For Cross-modal Information Retrieval (2025)
- Camouflage-aware Image-text Retrieval Via Expert Collaboration (2026)