Enhancing Image-text Matching With Adaptive Feature Aggregation
2024 · Zuhui Wang, Yunting Yin, I. V. Ramakrishnan
Abstract
Image-text matching aims to find matched cross-modal pairs accurately. While current methods often rely on projecting cross-modal features into a common embedding space, they frequently suffer from imbalanced feature representations across different modalities, leading to unreliable retrieval results. To address these limitations, we introduce a novel Feature Enhancement Module that adaptively aggregates single-modal features for more balanced and robust image-text retrieval. Additionally, we propose a new loss function that overcomes the shortcomings of the original triplet ranking loss, thereby significantly improving retrieval performance. The proposed model has been evaluated on two public datasets and achieves competitive retrieval performance compared with several state-of-the-art models. Implementation code is available.
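For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of a triplet ranking loss as it is commonly written for image-text retrieval: a bidirectional hinge loss summed over all in-batch negatives. It is not the loss proposed in this paper, and the abstract does not say which exact variant the authors treat as "original". The function names, the cosine similarity, and the margin of 0.2 are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_ranking_loss(image_embs, text_embs, margin=0.2):
    """Sum-of-hinges triplet ranking loss over a batch (illustrative baseline).

    image_embs[i] and text_embs[i] are assumed to form a matched pair;
    every other item in the batch serves as a negative.
    """
    n = len(image_embs)
    sims = [[cosine(img, txt) for txt in text_embs] for img in image_embs]
    loss = 0.0
    for i in range(n):
        positive = sims[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i should score higher with its own text than with text j.
            loss += max(0.0, margin + sims[i][j] - positive)
            # Text i should score higher with its own image than with image j.
            loss += max(0.0, margin + sims[j][i] - positive)
    return loss


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(triplet_ranking_loss(images, aligned_texts))  # no margin violations
    print(triplet_ranking_loss(images, swapped_texts))  # every pair violates
```

When every matched pair already beats its negatives by at least the margin, the loss is zero; each violated triplet adds its hinge value to the total.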
Related papers
- A New Fine-grained Alignment Method For Image-text Matching (2023) 0.00
- Deep Boosting Learning: A Brand-new Cooperative Approach For Image-text Matching (2024) 9.73
- Matching Images And Text With Multi-modal Tensor Fusion And Re-ranking (2019) 19.77
- Self-enhancement Improves Text-image Retrieval In Foundation Visual-language Models (2023) 1.56
- Intra-modal Constraint Loss For Image-text Retrieval (2022) 8.33
- Deep Multimodal Image-text Embeddings For Automatic Cross-media Retrieval (2020) 0.00
- Adversarial Representation Learning For Text-to-image Matching (2019) 17.32
- Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) 18.52