DEMO: A Statistical Perspective For Efficient Image-text Matching
2024 · Fan Zhang, Xian-Sheng Hua, Chong Chen, et al.
Abstract
Image-text matching has been a long-standing problem, which seeks to connect vision and language through semantic understanding. Due to the capability to manage large-scale raw data, unsupervised hashing-based approaches have gained prominence recently. They typically construct a semantic similarity structure using the natural distance, which subsequently provides guidance to the model optimization process. However, the similarity structure could be biased at the boundaries of semantic distributions, causing error accumulation during sequential optimization. To tackle this, we introduce a novel hashing approach termed Distribution-based Structure Mining with Consistency Learning (DEMO) for efficient image-text matching. From a statistical view, DEMO characterizes each image using multiple augmented views, which are considered as samples drawn from its intrinsic semantic distribution. Then, we employ a non-parametric distribution divergence to ensure a robust and precise similarity stru
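The statistical view in the abstract, where each image is treated as a set of samples (its augmented views) and images are compared through a non-parametric distribution divergence, can be illustrated with a minimal sketch. The abstract does not say which divergence DEMO uses, so the energy distance below is an assumed stand-in. The `sample_views` helper is also hypothetical: it fakes augmented-view features as Gaussian jitter around a center instead of running a real augmentation pipeline and encoder.

```python
import math
import random


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y), x from xs and y from ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Squared energy distance between two sample sets (assumed stand-in divergence)."""
    return (2.0 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))


def sample_views(center, n_views, noise, rng):
    """Hypothetical augmented-view features: the center plus Gaussian jitter."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n_views)]


rng = random.Random(0)
views_a = sample_views([0.0, 0.0], 8, 0.1, rng)  # image A
views_b = sample_views([0.1, 0.0], 8, 0.1, rng)  # image B, semantically close to A
views_c = sample_views([3.0, 3.0], 8, 0.1, rng)  # image C, semantically far from A

d_ab = energy_distance(views_a, views_b)
d_ac = energy_distance(views_a, views_c)
print(f"A vs B: {d_ab:.4f}   A vs C: {d_ac:.4f}")
```

Comparing whole sets of views rather than single feature vectors means one unlucky augmentation has less influence on the resulting similarity, which is the kind of robustness the abstract is after.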
Related papers
- ALADIN: Distilling Fine-grained Alignment Scores For Efficient Image-text Matching And Retrieval (2022) · 14.00
- Deep Boosting Learning: A Brand-new Cooperative Approach For Image-text Matching (2024) · 9.73
- Learning Image-text Matching With Optimal Partial Transport (2026) · 0.00
- A New Fine-grained Alignment Method For Image-text Matching (2023) · 0.00
- One Loss For Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching (2022) · 12.40
- Visual Semantic Reasoning For Image-text Matching (2019) · 25.23
- Enhancing Image-text Matching With Adaptive Feature Aggregation (2024) · 6.34
- Exploring Auxiliary Context: Discrete Semantic Transfer Hashing For Scalable Image Retrieval (2019) · 15.88