Dynamic Weighted Combiner For Mixed-modal Image Retrieval
2023 · Fuxiang Huang, Lei Zhang, Xiaowei Fu, et al.
Abstract
Mixed-Modal Image Retrieval (MMIR), a flexible search paradigm, has attracted wide attention. However, previous approaches achieve limited performance because two critical factors are overlooked. 1) The image and text modalities contribute differently, yet they are incorrectly treated as equal. 2) Web datasets drawn from diverse real-world scenarios contain inherent labeling noise in the text that describes users' intentions, which gives rise to overfitting. We propose a Dynamic Weighted Combiner (DWC) to tackle these challenges, which has three merits. First, to account for the contribution disparity between modalities, we propose an Editable Modality De-equalizer (EMD) containing two modality feature editors and an adaptive weighted combiner. Second, to alleviate labeling noise and data bias, we propose a dynamic soft-similarity label generator (SSG) that implicitly improves noisy supervision. Finally, to bridge modality gaps and facilitate similarity learning,
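The abstract does not say how the adaptive weighted combiner computes its weights. As a rough illustration of the general idea of weighting the two modalities unequally, the sketch below fuses an image feature and a text feature with input-dependent weights. Everything here is an assumption made for illustration and not the paper's method: the function names, the linear gate vectors, and the softmax normalization.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_combine(img_feat, txt_feat, gate_img, gate_txt):
    """Fuse two same-length feature vectors with input-dependent weights.

    gate_img and gate_txt are hypothetical learned vectors; each one maps
    its modality's feature to a single relevance score.
    """
    scores = [dot(gate_img, img_feat), dot(gate_txt, txt_feat)]
    # Normalize the two scores so the modality weights sum to 1.
    w_img, w_txt = softmax(scores)
    fused = [w_img * i + w_txt * t for i, t in zip(img_feat, txt_feat)]
    return fused, (w_img, w_txt)


# Toy inputs (made up): a 3-dimensional image feature and text feature.
img = [1.0, 0.0, 2.0]
txt = [0.5, 1.0, 0.0]
fused, weights = weighted_combine(img, txt, [0.1, 0.1, 0.1], [0.4, 0.4, 0.0])
print(weights)  # the text modality gets the larger weight for this query
print(fused)
```

With equal weights this reduces to plain averaging, which corresponds to the "treated as equal" baseline that the abstract criticizes.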
Related papers
- Entity Image And Mixed-modal Image Retrieval Datasets (2025) 1.56
- DAFM: Dynamic Adaptive Fusion For Multi-model Collaboration In Composed Image Retrieval (2025) 0.00
- Modality Curation: Building Universal Embeddings For Advanced Multimodal Information Retrieval (2025) 0.00
- Cross-modal Image Retrieval With Deep Mutual Information Maximization (2021) 9.59
- Docmmir: A Framework For Document Multi-modal Information Retrieval (2025) 3.46
- IDMR: Towards Instance-driven Precise Visual Correspondence In Multimodal Retrieval (2025) 2.29
- Training-free Zero-shot Composed Image Retrieval Via Weighted Modality Fusion And Similarity (2024) 5.84
- Modal-aware Features For Multimodal Hashing (2019) 0.00