Generalized Contrastive Learning For Multi-modal Retrieval And Ranking
2024 · Tianyu Zhu, Myong Chol Jung, Jesse Clark
Abstract
Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular training frameworks typically learn from binary (positive/negative) relevance, making them ineffective at incorporating desired rankings. As a result, the poor ranking performance of these models forces systems to employ a re-ranker, which increases complexity, maintenance effort and inference time. To address this, we introduce Generalized Contrastive Learning (GCL), a training framework designed to learn from continuous ranking scores beyond binary relevance. GCL encodes both relevance and ranking information into a unified embedding space by applying ranking scores to the loss function. This enables a single-stage retrieval system. In addition, during our research, we identified a lack of public multi-modal datasets that benchmark both retrieval and ranking capabilities. To facilitate this and future research for ranked retrieval, we curated
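The abstract says only that GCL applies ranking scores to the loss function and does not give the formula. As a rough illustration of that idea, and not the authors' actual formulation, the sketch below weights a standard in-batch contrastive (InfoNCE-style) loss by a per-pair score, so that higher-ranked query-document pairs contribute more to the gradient. The function name `weighted_contrastive_loss`, the `weights` argument, and the temperature value are all assumptions made for this example.

```python
import numpy as np

def weighted_contrastive_loss(query_emb, doc_emb, weights, temperature=0.07):
    """Hypothetical sketch: in-batch contrastive loss weighted by ranking scores.

    query_emb, doc_emb: arrays of shape (batch, dim); row i of each is a matched pair.
    weights: array of shape (batch,), a positive ranking score per pair (assumed input).
    """
    # L2-normalize so the dot product is a cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)

    # Similarity of every query against every document in the batch.
    logits = (q @ d.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy of each query against its own document (the diagonal).
    per_pair_loss = -np.diag(log_probs)

    # Weighted average: pairs with larger ranking scores count for more.
    w = np.asarray(weights, dtype=float)
    return float((w * per_pair_loss).sum() / w.sum())
```

With all weights equal, this reduces to the ordinary binary-relevance contrastive loss, which is one way to see how a score-weighted loss generalizes it.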
Related papers
- Generalized Contrastive Learning For Universal Multimodal Retrieval (2025) · 0.00
- Supervised Fine-tuning Or Contrastive Learning? Towards Better Multimodal LLM Reranking (2025) · 0.00
- Normalized Contrastive Learning For Text-video Retrieval (2022) · 6.77
- Improving The Consistency In Cross-lingual Cross-modal Retrieval With 1-to-k Contrastive Learning (2024) · 5.84
- Simple To Complex Cross-modal Learning To Rank (2017) · 13.84
- X-CLIP: End-to-end Multi-grained Contrastive Learning For Video-text Retrieval (2022) · 18.12
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025) · 0.00
- Beyond Global Similarity: Towards Fine-grained, Multi-condition Multimodal Retrieval (2026) · 2.20