ECI: Effective Contrastive Information To Evaluate Hard-negatives
2026 · Aarush Sinha, Rahul Seetharaman, Aman Bansal
Abstract
Hard negatives play a critical role in training and fine-tuning dense retrieval models: they are semantically similar to positive documents yet non-relevant, and correctly distinguishing them is essential for improving retrieval accuracy. However, identifying effective hard negatives typically requires extensive ablation studies involving repeated fine-tuning with different negative sampling strategies and hyperparameters, resulting in substantial computational cost. In this paper, we introduce ECI (Effective Contrastive Information), a theoretically grounded metric rooted in information theory and information retrieval principles that enables practitioners to assess the quality of hard negatives prior to model fine-tuning. ECI evaluates negatives by optimizing the trade-off between Information Capacity, the logarithmic bound on mutual information determined by set size, and Discriminative Efficiency, a harmonic balance of Signal Magnitude (Hardness) and Safety (Max-Margin). Unlike
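The abstract names the ingredients of ECI but does not give its formula, so the sketch below is only an illustration of how such ingredients could be combined, not the authors' definition. It assumes a capacity term of log(K + 1) for K negatives plus one positive (the familiar InfoNCE-style bound), a hardness term equal to the mean negative similarity, a safety term equal to the margin between the positive and the hardest negative, and a simple product to combine capacity with the harmonic mean. The function names and all of these choices are hypothetical.

```python
import math


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers; 0.0 if either is not positive."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2 * a * b / (a + b)


def toy_contrastive_score(pos_sim, neg_sims):
    """Toy pre-training score for a set of negatives.

    NOT the paper's ECI formula: every term below is an assumption made
    for illustration. Similarities are assumed to lie in [0, 1].
    """
    if not neg_sims:
        return 0.0
    # Capacity (assumed): log of the candidate-set size, K negatives + 1 positive.
    capacity = math.log(len(neg_sims) + 1)
    # Hardness (assumed): mean similarity of the negatives, clipped to [0, 1].
    hardness = min(max(sum(neg_sims) / len(neg_sims), 0.0), 1.0)
    # Safety (assumed): margin between the positive and the hardest negative,
    # clipped to [0, 1]; a negative scoring above the positive gives zero safety.
    safety = min(max(pos_sim - max(neg_sims), 0.0), 1.0)
    # Efficiency is high only when hardness and safety are both non-trivial.
    return capacity * harmonic_mean(hardness, safety)


if __name__ == "__main__":
    print(toy_contrastive_score(0.9, [0.5, 0.5, 0.5]))   # hard but safe negatives
    print(toy_contrastive_score(0.9, [0.95, 0.5]))       # likely false negative
```

Under these assumptions, a set containing a negative that outscores the positive collapses to zero through the safety term, and a set of trivially easy negatives collapses through the hardness term, which is the kind of balance the harmonic mean is meant to enforce.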
Related papers
- BiCA: Effective Biomedical Dense Retrieval With Citation-aware Hard Negatives (2025) 0.00
- Improve Multi-modal Embedding Learning Via Explicit Hard Negative Gradient Amplifying (2025) 2.80
- Optimizing Dense Retrieval Model Training With Hard Negatives (2021) 16.34
- Hard Negatives, Hard Lessons: Revisiting Training Data Quality For Robust Information Retrieval With LLMs (2025) 2.26
- Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining (2024) 0.00
- NV-Retriever: Improving Text Embedding Models With Effective Hard-negative Mining (2024) 0.00
- VSE++: Improving Visual-semantic Embeddings With Hard Negatives (2017) 0.00
- SyNeg: LLM-driven Synthetic Hard-negatives For Dense Retrieval (2024) 0.00