NV-Retriever: Improving Text Embedding Models With Effective Hard-negative Mining
2024 · Gabriel de Souza P. Moreira, Radek Osmulski, Mengyao Xu, et al.
Abstract
Text embedding models have been popular for information retrieval applications such as semantic search and Question-Answering systems based on Retrieval-Augmented Generation (RAG). These models are typically Transformer models that are fine-tuned with contrastive learning objectives. One of the challenging aspects of fine-tuning embedding models is the selection of high-quality hard-negative passages for contrastive learning. In this paper, we introduce a family of positive-aware mining methods that use the positive relevance score as an anchor for effective false-negative removal, leading to faster training and more accurate retrieval models. We provide an ablation study on hard-negative mining methods over their configurations, exploring different teacher and base models. We further demonstrate the efficacy of our proposed mining methods at scale with the NV-Retriever-v1 model, which scores 60.9 on the MTEB Retrieval (BEIR) benchmark and placed 1st when it was published to the MTEB Retrie
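The positive-aware idea in the abstract, using the positive passage's relevance score as an anchor to discard likely false negatives, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function name `mine_hard_negatives`, the ratio-based ceiling, the `max_ratio=0.95` default, and the toy scores are all assumptions made for the example. It also assumes teacher scores are positive, similarity-like values where higher means more relevant.

```python
from typing import List, Tuple


def mine_hard_negatives(
    positive_score: float,
    candidates: List[Tuple[str, float]],
    k: int = 4,
    max_ratio: float = 0.95,  # hypothetical threshold, chosen only for illustration
) -> List[Tuple[str, float]]:
    """Return up to k hard negatives, skipping likely false negatives.

    A candidate is treated as a likely false negative when its teacher score
    is at or above max_ratio * positive_score, i.e. it looks nearly as
    relevant as (or more relevant than) the labeled positive.
    """
    ceiling = max_ratio * positive_score
    # Keep only candidates that score strictly below the positive-anchored ceiling.
    kept = [(pid, score) for pid, score in candidates if score < ceiling]
    # Hardest negatives first: highest remaining teacher score.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]


# Toy teacher scores for one query (made-up values).
candidates = [("p1", 0.83), ("p2", 0.79), ("p3", 0.70), ("p4", 0.55), ("p5", 0.20)]
negatives = mine_hard_negatives(positive_score=0.80, candidates=candidates, k=2)
print(negatives)
```

With a positive score of 0.80 the ceiling is about 0.76, so `p1` and `p2` are dropped as probable false negatives and the two hardest remaining candidates, `p3` and `p4`, are returned. A naive top-k miner without the anchor would have selected `p1` and `p2` instead.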
Related papers
- Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining (2024) 0.00
- VSE++: Improving Visual-semantic Embeddings With Hard Negatives (2017) 0.00
- BiCA: Effective Biomedical Dense Retrieval With Citation-aware Hard Negatives (2025) 0.00
- GISTEmbed: Guided In-sample Selection Of Training Negatives For Text Embedding Fine-tuning (2024) 0.00
- Improve Multi-modal Embedding Learning Via Explicit Hard Negative Gradient Amplifying (2025) 2.80
- Optimizing Legal Document Retrieval In Vietnamese With Semi-hard Negative Mining (2025) 0.00
- Learning Video Retrieval Models With Relevance-aware Online Mining (2022) 6.07
- Hard Negatives, Hard Lessons: Revisiting Training Data Quality For Robust Information Retrieval With LLMs (2025) 2.26