Enhancing Dense Retrievers' Robustness With Group-level Reweighting
2023 · Peixuan Han, Zhenghao Liu, Zhiyuan Liu, et al.
Abstract
The anchor-document data derived from web graphs offers a wealth of paired information for training dense retrieval models in an unsupervised manner. However, this unsupervised data contains diverse patterns across the web graph and is often significantly imbalanced, leading to suboptimal performance on underrepresented or difficult groups. In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and optimizing group weights to enhance the robustness of dense retrieval models. First, we build an embedding model for clustering anchor-document pairs. Specifically, we contrastively train the embedding model for link prediction, which guides it to capture the document features behind the web graph links. Subsequently, we employ group distributionally robust optimization (group DRO) to recalibrate the weights across different clusters of anchor-document pairs while training retrieval models. During training, we direct the model to assign higher
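To make the group-level reweighting idea concrete, the following is a minimal sketch of the exponentiated-gradient weight update commonly used in group DRO, in which groups with larger loss receive larger weight. It is not taken from the paper and may differ from WebDRO's actual update rule; the function names, the step size, and the per-cluster loss values are all hypothetical.

```python
import math

def update_group_weights(weights, group_losses, step_size=0.1):
    """One exponentiated-gradient step of a generic group DRO update.

    Assumption: this is the textbook group DRO rule, not necessarily
    the exact rule used by WebDRO. Each weight is scaled by
    exp(step_size * loss) and the result is renormalized to sum to 1.
    """
    scaled = [w * math.exp(step_size * loss)
              for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def reweighted_loss(weights, group_losses):
    """Training objective: the weight-averaged loss over groups."""
    return sum(w * loss for w, loss in zip(weights, group_losses))

# Hypothetical example: three clusters of anchor-document pairs,
# starting from uniform weights, with made-up contrastive losses.
weights = [1 / 3, 1 / 3, 1 / 3]
losses = [0.2, 1.5, 0.7]
new_weights = update_group_weights(weights, losses, step_size=1.0)
```

In this sketch the second cluster has the largest loss, so it ends up with the largest weight, and the reweighted objective puts more emphasis on it than a uniform average would.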
Related papers
- Unsupervised Dense Retrieval Training With Web Anchors (2023) 3.81
- Enhancing The Ranking Context Of Dense Retrieval Methods Through Reciprocal Nearest Neighbors (2023) 4.52
- Approximate Cluster-based Sparse Document Retrieval With Segmented Maximum Term Weights (2024) 0.00
- Unsupervised Graph-based Rank Aggregation For Improved Retrieval (2019) 9.03
- More Robust Dense Retrieval With Contrastive Dual Learning (2021) 11.88
- Dreditor: An Time-efficient Approach For Building A Domain-specific Dense Retrieval Model (2024) 0.00
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020) 0.00
- CODER: An Efficient Framework For Improving Retrieval Through Contextual Document Embedding Reranking (2021) 7.16