Abstract

The anchor-document data derived from web graphs offers a wealth of paired information for training dense retrieval models in an unsupervised manner. However, unsupervised data contains diverse patterns across the web graph and often exhibits significant imbalance, leading to suboptimal performance in underrepresented or difficult groups. In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and optimizing group weights to enhance the robustness of dense retrieval models. Initially, we build an embedding model for clustering anchor-document pairs. Specifically, we contrastively train the embedding model for link prediction, which guides the embedding model in capturing the document features behind the web graph links. Subsequently, we employ the group distributional robust optimization to recalibrate the weights across different clusters of anchor-document pairs during training retrieval models. During training, we direct the model to assign higher

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keyhan2023enhancing

Related papers