Unsupervised Dense Retrieval Training With Web Anchors
2023 · Yiqing Xie, Xiao Liu, Chenyan Xiong
Abstract
In this work, we present an unsupervised retrieval method based on contrastive learning over web anchors. Anchor text describes the content of the page it links to, which makes it similar to search queries that aim to retrieve pertinent information from relevant documents. Building on this commonality, we train an unsupervised dense retriever, Anchor-DR, with a contrastive learning task that matches anchor text to its linked document. To filter out uninformative anchors (such as "homepage" or other functional anchors), we present a novel filtering technique that selects only anchors containing the same types of information as search queries. Experiments show that Anchor-DR outperforms state-of-the-art unsupervised dense retrieval methods by a large margin (e.g., by 5.3% NDCG@10 on MSMARCO). The gain is especially significant for search and question answering tasks. Our analysis further reveals that the pattern of anchor-document pairs is similar to
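The abstract does not spell out the training objective. As a hedged illustration only, one common way to "match the anchor text and the linked document" with contrastive learning is an InfoNCE-style loss with in-batch negatives: each anchor's own linked document is its positive, and the other documents in the same batch act as negatives. The sketch below assumes embeddings have already been produced by some encoder and uses plain dot-product similarity; the function names `dot` and `in_batch_contrastive_loss` and the `temperature` parameter are hypothetical and are not taken from Anchor-DR.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(
    anchor_embs: List[Vector],
    doc_embs: List[Vector],
    temperature: float = 1.0,
) -> float:
    """Hypothetical InfoNCE-style loss with in-batch negatives.

    Anchor i is paired with document i (its linked page); every other
    document in the batch is treated as a negative for anchor i.
    Returns the mean negative log-likelihood of the positive pairs.
    """
    n = len(anchor_embs)
    if n == 0 or n != len(doc_embs):
        raise ValueError("need a non-empty batch of aligned anchor/document pairs")
    total = 0.0
    for i, anchor in enumerate(anchor_embs):
        # Similarity of this anchor to every document in the batch.
        logits = [dot(anchor, doc) / temperature for doc in doc_embs]
        # Numerically stable log-sum-exp over all candidate documents.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy where the positive is the anchor's own document.
        total += log_sum_exp - logits[i]
    return total / n


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched_docs = [[1.0, 0.0], [0.0, 1.0]]
    swapped_docs = [[0.0, 1.0], [1.0, 0.0]]
    print(in_batch_contrastive_loss(anchors, matched_docs))
    print(in_batch_contrastive_loss(anchors, swapped_docs))
```

Correctly aligned anchor-document pairs yield a lower loss than swapped pairs, which is the signal that pulls each anchor toward its linked page in embedding space. A real trainer would compute the same quantity on batched tensors and backpropagate through the encoder.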
Related papers
- Unsupervised Dense Information Retrieval With Contrastive Learning (2021)
- Enhancing Dense Retrievers' Robustness With Group-level Reweighting (2023)
- Unsupervised Dense Retrieval With Counterfactual Contrastive Learning (2024)
- Unsupervised Dense Retrieval With Relevance-aware Contrastive Pre-training (2023)
- Approximate Nearest Neighbor Negative Contrastive Learning For Dense Text Retrieval (2020)
- Noise-robust Dense Retrieval Via Contrastive Alignment Post Training (2023)
- Black-box Adversarial Attacks Against Dense Retrieval Models: A Multi-view Contrastive Learning Method (2023)
- Pre-training For Ad-hoc Retrieval: Hyperlink Is Also You Need (2021)