Noise-robust Dense Retrieval Via Contrastive Alignment Post Training
2023 · Daniel Campos, Chengxiang Zhai, Alessandro Magnani
Abstract
The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking. While effective and efficient, dual-encoders are brittle to variations in query distributions and noisy queries. Data augmentation can make models more robust but introduces overhead to training set generation and requires retraining and index regeneration. We present Contrastive Alignment POst Training (CAPOT), a highly efficient finetuning method that improves model robustness without requiring index regeneration, training set optimization, or training set alteration. CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root. We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has an impact similar to that of data augmentation with none of its overhead.
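The alignment idea in the abstract can be illustrated with a small sketch. The abstract does not specify the loss or the noise model, so everything below is an assumption: `add_typo` is a hypothetical character-swap noise function, and `alignment_loss` is a generic InfoNCE-style objective in which each noisy query embedding is pulled toward the embedding of its unaltered root while the other roots in the batch act as negatives. Plain Python lists stand in for encoder outputs.

```python
import math
import random


def add_typo(query: str, rng: random.Random) -> str:
    """Swap two adjacent characters (hypothetical noise model, not from the paper)."""
    if len(query) < 2:
        return query
    i = rng.randrange(len(query) - 1)
    chars = list(query)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def alignment_loss(noisy_vecs, root_vecs, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of noisy_vecs should match row i of root_vecs.

    root_vecs are treated as fixed targets, e.g. embeddings of the unaltered
    queries; in real training only the query encoder that produces noisy_vecs
    would be updated.
    """
    total = 0.0
    for i, q in enumerate(noisy_vecs):
        logits = [cosine(q, r) / temperature for r in root_vecs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(noisy_vecs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(add_typo("what is dense retrieval", rng))

    roots = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]
    misaligned = [[0.1, 0.9], [0.9, 0.1]]
    print(alignment_loss(aligned, roots) < alignment_loss(misaligned, roots))
```

Because the document encoder is frozen and only the query side changes, a scheme of this shape leaves the existing document index valid, which is the property the abstract emphasizes.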
Related papers
- CODER: An Efficient Framework For Improving Retrieval Through Contextual Document Embedding Reranking (2021)
- Analysing The Robustness Of Dual Encoders For Dense Retrieval Against Misspellings (2022)
- More Robust Dense Retrieval With Contrastive Dual Learning (2021)
- Unsupervised Dense Retrieval With Counterfactual Contrastive Learning (2024)
- Improving The Robustness Of Dense Retrievers Against Typos Via Multi-positive Contrastive Learning (2024)
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023)
- Unsupervised Dense Retrieval Training With Web Anchors (2023)
- Unsupervised Dense Information Retrieval With Contrastive Learning (2021)