Analysing The Robustness Of Dual Encoders For Dense Retrieval Against Misspellings
2022 · Georgios Sidiropoulos, Evangelos Kanoulas
Abstract
Dense retrieval is becoming one of the standard approaches for document and passage ranking. The dual-encoder architecture is widely adopted for scoring question-passage pairs due to its efficiency and high performance. Typically, dense retrieval models are evaluated on clean and curated datasets. However, when deployed in real-life applications, these models encounter noisy user-generated text, and the performance of state-of-the-art dense retrievers can deteriorate substantially on such text. In this work, we study the robustness of dense retrievers against typos in the user question. We observe a significant drop in the performance of the dual-encoder model when encountering typos and explore ways to improve its robustness by combining data augmentation with contrastive learning. Our experiments on two large-scale passage ranking and open-domain question answering datasets show that our proposed approach outperforms competing approaches. Additionally, we perform
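The two ingredients named in the abstract, typo-based data augmentation and contrastive learning, can be illustrated with a small sketch. This is not the authors' implementation: the function names `add_typo` and `contrastive_loss`, the choice of four character-level edits, the temperature parameter, and the toy two-dimensional vectors standing in for encoder outputs are all assumptions made for illustration.

```python
import math
import random


def add_typo(question, rng):
    """Apply one random character-level edit to one word of the question.

    Hypothetical augmentation: the edit set (swap, delete, insert,
    substitute) is an assumption, not taken from the paper.
    """
    words = question.split()
    # Only words with at least two characters are edited.
    candidates = [i for i, w in enumerate(words) if len(w) > 1]
    if not candidates:
        return question
    i = rng.choice(candidates)
    w = words[i]
    pos = rng.randrange(len(w) - 1)
    op = rng.choice(["swap", "delete", "insert", "substitute"])
    letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
    if op == "swap":
        # Swap two neighbouring characters (may be a no-op if they are equal).
        w = w[:pos] + w[pos + 1] + w[pos] + w[pos + 2:]
    elif op == "delete":
        w = w[:pos] + w[pos + 1:]
    elif op == "insert":
        w = w[:pos] + letter + w[pos:]
    else:
        # Substitute one character (may be a no-op if the letter matches).
        w = w[:pos] + letter + w[pos + 1:]
    words[i] = w
    return " ".join(words)


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(anchor, positive, negatives, temperature=1.0):
    """Softmax cross-entropy over similarities, with the positive as target.

    Lower values mean the anchor is closer to the positive than to the
    negatives. A generic contrastive objective, assumed for illustration.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "what is dense retrieval"
    print(add_typo(clean, rng))
    # Toy vectors standing in for encoder outputs of a typoed question
    # (anchor), its clean version (positive), and another question (negative).
    print(contrastive_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]]))
```

In a real training loop the vectors would come from the question encoder, and a term of this kind would be added to the usual question-passage ranking loss so that a typoed question is pulled towards its clean counterpart.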
Related papers
- Improving The Robustness Of Dense Retrievers Against Typos Via Multi-positive Contrastive Learning (2024)
- Typo-robust Representation Learning For Dense Retrieval (2023)
- Typos-aware Bottlenecked Pre-training For Robust Dense Retrieval (2023)
- Noise-robust Dense Retrieval Via Contrastive Alignment Post Training (2023)
- What Are You Token About? Dense Retrieval As Distributions Over The Vocabulary (2022)
- Pre-train A Discriminative Text Encoder For Dense Retrieval Via Contrastive Span Prediction (2022)
- Towards Robust Ranker For Text Retrieval (2022)
- More Robust Dense Retrieval With Contrastive Dual Learning (2021)