BERM: Training The Balanced And Extractable Representation For Matching To Improve Generalization Ability Of Dense Retrieval
2023 · Shicheng Xu, Liang Pang, Huawei Shen, et al.
Abstract
Dense retrieval has shown promise in first-stage retrieval when trained on in-domain labeled datasets. However, previous studies have found that dense retrieval generalizes poorly to unseen domains because it weakly models domain-invariant and interpretable features (i.e., the matching signal between two texts, which is the essence of information retrieval). In this paper, we propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal. Fully fine-grained expression and query-oriented saliency are two properties of the matching signal. Thus, in BERM, a single passage is segmented into multiple units, and two unit-level requirements on the representation are imposed as training constraints to obtain an effective matching signal. One is semantic unit balance and the other is essential matching unit extractability. The unit-level view and balanced semantics make the representation express the text in a fine-grained manner. Esse
Related papers
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020) · 0.00
- A Dense Representation Framework For Lexical And Semantic Matching (2022) · 11.13
- Unsupervised Dense Information Retrieval With Contrastive Learning (2021) · 0.00
- Disentangled Modeling Of Domain And Relevance For Adaptable Dense Retrieval (2022) · 0.00
- Training For Compositional Sensitivity Reduces Dense Retrieval Generalization (2026) · 0.00
- Pseudo-relevance Feedback For Multiple Representation Dense Retrieval (2021) · 12.93
- Lexsembridge: Fine-grained Dense Representation Enhancement Through Token-aware Embedding Augmentation (2025) · 2.35
- Large Dual Encoders Are Generalizable Retrievers (2021) · 14.69