L^2R: Lifelong Learning For First-stage Retrieval With Backward-compatible Representations
2023 · Yinqiong Cai, Keping Bi, Yixing Fan, et al.
Abstract
First-stage retrieval is a critical task that aims to retrieve relevant document candidates from a large-scale collection. While existing retrieval models have achieved impressive performance, they are mostly studied on static datasets, ignoring that in the real world, data on the Web grows continuously and its distribution may drift. Consequently, retrievers trained on static old data may not suit newly arriving data well and inevitably produce sub-optimal results. In this work, we study lifelong learning for first-stage retrieval, focusing on the setting where the emerging documents are unlabeled, since relevance annotation is expensive and may not keep up with data emergence. Under this setting, we aim to develop model updating with two goals: (1) to adapt effectively to the evolving distribution using the unlabeled new data, and (2) to avoid re-inferring the embeddings of all old documents, so that the index can be updated efficiently each time the model is updated. We fir
Related papers
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020)
- Advancing Continual Lifelong Learning In Neural Information Retrieval: Definition, Dataset, Framework, And Empirical Evaluation (2023)
- ScalingNote: Scaling Up Retrievers With Large Language Models For Real-world Dense Retrieval (2024)
- CSPLADE: Learned Sparse Retrieval With Causal Language Models (2025)
- Metric Compatible Training For Online Backfilling In Large-scale Retrieval (2023)
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)
- Lifelong Learning For Text Retrieval And Recognition In Historical Handwritten Document Collections (2019)
- Forward Compatible Training For Large-scale Embedding Retrieval Systems (2021)