MURR: Model Updating With Regularized Replay For Searching A Document Stream
2025 · Eugene Yang, Nicola Tonellotto, Dawn Lawrie, et al.
Abstract
The Internet produces a continuous stream of new documents and user-generated queries. These naturally change over time based on events in the world and the evolution of language. Neural retrieval models that were trained once on a fixed set of query-document pairs will quickly start misrepresenting newly created content and queries, leading to less effective retrieval. Traditional statistical sparse retrieval can update collection statistics to reflect these changes in the use of language in documents and queries. In contrast, continued fine-tuning of the language model underlying neural retrieval approaches such as DPR and ColBERT creates incompatibility with previously encoded documents. Re-encoding and re-indexing all previously processed documents can be costly. In this work, we explore updating a neural dual encoder retrieval model without reprocessing past documents in the stream. We propose MURR, a model updating strategy with regularized replay, to ensure the model can still f
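The abstract's remark that sparse retrieval can refresh its collection statistics as the stream evolves can be illustrated with a minimal sketch. This is not code from the paper: the `StreamingIdf` class and its method names are hypothetical, and the smoothed BM25-style IDF formula is one common choice assumed here for illustration.

```python
import math
from collections import Counter


class StreamingIdf:
    """Hypothetical helper: keeps document frequencies current as documents arrive."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def add_document(self, tokens):
        # Count each term at most once per document.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))

    def idf(self, term):
        # Smoothed BM25-style IDF (an assumed formula, non-negative by construction).
        df = self.doc_freq.get(term, 0)
        return math.log(1 + (self.num_docs - df + 0.5) / (df + 0.5))


stats = StreamingIdf()
stats.add_document(["election", "results", "announced"])
stats.add_document(["weather", "forecast", "announced"])
idf_before = stats.idf("election")

# Later in the stream the term becomes common, so its weight drops
# without touching any previously indexed document.
for _ in range(3):
    stats.add_document(["election", "coverage"])
idf_after = stats.idf("election")
```

Only the counters change when a document arrives, which is why the sparse case avoids the re-encoding cost that the abstract attributes to fine-tuned neural encoders.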
Related papers
- Imrnns: An Efficient Method For Interpretable Dense Retrieval Via Embedding Modulation (2026) · 0.00
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024) · 0.00
- CREAM: Continual Retrieval On Dynamic Streaming Corpora With Adaptive Soft Memory (2026) · 0.00
- L^2R: Lifelong Learning For First-stage Retrieval With Backward-compatible Representations (2023) · 5.24
- Query Drift Compensation: Enabling Compatibility In Continual Learning Of Retrieval Embedding Models (2025) · 1.20
- Retrieval-augmented Memory For Online Learning (2025) · 0.00
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023) · 0.00
- CODER: An Efficient Framework For Improving Retrieval Through Contextual Document Embedding Reranking (2021) · 7.16