CREAM: Continual Retrieval On Dynamic Streaming Corpora With Adaptive Soft Memory
2026 · Huijeong Son, Hyeongu Kang, Sunho Kim, et al.
Abstract
Information retrieval (IR) over dynamic data streams is a crucial task, because shifts in data distribution degrade the performance of AI-powered IR systems. To mitigate this issue, memory-based continual learning has been widely adopted for IR. However, existing methods rely on a fixed set of queries with ground-truth documents, which limits their generalization to unseen data and makes them impractical for real-world applications. To enable more effective learning on unseen topics in a new corpus without ground-truth labels, we propose CREAM, a self-supervised framework for memory-based continual retrieval. CREAM captures the evolving semantics of streaming queries and documents in a dynamically structured soft memory and leverages that memory to adapt to both seen and unseen topics in an unsupervised setting. We realize this through three key techniques: fine-grained similarity estimation, regularized cluster prototyping, and stratified coreset sampling. Experiments on two benchmark datasets demonstrate
Related papers
- Continual Learning For Generative Retrieval Over Dynamic Corpora (2023) 11.49
- Retrieval-augmented Memory For Online Learning (2025) 0.00
- LUMA-RAG: Lifelong Multimodal Agents With Provably Stable Streaming Alignment (2025) 0.00
- L^2R: Lifelong Learning For First-stage Retrieval With Backward-compatible Representations (2023) 5.24
- Going Down Memory Lane: Scaling Tokens For Video Stream Understanding With Dynamic KV-cache Memory (2026) 0.00
- A Dynamic Retrieval-augmented Generation System With Selective Memory And Remembrance (2026) 0.00
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026) 0.00
- MURR: Model Updating With Regularized Replay For Searching A Document Stream (2025) 0.00