Query Drift Compensation: Enabling Compatibility In Continual Learning Of Retrieval Embedding Models
2025 · Dipam Goswami, Liying Wang, Bartłomiej Twardowski, et al.
Abstract
Text embedding models enable semantic search, powering several NLP applications such as Retrieval-Augmented Generation through efficient information retrieval (IR). However, text embedding models are commonly studied in scenarios where the training data is static, which limits their application to dynamic scenarios where new training data emerges over time. IR methods generally encode a huge corpus of documents into low-dimensional embeddings and store them in a database index. During retrieval, a semantic search over the corpus is performed and the document whose embedding is most similar to the query embedding is returned. When an embedding model is updated with new training data, using the already indexed corpus is suboptimal because of incompatibility: the model that produced the corpus embeddings has changed. While re-indexing the old corpus documents with the updated model restores compatibility, it requires much more computation and time. Thus, it is critica
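The retrieval setup and the incompatibility problem described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the 2-d vectors, the document ids, and the helper names `cosine` and `retrieve` are all hypothetical, chosen only to show how a query encoded by an updated model can mismatch an index built by the old one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_vec, index):
    """Return the id of the indexed document most similar to the query."""
    return max(index, key=lambda doc_id: cosine(query_vec, index[doc_id]))


# Hypothetical embeddings produced by the OLD model and stored in the index.
index_old = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}

# A query about doc_a, encoded by the old model (assumed values).
query_old = [0.9, 0.1]

# The same query encoded by the UPDATED model; the drift is an assumption
# made up for this illustration.
query_new = [0.2, 0.8]

# The corpus re-encoded with the updated model (assumed values).
index_new = {"doc_a": [0.1, 0.9], "doc_b": [0.9, -0.1]}

result_compatible = retrieve(query_old, index_old)  # old query, old index
result_stale = retrieve(query_new, index_old)       # new query, old index
result_reindexed = retrieve(query_new, index_new)   # new query, new index

print(result_compatible, result_stale, result_reindexed)
```

In this toy setting the updated query retrieves the wrong document from the stale index and the right one only after the corpus is re-encoded, which is the costly step the abstract says should be avoided.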
Related papers
- Drift-adapter: A Practical Approach To Near Zero-downtime Embedding Model Upgrades In Vector Databases (2025) · 0.00
- Forward Compatible Training For Large-scale Embedding Retrieval Systems (2021) · 8.09
- L^2R: Lifelong Learning For First-stage Retrieval With Backward-compatible Representations (2023) · 5.24
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024) · 0.00
- Efficient Fine-tuning Methodology Of Text Embedding Models For Information Retrieval: Contrastive Learning Penalty (CLP) (2024) · 2.16
- Continual Learning For Generative Retrieval Over Dynamic Corpora (2023) · 11.49
- Disentangled Modeling Of Domain And Relevance For Adaptable Dense Retrieval (2022) · 0.00
- Query Expansion With Locally-trained Word Embeddings (2016) · 16.14