CITADEL: Conditional Token Interaction Via Dynamic Lexical Routing For Efficient And Effective Multi-vector Retrieval
2022 · Minghan Li, Sheng-Chieh Lin, Barlas Oguz, et al.
Abstract
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval. CITADEL learns to route different token vectors to the predicted lexical "keys" such that a query token vector only interacts with document token vectors routed to the same key. This design significantly reduces the computation cost while maintaining high accuracy. Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) ev
Related papers
- Rethinking The Role Of Token Retrieval In Multi-vector Retrieval (2023) 0.00
- Reducing The Footprint Of Multi-vector Retrieval With Minimal Performance Impact Via Token Pooling (2024) 0.00
- SLIM: Sparsified Late Interaction For Multi-vector Retrieval With Inverted Indexes (2023) 7.50
- Pylate: Flexible Training And Retrieval For Late Interaction Models (2025) 3.58
- Selroute: Query-type-aware Routing For Long-term Conversational Memory Retrieval (2026) 0.00
- Investigating Multi-layer Representations For Dense Passage Retrieval (2025) 0.00
- What Are You Token About? Dense Retrieval As Distributions Over The Vocabulary (2022) 8.09
- CODER: An Efficient Framework For Improving Retrieval Through Contextual Document Embedding Reranking (2021) 7.16