Col-Bandit: Zero-shot Query-time Pruning For Late-interaction Retrieval
2026 · Roi Pony, Adi Raz, Oshri Naparstek, et al.
Abstract
Multi-vector late-interaction retrievers such as ColBERT achieve state-of-the-art retrieval quality, but their query-time cost is dominated by exhaustively computing token-level MaxSim interactions for every candidate document. While approximating late interaction with single-vector representations reduces cost, it often incurs substantial accuracy loss. We introduce Col-Bandit, a query-time pruning algorithm that reduces this computational burden by casting reranking as a finite-population Top-\(K\) identification problem. Col-Bandit maintains uncertainty-aware bounds over partially observed document scores and adaptively reveals only the (document, query token) MaxSim entries needed to determine the top results under statistical decision bounds with a tunable relaxation. Unlike coarse-grained approaches that prune entire documents or tokens offline, Col-Bandit sparsifies the interaction matrix on the fly. It operates as a zero-shot, drop-in layer over standard multi-vector systems, r
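The scoring that the abstract refers to can be illustrated with a minimal sketch. It shows the standard exhaustive late-interaction score (a sum of per-query-token MaxSim values) and how a partially observed row of (document, query token) entries still yields bounds on the full score. The bound used here is a simple deterministic one that assumes unit-norm embeddings, so every entry lies in [-1, 1]; it is a stand-in for illustration and not the statistical decision bounds of Col-Bandit. All function names and the toy vectors are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(q_tok: Vector, doc: Sequence[Vector]) -> float:
    """One (document, query token) entry: the best match of q_tok among the document tokens."""
    return max(dot(q_tok, d_tok) for d_tok in doc)


def full_score(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Exhaustive late-interaction score: sum of MaxSim over all query tokens."""
    return sum(maxsim(q_tok, doc) for q_tok in query)


def score_bounds(revealed: Dict[int, float], n_query_tokens: int,
                 lo: float = -1.0, hi: float = 1.0) -> Tuple[float, float]:
    """Bounds on the full score when only some entries have been revealed.

    Assumption (not from the paper): each unrevealed entry lies in [lo, hi],
    which holds for cosine similarity of unit-norm embeddings.
    """
    known = sum(revealed.values())
    missing = n_query_tokens - len(revealed)
    return known + missing * lo, known + missing * hi


# Toy data: two query tokens, two documents, 2-d unit vectors.
query = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0)]
doc_b = [(0.0, -1.0), (-1.0, 0.0)]

# doc_a fully revealed, doc_b revealed for the first query token only.
bounds_a = score_bounds({i: maxsim(q, doc_a) for i, q in enumerate(query)}, len(query))
bounds_b = score_bounds({0: maxsim(query[0], doc_b)}, len(query))
```

In this toy case the upper bound of doc_b (1.0) is below the lower bound of doc_a (2.0), so a Top-1 decision can be made without computing the remaining entry for doc_b. That is the kind of saving that sparsifying the interaction matrix aims for.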
Related papers
- An Analysis On Matching Mechanisms And Token Pruning For Late-interaction Models (2024) · 5.24
- ColBERTv2: Effective And Efficient Retrieval Via Lightweight Late Interaction (2021) · 17.46
- SLIM: Sparsified Late Interaction For Multi-vector Retrieval With Inverted Indexes (2023) · 7.50
- A Voronoi Cell Formulation For Principled Token Pruning In Late-interaction Retrieval Models (2026) · 0.00
- Rethinking The Role Of Token Retrieval In Multi-vector Retrieval (2023) · 0.00
- PyLate: Flexible Training And Retrieval For Late Interaction Models (2025) · 3.58
- Reducing The Footprint Of Multi-vector Retrieval With Minimal Performance Impact Via Token Pooling (2024) · 0.00
- Introducing Neural Bag Of Whole-words With ColBERTer: Contextualized Late Interactions Using Enhanced Reduction (2022) · 0.00