Look In The Middle: Structural Anchor Pruning For Scalable Visual RAG Indexing
2026 · Zhuchenyang Liu, Ziyu Hu, Yao Zhang, et al.
Abstract
Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive overheads in index vector size. Training-free pruning methods (e.g., EOS-attention-based methods) can reduce the number of index vectors by approximately 60% without model adaptation, but they often underperform random selection in high-compression scenarios (> 80%). Prior work (e.g., Light-ColPali) attributes this to visual token importance being inherently query-dependent, thereby questioning the feasibility of training-free pruning. In this work, we propose Structural Anchor Pruning (SAP), a training-free pruning method that identifies key visual patches from middle layers to achieve high-performance compression. We also introduce the Oracle Score Retention (OSR) protocol to evaluate how layer-wise information affects compression efficiency. Evaluations on the ViDoRe benchmark demonstrate that SAP reduces index vectors by over 90% while maintaining robust retrieval …
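The abstract does not specify how SAP scores patches, so the following Python sketch only illustrates the general shape of training-free, query-independent index pruning for a late-interaction retriever in the style of ColPali: each page's patch vectors are ranked once at indexing time, only the top fraction is stored, and queries are scored with the usual MaxSim sum. The importance rule in `middle_layer_importance` (mean attention each patch receives in layer `len(layers) // 2`) is a hypothetical stand-in chosen to echo the "middle layers" idea, not the paper's SAP criterion, and all function names and toy data are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Matrix, doc_vecs: Matrix) -> float:
    """Late-interaction (MaxSim) score: for each query vector, take its best
    match among the stored document vectors, then sum over query vectors."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def middle_layer_importance(attn_per_layer: List[Matrix]) -> List[float]:
    """HYPOTHETICAL scoring rule (not the paper's SAP criterion): take the
    attention matrix of the middle layer and score patch j by the mean
    attention it receives, i.e. the mean of column j."""
    attn = attn_per_layer[len(attn_per_layer) // 2]
    n = len(attn)
    return [sum(row[j] for row in attn) / n for j in range(n)]


def prune_patches(doc_vecs: Matrix, importance: Sequence[float],
                  keep_ratio: float) -> Matrix:
    """Keep the ceil(keep_ratio * n) highest-scoring patch vectors, in their
    original order. No query is involved, so this runs once per page."""
    n = len(doc_vecs)
    k = max(1, math.ceil(keep_ratio * n))
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return [doc_vecs[i] for i in sorted(ranked[:k])]


# Toy page with 4 patch vectors and 4 layers of (hypothetical) attention.
doc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.1, 0.0]]
mid = [[0.7, 0.1, 0.1, 0.1],
       [0.6, 0.2, 0.1, 0.1],
       [0.5, 0.3, 0.1, 0.1],
       [0.4, 0.4, 0.1, 0.1]]
uniform = [[0.25] * 4 for _ in range(4)]
layers = [uniform, uniform, mid, uniform]  # len(layers) // 2 == 2 selects `mid`

pruned = prune_patches(doc, middle_layer_importance(layers), keep_ratio=0.5)
query = [[1.0, 0.0], [0.0, 1.0]]
print(len(pruned), maxsim_score(query, doc), maxsim_score(query, pruned))
# prints: 2 2.0 2.0
```

Because the score never looks at the query, pruning happens once per page when the index is built. In this toy example, keeping 2 of 4 vectors leaves the MaxSim score unchanged, while keeping 1 of 4 (`keep_ratio=0.25`) drops it from 2.0 to 1.0, which hints at why the choice of scoring signal matters more as compression increases.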
Related papers
- DocPruner: A Storage-efficient Framework For Multi-vector Visual Document Retrieval Via Adaptive Patch-level Embedding Pruning (2025)
- Sculpting The Vector Space: Towards Efficient Multi-vector Visual Document Retrieval Via Prune-then-merge Framework (2026)
- Beyond Patch Aggregation: 3-pass Pyramid Indexing For Vision-enhanced Document Retrieval (2025)
- Structured Pruning For Efficient Visual Place Recognition (2024)
- ModernVBERT: Towards Smaller Visual Document Retrievers (2025)
- Visual Late Chunking: An Empirical Study Of Contextual Chunking For Efficient Visual Document Retrieval (2026)
- A Voronoi Cell Formulation For Principled Token Pruning In Late-interaction Retrieval Models (2026)
- ColPali: Efficient Document Retrieval With Vision Language Models (2024)