Visual Late Chunking: An Empirical Study Of Contextual Chunking For Efficient Visual Document Retrieval
2026 · Yibo Yan, Mingdong Ou, Yi Cao, et al.
Abstract
Multi-vector models dominate Visual Document Retrieval (VDR) due to their fine-grained matching capabilities, but their high storage and computational costs present a major barrier to practical deployment. In this paper, we propose ColChunk, a plug-and-play framework that introduces multimodal late chunking to construct efficient, contextualized multi-vectors. Unlike existing pruning or fixed-token approaches, ColChunk employs hierarchical clustering on patch-level embeddings, fused with a 2D position prior to ensure spatial-semantic coherence. This adaptive grouping allows for a content-aware representation that preserves global context while drastically reducing the vector count. Evaluations across 24 VDR datasets demonstrate ColChunk achieves over a 90% reduction in storage requirements while simultaneously delivering a 9-point average improvement in nDCG@5 across representative single-vector models. ColChunk provides a practical solution for balancing retrieval accuracy and efficie
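The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the linkage criterion, how the 2D position prior is fused with the embeddings, or how a cluster is reduced to a single vector, so everything below is an assumption: average linkage, a fused distance of cosine distance plus `alpha` times normalized grid distance, and mean pooling. The names `chunk_patches`, `fused_distance`, and `alpha` are hypothetical and are not taken from ColChunk.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def fused_distance(emb, pos, i, j, alpha, max_dist):
    """Semantic distance plus a weighted 2D grid distance (assumed fusion form)."""
    spatial = math.dist(pos[i], pos[j]) / max_dist  # normalized to [0, 1]
    return cosine_distance(emb[i], emb[j]) + alpha * spatial


def chunk_patches(emb, pos, n_chunks, alpha=0.5):
    """Merge patches by average linkage until n_chunks remain, then mean-pool.

    emb: list of patch embeddings; pos: list of (row, col) grid positions.
    Returns (clusters as lists of patch indices, one pooled vector per cluster).
    """
    n = len(emb)
    max_dist = max((math.dist(p, q) for p, q in combinations(pos, 2)), default=1.0) or 1.0
    d = [[fused_distance(emb, pos, i, j, alpha, max_dist) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_chunks:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            link /= len(clusters[a]) * len(clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    dim = len(emb[0])
    pooled = [[sum(emb[i][k] for i in c) / len(c) for k in range(dim)]
              for c in clusters]
    return clusters, pooled


# Toy 2x2 patch grid: the left column shares one embedding, the right column another.
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
clusters, pooled = chunk_patches(embeddings, positions, n_chunks=2)
```

On this toy input the four patch vectors collapse into two pooled vectors, one per column, which is the kind of vector-count reduction the abstract refers to. The pairwise search is cubic in the number of patches and is meant only to make the idea concrete; a real page with hundreds or thousands of patches would need an optimized clustering routine.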
Related papers
- DocPruner: A Storage-efficient Framework For Multi-vector Visual Document Retrieval Via Adaptive Patch-level Embedding Pruning (2025)
- Sculpting The Vector Space: Towards Efficient Multi-vector Visual Document Retrieval Via Prune-then-merge Framework (2026)
- ModernVBERT: Towards Smaller Visual Document Retrievers (2025)
- Late Chunking: Contextual Chunk Embeddings Using Long-context Embedding Models (2024)
- ColPali: Efficient Document Retrieval With Vision Language Models (2024)
- Hybrid-vector Retrieval For Visually Rich Documents: Combining Single-vector Efficiency And Multi-vector Accuracy (2025)
- Reproducibility, Replicability, And Insights Into Visual Document Retrieval With Late Interaction (2025)
- Beyond Chunk-then-embed: A Comprehensive Taxonomy And Evaluation Of Document Chunking Strategies For Information Retrieval (2026)