Evo-Retriever: LLM-Guided Curriculum Evolution With Viewpoint-Pathway Collaboration For Multimodal Document Retrieval
2026 · Weiqing Li, Jinyue Guo, Yaqi Wang, et al.
Abstract
Visual-language models (VLMs) excel at data mappings, but real-world document heterogeneity and unstructuredness disrupt the consistency of cross-modal embeddings. Recent late-interaction methods enhance image-text alignment through multi-vector representations, yet traditional training with limited samples and static strategies cannot adapt to the model's dynamic evolution, causing cross-modal retrieval confusion. To overcome this, we introduce Evo-Retriever, a retrieval framework featuring an LLM-guided curriculum evolution built upon a novel Viewpoint-Pathway collaboration. First, we employ multi-view image alignment to enhance fine-grained matching via multi-scale and multi-directional perspectives. Then, a bidirectional contrastive learning strategy generates "hard queries" and establishes complementary learning paths for visual and textual disambiguation to rebalance supervision. Finally, the model-state summary from the above collaboration is fed into an LLM meta-controller, whi
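For readers unfamiliar with the late-interaction scoring that the abstract mentions, the following is a minimal sketch of the generic ColBERT-style MaxSim operator over multi-vector representations. It is not taken from the paper: the function names (`l2_normalize`, `maxsim_score`, `rank_pages`), the use of cosine similarity, and the toy two-dimensional vectors are all assumptions made for illustration, and Evo-Retriever's actual scoring and training pipeline may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take its best cosine
    match among the document vectors, then sum those maxima."""
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


def rank_pages(query_vecs, pages):
    """Return (page name, score) pairs sorted from best to worst match."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two query token vectors and two document pages,
# each page represented by a handful of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "page_b": [[1.0, 1.0]],
}
ranking = rank_pages(query, pages)
```

In this toy case `page_a` contains an exact match for each query vector, so it outranks `page_b`, whose single patch vector only partially matches both. Keeping one vector per token or patch, rather than pooling to a single embedding, is what lets this style of scoring reward fine-grained matches.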
Related papers
- Verve: Versatile Retrieval For Videos Via Unified Embeddings (2026)
- Realign: Optimizing The Visual Document Retriever With Reasoning-guided Fine-grained Alignment (2026)
- Unlocking Multimodal Document Intelligence: From Current Triumphs To Future Frontiers Of Visual Document Retrieval (2026)
- Evdclip: Improving Vision-language Retrieval With Entity Visual Descriptions From Large Language Models (2025)
- Vldeformer: Vision-language Decomposed Transformer For Fast Cross-modal Retrieval (2021)
- V-retrver: Evidence-driven Agentic Reasoning For Universal Multimodal Retrieval (2026)
- MERLIN: Multimodal Embedding Refinement Via LLM-based Iterative Navigation For Text-video Retrieval-rerank Pipeline (2024)
- MURE: Hierarchical Multi-resolution Encoding Via Vision-language Models For Visual Document Retrieval (2026)