SLQ: Bridging Modalities Via Shared Latent Queries For Retrieval With Frozen MLLMs
2026 · Haoran Lou, Ziyan Liu, Chunxiao Fan, et al.
Abstract
Multimodal Large Language Models (MLLMs) exhibit strong reasoning and world knowledge, yet adapting them for retrieval remains challenging. Existing approaches rely on invasive parameter updates, such as full fine-tuning and LoRA, which may disrupt the pre-trained semantic space and impair the structured knowledge essential for reasoning. In this work, we argue that adapting MLLMs for retrieval should focus on eliciting pre-trained representations rather than overwriting them. To this end, we propose SLQ, an effective and efficient framework that adapts a frozen MLLM into a retriever through a small set of Shared Latent Queries. Appended to the end of both text and image token sequences, these queries leverage the model's native causal attention to serve as global aggregation interfaces, producing compact embeddings in a unified space while keeping the backbone unchanged. Furthermore, to better evaluate retrieval beyond superficial pattern matching, we construct KARR-Bench, a benchmark
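The aggregation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the single attention layer with identity projections stands in for the frozen backbone, the query vectors are random rather than trained, and the mean pooling and L2 normalization of the query outputs are assumptions made here for illustration. All names (`causal_self_attention`, `embed`, `shared_queries`) are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(seq):
    """Toy stand-in for a frozen backbone layer (identity projections).

    Position i attends only to positions 0..i, so anything appended
    after the input cannot change the outputs at the input positions.
    """
    dim = len(seq[0])
    scale = math.sqrt(dim)
    out = []
    for i, q in enumerate(seq):
        weights = softmax([dot(q, seq[j]) / scale for j in range(i + 1)])
        out.append(
            [sum(w * seq[j][d] for j, w in enumerate(weights)) for d in range(dim)]
        )
    return out


def embed(tokens, latent_queries):
    """Append the shared queries, run the layer, pool the query outputs.

    Assumption: the query outputs are mean-pooled and L2-normalized.
    """
    hidden = causal_self_attention(tokens + latent_queries)
    q_out = hidden[len(tokens):]  # outputs at the appended query positions
    dim = len(q_out[0])
    pooled = [sum(v[d] for v in q_out) / len(q_out) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4

# The same query vectors are used for both modalities. In a real setup these
# would be the only trained parameters; here they are random placeholders.
shared_queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

# Placeholder token states standing in for a caption and an image.
text_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
image_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(9)]

text_emb = embed(text_tokens, shared_queries)
image_emb = embed(image_tokens, shared_queries)
similarity = dot(text_emb, image_emb)  # cosine similarity of unit vectors
```

Because the queries sit at the end of the sequence, the causal mask lets each query attend to every input token while leaving the hidden states at the input positions exactly as they would be without the queries.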
Related papers
- Indexing Multimodal Language Models For Large-scale Image Retrieval (2026) 0.00
- MM-Embed: Universal Multimodal Retrieval With Multimodal LLMs (2024) 0.00
- RETLLM: Training And Data-free MLLMs For Multimodal Information Retrieval (2026) 1.57
- An Interactive Multi-modal Query Answering System With Retrieval-augmented Large Language Models (2024) 5.84
- CSPLADE: Learned Sparse Retrieval With Causal Language Models (2025) 0.00
- Scaling Sparse And Dense Retrieval In Decoder-only LLMs (2025) 6.34
- FreeRet: MLLMs As Training-free Retrievers (2025) 0.00
- LamRA: Large Multimodal Model As Your Advanced Retrieval Assistant (2024) 7.50