Making Large Language Models Efficient Dense Retrievers
2025 · Yibin Lei, Shwai He, Ang Li, et al.
Abstract
Recent work has shown that directly fine-tuning large language models (LLMs) for dense retrieval yields strong performance, but their substantial parameter counts make them computationally inefficient. While prior studies have revealed significant layer redundancy in LLMs for generative tasks, it remains unclear whether similar redundancy exists when these models are adapted for retrieval tasks, which require encoding entire sequences into fixed representations rather than generating tokens iteratively. To this end, we conduct a comprehensive analysis of layer redundancy in LLM-based dense retrievers. We find that, in contrast to generative settings, MLP layers are substantially more prunable, while attention layers remain critical for semantic aggregation. Building on this insight, we propose EffiR, a framework for developing efficient retrievers that performs large-scale MLP compression through a coarse-to-fine strategy (coarse-grained depth reduction followed by fine-grained width reduction)…
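The coarse-to-fine MLP compression described above can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not EffiR's actual implementation. The per-layer MLP importance scores are supplied as hypothetical inputs (in practice they might come from measuring retrieval quality with each MLP removed). The neuron score ||w_in[j]|| · ||w_out[:, j]|| is a simple weight-magnitude proxy chosen here for illustration. The names `MLP`, `Layer`, `depth_reduce`, `neuron_scores`, `width_reduce`, and `make_mlp` are all hypothetical. Attention sublayers are never modified, reflecting the finding that they remain critical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MLP:
    # w_in: d_ff rows, each holding d_model weights (up-projection)
    w_in: List[List[float]]
    # w_out: d_model rows, each holding d_ff weights (down-projection)
    w_out: List[List[float]]

    @property
    def width(self) -> int:
        return len(self.w_in)


@dataclass
class Layer:
    attention: str       # placeholder for the attention sublayer, always kept
    mlp: Optional[MLP]   # None means the MLP sublayer was removed (residual passes through)


def depth_reduce(layers: List[Layer], mlp_importance: List[float], n_drop: int) -> List[Layer]:
    """Coarse step: remove the n_drop MLP sublayers with the lowest importance."""
    order = sorted(range(len(layers)), key=lambda i: mlp_importance[i])
    for i in order[:n_drop]:
        layers[i].mlp = None
    return layers


def neuron_scores(mlp: MLP) -> List[float]:
    """Hypothetical magnitude proxy for each intermediate neuron j."""
    scores = []
    for j in range(mlp.width):
        in_norm = math.sqrt(sum(w * w for w in mlp.w_in[j]))
        out_norm = math.sqrt(sum(row[j] ** 2 for row in mlp.w_out))
        scores.append(in_norm * out_norm)
    return scores


def width_reduce(layers: List[Layer], keep_ratio: float) -> List[Layer]:
    """Fine step: in each surviving MLP, keep the top keep_ratio fraction of neurons,
    dropping the matching rows of w_in and columns of w_out."""
    for layer in layers:
        mlp = layer.mlp
        if mlp is None:
            continue
        k = max(1, int(keep_ratio * mlp.width))
        scores = neuron_scores(mlp)
        keep = sorted(sorted(range(mlp.width), key=lambda j: -scores[j])[:k])
        mlp.w_in = [mlp.w_in[j] for j in keep]
        mlp.w_out = [[row[j] for j in keep] for row in mlp.w_out]
    return layers


def make_mlp(scale: float) -> MLP:
    # Toy MLP with d_model = 2, d_ff = 4; neuron j's weights grow with (j + 1)
    w_in = [[(j + 1) * scale, 0.0] for j in range(4)]
    w_out = [[(j + 1) * scale for j in range(4)], [0.0] * 4]
    return MLP(w_in, w_out)


layers = [Layer("attn", make_mlp(1.0)) for _ in range(4)]
importance = [0.9, 0.1, 0.5, 0.2]            # hypothetical per-layer MLP scores
depth_reduce(layers, importance, n_drop=2)   # removes the MLPs of layers 1 and 3
width_reduce(layers, keep_ratio=0.5)         # keeps neurons 2 and 3 in layers 0 and 2
print([None if l.mlp is None else l.mlp.width for l in layers])  # → [2, None, 2, None]
```

Running depth reduction first means width pruning is spent only on the MLPs that survive. This mirrors the coarse-then-fine ordering in the text. The specific scoring rules here are placeholders.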
Related papers
- ScalingNote: Scaling Up Retrievers with Large Language Models for Real-World Dense Retrieval (2024)
- LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference (2025)
- ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance (2025)
- Pseudo Relevance Feedback Is Enough to Close the Gap Between Small and Large Dense Retrieval Models (2025)
- Scaling Sparse and Dense Retrieval in Decoder-Only LLMs (2025)
- A Comparative Study of Specialized LLMs as Dense Retrievers (2025)
- FreeRet: MLLMs as Training-Free Retrievers (2025)
- Revela: Dense Retriever Learning via Language Modeling (2025)