ScalingNote: Scaling Up Retrievers with Large Language Models for Real-World Dense Retrieval
2024 · Suyuan Huang, Chao Zhang, Yuanyuan Wu, et al.
Abstract
Dense retrieval in most industries employs dual-tower architectures to retrieve query-relevant documents. Due to online deployment requirements, existing real-world dense retrieval systems mainly enhance performance by designing negative sampling strategies, overlooking the advantages of scaling up. Recently, Large Language Models (LLMs) have exhibited superior performance that can be leveraged for scaling up dense retrieval. However, scaling up retrieval models significantly increases online query latency. To address this challenge, we propose ScalingNote, a two-stage method to exploit the scaling potential of LLMs for retrieval while maintaining online query latency. The first stage is training dual towers, both initialized from the same LLM, to unlock the potential of LLMs for dense retrieval. Then, we distill only the query tower using mean squared error loss and cosine similarity to reduce online costs. Through theoretical analysis and comprehensive offline and online experiments,
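The abstract says the query tower is distilled using mean squared error loss and cosine similarity. The sketch below shows one way such an objective could be computed for a single query embedding. It is an illustration only: the function names, the additive combination of the two terms, and the `cosine_weight` parameter are assumptions and are not taken from the paper.

```python
import math
from typing import Sequence


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two embeddings of equal dimension."""
    if len(student) != len(teacher):
        raise ValueError("embeddings must have the same dimension")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def query_distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    cosine_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Assumed objective: MSE plus a weighted (1 - cosine similarity) term.

    `teacher` stands for the query embedding of the large LLM tower and
    `student` for the embedding of the smaller online query tower.
    """
    alignment_penalty = 1.0 - cosine_similarity(student, teacher)
    return mse(student, teacher) + cosine_weight * alignment_penalty


if __name__ == "__main__":
    teacher_embedding = [0.2, 0.4, 0.4]
    student_embedding = [0.1, 0.5, 0.3]
    print(query_distillation_loss(student_embedding, teacher_embedding))
```

In this sketch the MSE term penalizes differences in magnitude and position, while the cosine term penalizes differences in direction. Because only the query tower is distilled, document embeddings produced by the large tower would remain unchanged in the index.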
Related papers
- Scaling Sparse and Dense Retrieval in Decoder-Only LLMs (2025)
- Scaling Laws for Dense Retrieval (2024)
- Making Large Language Models Efficient Dense Retrievers (2025)
- ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance (2025)
- LightRetriever: A LLM-Based Text Retrieval Architecture with Extremely Faster Query Inference (2025)
- Pseudo Relevance Feedback Is Enough to Close the Gap Between Small and Large Dense Retrieval Models (2025)
- CSPLADE: Learned Sparse Retrieval with Causal Language Models (2025)
- LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding (2024)