Practical Code RAG At Scale: Task-aware Retrieval Design Choices Under Compute Budgets
2025 · Timur Galimzyanov, Olga Kolomyttseva, Egor Bogomolov
Abstract
We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena (code completion and bug localization), we systematically compare retrieval configurations across context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical choice, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, but at the cost of 100x higher latency. (3) Optimal chunk size scales with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-b
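To make the configuration named in findings (1) and (4) concrete, the following is a minimal sketch of line-based chunking combined with BM25 scoring over word-level tokens. It is an illustration only, not the authors' implementation: the function names (`chunk_lines`, `split_words`, `bm25_scores`, `retrieve`), the regex used for word splitting, and the `k1` and `b` defaults are assumptions chosen for the example. Only the 32-line default chunk size echoes the abstract.

```python
import math
import re
from collections import Counter


def chunk_lines(text, chunk_size=32):
    """Line-based chunking: consecutive chunks of at most chunk_size lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + chunk_size])
            for i in range(0, len(lines), chunk_size)]


def split_words(text):
    """Word-level splitting (assumed rule): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the standard BM25 formula.

    k1 and b are common default values, not values taken from the paper.
    """
    tokenized = [split_words(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = set(split_words(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, files, chunk_size=32, top_k=3):
    """Chunk every file by lines, then return the top_k chunks by BM25 score."""
    chunks = [c for text in files for c in chunk_lines(text, chunk_size)]
    scores = bm25_scores(query, chunks)
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```

In a PL-PL setting such as code completion, the query would be the code preceding the completion point and `files` the other files in the repository. Raising `chunk_size` as the context budget grows mirrors finding (3).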