SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
2025 · Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, et al.
Abstract
Retrieving the code functions, classes, or files relevant to a given user query, bug report, or feature request from a large codebase is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding strategies to identify semantically relevant units. While embedding-based approaches can outperform BM25 by large margins, they often ignore the underlying graph structure of the codebase. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that integrates LLM-based reasoning with auxiliary information obtained from graph-based exploration of the codebase. We further introduce SpIDER-Bench, a graph-structured evaluation benchmark curated from SWE-PolyBench, SWEBench-Verified and Multi-SWEBench, spanning codebases from Python, Java, JavaScript and TypeScript pr
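To make the idea of combining dense retrieval with codebase graph structure concrete, here is a minimal sketch. It is not the SpIDER algorithm: the abstract does not specify the scoring rule, so the neighbor-bonus scheme, the function names (`cosine`, `dense_rank`, `graph_rerank`), the `top_k` and `bonus` parameters, and the toy embeddings are all assumptions made for illustration. The LLM-based reasoning step is omitted entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_rank(query_vec, unit_vecs):
    """Plain dense retrieval: rank code units by similarity to the query."""
    scores = {name: cosine(query_vec, vec) for name, vec in unit_vecs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


def graph_rerank(query_vec, unit_vecs, edges, top_k=1, bonus=0.7):
    """Hypothetical graph-aware variant (an assumption, not the paper's rule):
    graph neighbors of the top-k dense hits receive a share of the hit's score.
    """
    ranked, scores = dense_rank(query_vec, unit_vecs)
    adjusted = dict(scores)
    for seed in ranked[:top_k]:
        for neighbor in edges.get(seed, ()):
            if neighbor in adjusted:
                adjusted[neighbor] += bonus * scores[seed]
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Toy 2-D embeddings standing in for a real embedding model (assumed values).
unit_vecs = {
    "auth.login": [1.0, 0.0],
    "auth.hash_password": [0.0, 1.0],
    "ui.render": [0.6, 0.8],
}
# Toy code graph: login calls hash_password.
edges = {"auth.login": ["auth.hash_password"]}
query_vec = [1.0, 0.0]

dense_order, _ = dense_rank(query_vec, unit_vecs)
graph_order = graph_rerank(query_vec, unit_vecs, edges)
```

In this toy setup, `auth.hash_password` has no embedding similarity to the query and ranks last under plain dense retrieval, but it moves ahead of `ui.render` once its graph link to the top hit is taken into account. That is the kind of structural signal the abstract says purely embedding-based retrieval tends to miss.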
Related papers
- On The Challenges And Opportunities Of Learned Sparse Retrieval For Code (2026) 0.00
- Investigating The Scalability Of Approximate Sparse Retrieval Algorithms To Massive Datasets (2025) 5.84
- Practical Code RAG At Scale: Task-aware Retrieval Design Choices Under Compute Budgets (2025) 0.00
- Learning Deep Semantic Model For Code Search Using CodeSearchNet Corpus (2022) 3.16
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024) 0.00
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025) 0.00
- LexSemBridge: Fine-grained Dense Representation Enhancement Through Token-aware Embedding Augmentation (2025) 2.35
- Evaluating Embedding APIs For Information Retrieval (2023) 8.09