{"ID":2827337,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.16956","arxiv_id":"2512.16956","title":"SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization","abstract":"Retrieving code functions, classes or files that are relevant in order to solve a given user query, bug report or feature request from large codebases is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding strategies to identify semantically relevant units. While embedding-based approaches can outperform BM25 by large margins, they often don't take into consideration the underlying graph-structured characteristics of the codebase. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that integrates LLM-based reasoning along with auxiliary information obtained from graph-based exploration of the codebase. We further introduce SpIDER-Bench, a graph-structured evaluation benchmark curated from SWE-PolyBench, SWEBench-Verified and Multi-SWEBench, spanning codebases from Python, Java, JavaScript and TypeScript programming languages. Empirical results show that SpIDER consistently improves dense retrieval performance by at least 13% across programming languages and benchmarks in SpIDER-Bench.","short_abstract":"Retrieving code functions, classes or files that are relevant in order to solve a given user query, bug report or feature request from large codebases is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding...","url_abs":"https://arxiv.org/abs/2512.16956","url_pdf":"https://arxiv.org/pdf/2512.16956v2","authors":"[\"Shravan Chaudhari\",\"Rahul Thomas Jacob\",\"Mononito Goswami\",\"Jiajun Cao\",\"Shihab Rashid\",\"Christian Bock\"]","published":"2025-12-18T01:32:25Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
