{"ID":2882763,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10068","arxiv_id":"2508.10068","title":"SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion","abstract":"Despite Retrieval-Augmented Generation improving code completion, traditional retrieval methods struggle with information redundancy and a lack of diversity within limited context windows. To solve this, we propose a resource-optimized retrieval augmentation method, SaraCoder. It maximizes information diversity and representativeness in a limited context window, significantly boosting the accuracy and reliability of repository-level code completion. Its core Hierarchical Feature Optimization module systematically refines candidates by distilling deep semantic relationships, pruning exact duplicates, assessing structural similarity with a novel graph-based metric that weighs edits by their topological importance, and reranking results to maximize both relevance and diversity. Furthermore, an External-Aware Identifier Disambiguator module accurately resolves cross-file symbol ambiguity via dependency analysis. Extensive experiments on the challenging CrossCodeEval and RepoEval-Updated benchmarks demonstrate that SaraCoder outperforms existing baselines across multiple programming languages and models. Our work proves that systematically refining retrieval results across multiple dimensions provides a new paradigm for building more accurate and resource-optimized repository-level code completion systems.","short_abstract":"Despite Retrieval-Augmented Generation improving code completion, traditional retrieval methods struggle with information redundancy and a lack of diversity within limited context windows. To solve this, we propose a resource-optimized retrieval augmentation method, SaraCoder. It maximizes information diversity and rep...","url_abs":"https://arxiv.org/abs/2508.10068","url_pdf":"https://arxiv.org/pdf/2508.10068v2","authors":"[\"Xiaohan Chen\",\"Zhongying Pan\",\"Quan Feng\",\"Yu Tian\",\"Shuqun Yang\",\"Mengru Wang\",\"Lina Gong\",\"Yuxia Geng\",\"Piji Li\",\"Xiang Chen\"]","published":"2025-08-13T11:56:05Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.CL\",\"cs.IR\",\"cs.PL\"]","methods":"[\"RAG\"]","has_code":false}
