{"ID":2862148,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01002","arxiv_id":"2510.01002","title":"Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework","abstract":"Current learning-based Automated Vulnerability Repair (AVR) approaches, while promising, often fail to generalize effectively in real-world scenarios. Our diagnostic analysis reveals three fundamental weaknesses in state-of-the-art AVR approaches: (1) limited cross-repository generalization, with performance drops on unseen codebases; (2) inability to capture long-range dependencies, causing a performance degradation on complex, multi-hunk repairs; and (3) over-reliance on superficial lexical patterns, leading to significant performance drops on vulnerabilities with minor syntactic variations like variable renaming. To address these limitations, we propose SeCuRepair, a semantics-aligned, curriculum-driven, and reasoning-enhanced framework for vulnerability repair. At its core, SeCuRepair adopts a reason-then-edit paradigm, requiring the model to articulate why and how a vulnerability should be fixed before generating the patch. This explicit reasoning enforces a genuine understanding of repair logic rather than superficial memorization of lexical patterns. SeCuRepair also moves beyond traditional supervised fine-tuning and employs semantics-aware reinforcement learning, rewarding patches for their syntactic and semantic alignment with the oracle patch rather than mere token overlap. Complementing this, a difficulty-aware curriculum progressively trains the model, starting with simple fixes and advancing to complex, multi-hunk coordinated edits. We evaluate SeCuRepair on strict, repository-level splits of BigVul and newly crafted PrimeVul_AVR datasets. SeCuRepair significantly outperforms all baselines, surpassing the best-performing baselines by 34.52% on BigVul and 31.52% on PrimeVul\\textsubscript{AVR} in terms of CodeBLEU, respectively. Comprehensive ablation studies further confirm that each component of our framework contributes to its final performance.","short_abstract":"Current learning-based Automated Vulnerability Repair (AVR) approaches, while promising, often fail to generalize effectively in real-world scenarios. Our diagnostic analysis reveals three fundamental weaknesses in state-of-the-art AVR approaches: (1) limited cross-repository generalization, with performance drops on u...","url_abs":"https://arxiv.org/abs/2510.01002","url_pdf":"https://arxiv.org/pdf/2510.01002v2","authors":"[\"Chengran Yang\",\"Ting Zhang\",\"Jinfeng Jiang\",\"Xin Zhou\",\"Haoye Tian\",\"Mingzhe Du\",\"Jieke Shi\",\"Junkai Chen\",\"Yikun Li\",\"Eng Lieh Ouh\",\"Lwin Khin Shar\",\"David Lo\"]","published":"2025-10-01T15:09:27Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.CR\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
