{"ID":2841934,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11896","arxiv_id":"2511.11896","title":"VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization","abstract":"Large language models (LLMs) have recently shown strong potential in vulnerability detection (VD). However, accurately detecting vulnerabilities in real-world repositories requires reasoning over complex contextual interactions. Existing LLM-based VD approaches remain limited because current datasets lack complete contextual information and high-quality reasoning supervision, while existing optimization methods primarily rely on coarse outcome-centric supervision signals that fail to model the vulnerability reasoning process. To address these limitations, we first construct ContextVul, a new dataset that augments high-quality function-level vulnerability benchmarks with repository-level contextual information and curated vulnerability reasoning traces. Building upon ContextVul, we introduce a two-stage optimization framework consisting of lightweight cold-start supervised fine-tuning followed by vulnerability-adaptive on-policy optimization (VULPO). VULPO incorporates multidimensional rewards that jointly evaluate vulnerability identification, vulnerability-relevant localization, and causal reasoning quality, along with difficulty-adaptive reward scaling to mitigate reward hacking and improve RL effectiveness. Extensive experiments demonstrate the superiority of VULPO for context-aware VD. Our VULPO-4B, the first specialized vulnerability reasoning LLM, substantially outperforms existing VD baselines, improving Pairwise Pass@1 by 203% relative to Qwen3-4B and achieving competitive performance against a 150% larger-scale LLM, DeepSeek-V3.1.","short_abstract":"Large language models (LLMs) have recently shown strong potential in vulnerability detection (VD). However, accurately detecting vulnerabilities in real-world repositories requires reasoning over complex contextual interactions. Existing LLM-based VD approaches remain limited because current datasets lack complete cont...","url_abs":"https://arxiv.org/abs/2511.11896","url_pdf":"https://arxiv.org/pdf/2511.11896v3","authors":"[\"Youpeng Li\",\"Fuxun Yu\",\"Weiliang Qi\",\"Xinda Wang\"]","published":"2025-11-14T21:57:48Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\",\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}