{"ID":2855329,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.17862","arxiv_id":"2510.17862","title":"When \"Correct\" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?","abstract":"Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack only requires black-box access and a single query to the code agent to perform the attack. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of $40.7\\%$ on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.","short_abstract":"Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test...","url_abs":"https://arxiv.org/abs/2510.17862","url_pdf":"https://arxiv.org/pdf/2510.17862v1","authors":"[\"Yibo Peng\",\"James Song\",\"Lei Li\",\"Xinyu Yang\",\"Mihai Christodorescu\",\"Ravi Mangal\",\"Corina Pasareanu\",\"Haizhong Zheng\",\"Beidi Chen\"]","published":"2025-10-15T17:16:36Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.SE\"]","methods":"[\"Large Language Model\"]","has_code":false}
