{"ID":2823082,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.01184","arxiv_id":"2601.01184","title":"SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards","abstract":"Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns. This paper presents SecureCodeRL, a reinforcement learning (RL) pipeline for security-aware code generation that optimizes a combined reward R = αRfunc + \\b{eta}Rsec. The key idea is a partial-credit functional reward that assigns intermediate scores for syntactic validity, successful execution, and producing output, reducing reward sparsity that otherwise stalls learning on competitive programming style tasks. I evaluate supervised fine-tuning (SFT) and PPO variants on a small held-out prompt set from APPS+ and observe that PPO with partial credit (using a continued-training variant) improves syntax validity from 45% (SFT) to 60% and achieves the only non-zero test success signal in this pilot evaluation (5% at-least-one-test-pass), while remaining 100% clean under Bandit static analysis. Although Bandit findings were absent in this small evaluation, the security term is integrated into training to discourage insecure shortcuts when they appear.","short_abstract":"Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns. This paper presents SecureCodeRL, a reinforcement learning (RL) pipeline for secu...","url_abs":"https://arxiv.org/abs/2601.01184","url_pdf":"https://arxiv.org/pdf/2601.01184v1","authors":"[\"Suryansh Singh Sijwali\",\"Suman Saha\"]","published":"2026-01-03T13:36:36Z","proceeding":"cs.CR","tasks":"[\"cs.CR\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
