{"ID":3005064,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03308","arxiv_id":"2606.03308","title":"The Security Budget of Code LLMs: An Information-Theoretic Capacity-Security Bound","abstract":"AI programming assistants make natural-language prompts a software-development interface, so small prompt perturbations become usability and security risks. We study an information-theoretic trade-off for code LLMs between functional capacity, $\\Cap=\\rmI(c^*;c_π)$, and perturbation retention, $\\Sec=\\rmI(c_π;\\tilde c_π)$. Here $\\Sec$ is a retention-channel quantity, not a direct measure of exploit success or vulnerable-code generation. For code completion modeled as $p\\to c_π$ with perturbed prompt $\\tilde p$, we prove $\\Cap+\\Sec\\le \\rmH(c^*)+\\rmI(p;\\tilde p)$, decomposing the budget into task entropy and prompt leakage. A deterministic-embedding corollary gives the hidden-state version, and a tokenizer/gzip companion bound gives a model-agnostic ceiling on sequence-level task entropy. Empirically, we estimate embedded $\\Cap$ and $\\Sec$ from output-only last-token hidden states, excluding prompt context from the $\\Sec$ channel. Six individual validation rows across two models, two datasets, INT4/BF16 precision, and estimator ablations satisfy the embedded check $(\\Cap+\\max_T\\Sec)/(\\rmH(z^*)+\\max_T\\rmI(p;\\tilde p))\\le1$. Saturation is 0.27--0.92 and theorem slack is 2.36--26.94 nats; a separate three-seed stability diagnostic has mean saturation 0.87. A context-mixed cosine, used only as a per-problem generation-prompt alignment signal, correlates with pass@1 on CodeLlama-HumanEval ($ρ{=}0.36$, $p{\u003c}10^{-4}$), Qwen-HumanEval ($ρ{=}0.22$, $p{=}0.005$), and CodeLlama-MBPP ($ρ{=}0.225$, $p{=}0.0038$; all $n{=}164$). Adaptive stress tests with a 23-perturbation pool, a fixed universal suffix, and prompt-embedding PGD all leave positive slack.","short_abstract":"AI programming assistants make natural-language prompts a software-development interface, so small prompt perturbations become usability and security risks. We study an information-theoretic trade-off for code LLMs between functional capacity, $\\Cap=\\rmI(c^*;c_π)$, and perturbation retention, $\\Sec=\\rmI(c_π;\\tilde c_π)...","url_abs":"https://arxiv.org/abs/2606.03308","url_pdf":"https://arxiv.org/pdf/2606.03308v1","authors":"[\"Jianwei Tai\"]","published":"2026-06-02T08:22:14Z","proceeding":"cs.CR","tasks":"[\"cs.CR\"]","methods":"[\"Large Language Model\"]","has_code":false}
