{"ID":2881602,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.11925","arxiv_id":"2508.11925","title":"Optimizing Token Choice for Code Watermarking: An RL Approach","abstract":"Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality. Our code is available at https://github.com/TimeLovercc/CodeTracer.","short_abstract":"Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning t...","url_abs":"https://arxiv.org/abs/2508.11925","url_pdf":"https://arxiv.org/pdf/2508.11925v3","authors":"[\"Zhimeng Guo\",\"Huaisheng Zhu\",\"Siyuan Xu\",\"Hangfan Zhang\",\"Teng Xiao\",\"Minhao Cheng\"]","published":"2025-08-16T06:11:29Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":610828,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2881602,"paper_url":"https://arxiv.org/abs/2508.11925","paper_title":"Optimizing Token Choice for Code Watermarking: An RL Approach","repo_url":"https://github.com/TimeLovercc/CodeTracer","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}