{"ID":2839139,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16483","arxiv_id":"2511.16483","title":"Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense","abstract":"Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language model (LLM)-based reward design approach to generate autonomous cyber defense policies in a deep reinforcement learning (DRL)-driven experimental simulation environment. Multiple attack and defense agent personas were crafted, reflecting heterogeneity in agent actions, to generate LLM-guided reward designs where the LLM was first provided with contextual cyber simulation environment information. These reward structures were then utilized within a DRL-driven attack-defense simulation environment to learn an ensemble of cyber defense policies. Our results suggest that LLM-guided reward designs can lead to effective defense strategies against diverse adversarial behaviors.","short_abstract":"Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language model (LLM)-based reward design approach to generate autonomous cyber defense policies in a deep reinforcement learning (DRL)-driven e...","url_abs":"https://arxiv.org/abs/2511.16483","url_pdf":"https://arxiv.org/pdf/2511.16483v1","authors":"[\"Sayak Mukherjee\",\"Samrat Chatterjee\",\"Emilie Purvine\",\"Ted Fujimoto\",\"Tegan Emerson\"]","published":"2025-11-20T15:54:08Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.MA\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}