{"ID":2877410,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.19488","arxiv_id":"2508.19488","title":"PoolFlip: A Multi-Agent Reinforcement Learning Security Environment for Cyber Defense","abstract":"Cyber defense requires automating defensive decision-making under stealthy, deceptive, and continuously evolving adversarial strategies. The FlipIt game provides a foundational framework for modeling interactions between a defender and an advanced adversary that compromises a system without being immediately detected. In FlipIt, the attacker and defender compete to control a shared resource by performing a Flip action and paying a cost. However, the existing FlipIt frameworks rely on a small number of heuristics or specialized learning techniques, which can lead to brittleness and the inability to adapt to new attacks. To address these limitations, we introduce PoolFlip, a multi-agent gym environment that extends the FlipIt game to allow efficient learning for attackers and defenders. Furthermore, we propose Flip-PSRO, a multi-agent reinforcement learning (MARL) approach that leverages population-based training to train defender agents equipped to generalize against a range of unknown, potentially adaptive opponents. Our empirical results suggest that Flip-PSRO defenders are $2\\times$ more effective than baselines to generalize to a heuristic attack not exposed in training. In addition, our newly designed ownership-based utility functions ensure that Flip-PSRO defenders maintain a high level of control while optimizing performance.","short_abstract":"Cyber defense requires automating defensive decision-making under stealthy, deceptive, and continuously evolving adversarial strategies. The FlipIt game provides a foundational framework for modeling interactions between a defender and an advanced adversary that compromises a system without being immediately detected....","url_abs":"https://arxiv.org/abs/2508.19488","url_pdf":"https://arxiv.org/pdf/2508.19488v1","authors":"[\"Xavier Cadet\",\"Simona Boboila\",\"Sie Hendrata Dharmawan\",\"Alina Oprea\",\"Peter Chin\"]","published":"2025-08-27T00:18:49Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CR\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
