{"ID":2870669,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.13585","arxiv_id":"2509.13585","title":"Zero-sum turn games using Q-learning: finite computation with security guarantees","abstract":"This paper addresses zero-sum ``turn'' games, in which only one player can make decisions at each state. We show that pure saddle-point state-feedback policies for turn games can be constructed from dynamic programming fixed-point equations for a single value function or Q-function. These fixed-points can be constructed using a suitable form of Q-learning. For discounted costs, convergence of this form of Q-learning can be established using classical techniques. For undiscounted costs, we provide a convergence result that applies to finite-time deterministic games, which we use to illustrate our results. For complex games, the Q-learning iteration must be terminated before exploring the full-state, which can lead to policies that cannot guarantee the security levels implied by the final Q-function. To mitigate this, we propose an ``opponent-informed'' exploration policy for selecting the Q-learning samples. This form of exploration can guarantee that the final Q-function provides security levels that hold, at least, against a given set of policies. A numerical demonstration for a multi-agent game, Atlatl, indicates the effectiveness of these methods.","short_abstract":"This paper addresses zero-sum ``turn'' games, in which only one player can make decisions at each state. We show that pure saddle-point state-feedback policies for turn games can be constructed from dynamic programming fixed-point equations for a single value function or Q-function. These fixed-points can be constructe...","url_abs":"https://arxiv.org/abs/2509.13585","url_pdf":"https://arxiv.org/pdf/2509.13585v1","authors":"[\"Sean Anderson\",\"Chris Darken\",\"João Hespanha\"]","published":"2025-09-16T22:59:52Z","proceeding":"eess.SY","tasks":"[\"eess.SY\",\"cs.GT\"]","methods":"[\"LoRA\"]","has_code":false}
