{"ID":2832536,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.05667","arxiv_id":"2512.05667","title":"On Dynamic Programming Theory for Leader-Follower Stochastic Games","abstract":"Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that applies Bellman recursion over credible sets-state abstractions formally representing all rational follower best responses under partial leader commitments-to compute SSEs. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We further establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, motivating the development of ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks-including security games, resource allocation, and adversarial planning-demonstrate empirical gains in leader value and runtime scalability over state-of-the-art methods.","short_abstract":"Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) fra...","url_abs":"https://arxiv.org/abs/2512.05667","url_pdf":"https://arxiv.org/pdf/2512.05667v1","authors":"[\"Jilles Steeve Dibangoye\",\"Thibaut Le Marre\",\"Ocan Sankur\",\"François Schwarzentruber\"]","published":"2025-12-05T12:23:56Z","proceeding":"cs.GT","tasks":"[\"cs.GT\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}