{"ID":2844090,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07312","arxiv_id":"2511.07312","title":"Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search","abstract":"Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as a case where such efforts failed to produce performance at the level of top humans. This work establishes a step change in both performance and cost for Stratego, showing that it is now possible not only to reach the level of top humans, but to achieve vastly superhuman level -- and that doing so requires not an industrial budget, but merely a few thousand dollars. We achieved this result by developing general approaches for self-play reinforcement learning and test-time search under imperfect information.","short_abstract":"Few classical games have been regarded as such significant benchmarks of artificial intelligence as to have justified training costs in the millions of dollars. Among these, Stratego -- a board wargame exemplifying the challenge of strategic decision making under massive amounts of hidden information -- stands apart as...","url_abs":"https://arxiv.org/abs/2511.07312","url_pdf":"https://arxiv.org/pdf/2511.07312v1","authors":"[\"Samuel Sokota\",\"Eugene Vinitsky\",\"Hengyuan Hu\",\"J. Zico Kolter\",\"Gabriele Farina\"]","published":"2025-11-10T17:13:41Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}