{"ID":2839712,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15830","arxiv_id":"2511.15830","title":"Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions","abstract":"Despite rapid progress in artificial intelligence, current systems struggle with the interconnected challenges that define real-world decision making. Practical domains, such as business management, require optimizing an open-ended and multi-faceted objective, actively learning environment dynamics from sparse experience, planning over long horizons in stochastic settings, and reasoning over spatial information. Yet existing human--AI benchmarks isolate subsets of these capabilities, limiting our ability to assess holistic decision-making competence. We introduce Mini Amusement Parks (MAPs), an amusement-park simulator designed to evaluate an agent's ability to model its environment, anticipate long-term consequences under uncertainty, and strategically operate a complex business. We provide human baselines and a comprehensive evaluation of state-of-the-art LLM agents, finding that humans outperform these systems by 6.5x on easy mode and 9.8x on medium mode. Our analysis reveals persistent weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and world modelling. By unifying these challenges within a single environment, MAPs offers a new foundation for benchmarking agents capable of adaptable decision making. Code: https://github.com/Skyfall-Research/MAPs","short_abstract":"Despite rapid progress in artificial intelligence, current systems struggle with the interconnected challenges that define real-world decision making. Practical domains, such as business management, require optimizing an open-ended and multi-faceted objective, actively learning environment dynamics from sparse experien...","url_abs":"https://arxiv.org/abs/2511.15830","url_pdf":"https://arxiv.org/pdf/2511.15830v1","authors":"[\"Stéphane Aroca-Ouellette\",\"Ian Berlot-Attwell\",\"Panagiotis Lymperopoulos\",\"Abhiramon Rajasekharan\",\"Tongqi Zhu\",\"Herin Kang\",\"Kaheer Suleman\",\"Sam Pasupalak\"]","published":"2025-11-19T19:38:05Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":606899,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2839712,"paper_url":"https://arxiv.org/abs/2511.15830","paper_title":"Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions","repo_url":"https://github.com/Skyfall-Research/MAPs","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
