{"ID":2874097,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04791","arxiv_id":"2509.04791","title":"What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking","abstract":"LLMs struggle with decision-making in high-stakes environments like MOBA games, primarily due to a lack of proactive reasoning and limited understanding of complex game dynamics. To address this, we propose What-if Analysis LLM (WiA-LLM), a framework that trains an LLM as an explicit, language-based world model. Instead of representing the environment in latent vectors, WiA-LLM uses natural language to simulate how the game state evolves over time in response to candidate actions, and provides textual justifications for these predicted outcomes. WiA-LLM is trained in two stages: supervised fine-tuning on human-like reasoning traces, followed by reinforcement learning with outcome-based rewards based on the alignment between predicted and actual future states. In the Honor of Kings (HoK) environment, WiA-LLM attains 74.2\\% accuracy (27\\%$\\uparrow$ vs. base model) in forecasting game-state changes. In addition, WiA-LLM demonstrate strategic behavior more closely aligned with expert players than purely reactive LLMs, indicating enhanced foresight and expert-like decision-making.","short_abstract":"LLMs struggle with decision-making in high-stakes environments like MOBA games, primarily due to a lack of proactive reasoning and limited understanding of complex game dynamics. To address this, we propose What-if Analysis LLM (WiA-LLM), a framework that trains an LLM as an explicit, language-based world model. Instea...","url_abs":"https://arxiv.org/abs/2509.04791","url_pdf":"https://arxiv.org/pdf/2509.04791v3","authors":"[\"Yuan Sui\",\"Yanming Zhang\",\"Yi Liao\",\"Yu Gu\",\"Guohua Tang\",\"Zhongqian Sun\",\"Wei Yang\",\"Bryan Hooi\"]","published":"2025-09-05T04:05:27Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
