{"ID":2868179,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17116","arxiv_id":"2509.17116","title":"MCTS-EP: Empowering Embodied Planning with Online Preference Optimization","abstract":"This paper introduces MCTS-EP, an online learning framework that combines large language models (LLM) with Monte Carlo Tree Search (MCTS) for training embodied agents. MCTS-EP integrates three key components: MCTS-guided exploration for preference data collection, efficient multi-modal reasoning mechanism, and iterative training pipeline based on preference optimization. We theoretically prove that MCTS-EP achieves better performance bounds than conventional on-policy algorithms when the loss function is strongly convex, and demonstrate that it can be formulated as a search-enhanced variant of GAIL. MCTS-EP achieves state-of-the-art performance across several benchmarks. In ALFWorld, it achieves 92% and 87% success rates for textual and visual tasks. In WebShop, it reaches an average reward of 0.81. MCTS-EP also reduces average interaction steps from 18.7/19.5 to 10.2/9.9 steps in visual ALFWorld. Code available at: https://github.com/xuhang-2/Embodied-Agent-Planning","short_abstract":"This paper introduces MCTS-EP, an online learning framework that combines large language models (LLM) with Monte Carlo Tree Search (MCTS) for training embodied agents. 
MCTS-EP integrates three key components: MCTS-guided exploration for preference data collection, efficient multi-modal reasoning mechanism, and iterativ...","url_abs":"https://arxiv.org/abs/2509.17116","url_pdf":"https://arxiv.org/pdf/2509.17116v2","authors":"[\"Hang Xu\",\"Zang Yu\",\"Yehui Tang\",\"Pengbo Hu\",\"Yuhao Tang\",\"Hao Dong\"]","published":"2025-09-21T15:17:44Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":609558,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2868179,"paper_url":"https://arxiv.org/abs/2509.17116","paper_title":"MCTS-EP: Empowering Embodied Planning with Online Preference Optimization","repo_url":"https://github.com/xuhang-2/Embodied-Agent-Planning","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
