{"ID":2873138,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.07945","arxiv_id":"2509.07945","title":"One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning","abstract":"In heterogeneous multi-task decision-making, tasks not only exhibit diverse observation and action spaces but also vary substantially in their underlying complexities. While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling a broad and diverse suite of tasks, gradient conflicts and the loss of model plasticity often constrain their sample efficiency. In this work, we address these challenges from two complementary perspectives: the single learning iteration and the overall learning process. First, to mitigate the gradient conflicts, we systematically investigate key architectural designs for extending UniZero. Our investigation identifies a Mixture-of-Experts (MoE) architecture as the most effective approach. We demonstrate, both theoretically and empirically, that this architecture alleviates gradient conflicts by routing task-specific representations to specialized sub-networks. This finding leads to our proposed model, ScaleZero. Second, to dynamically allocate model capacity throughout the learning process, we introduce an online Dynamic Parameter Scaling (DPS) strategy. This strategy progressively integrates LoRA adapters in response to task-specific progress, enabling adaptive knowledge retention and parameter expansion. Evaluations on a diverse set of standard benchmarks (Atari, DMC, Jericho) demonstrate that ScaleZero, utilizing solely online reinforcement learning with one model, performs on par with specialized single-task agents. With the DPS strategy, it remains competitive while using just 71.5% of the environment interactions. These findings underscore the potential of ScaleZero for effective multi-task planning. Our code is available at https://github.com/opendilab/LightZero.","short_abstract":"In heterogeneous multi-task decision-making, tasks not only exhibit diverse observation and action spaces but also vary substantially in their underlying complexities. While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling a broad and diverse suite of tasks, gr...","url_abs":"https://arxiv.org/abs/2509.07945","url_pdf":"https://arxiv.org/pdf/2509.07945v3","authors":"[\"Yuan Pu\",\"Yazhe Niu\",\"Jia Tang\",\"Junyu Xiong\",\"Shuai Hu\",\"Hongsheng Li\"]","published":"2025-09-09T17:27:53Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false,"code_links":[{"ID":610024,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2873138,"paper_url":"https://arxiv.org/abs/2509.07945","paper_title":"One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning","repo_url":"https://github.com/opendilab/LightZero","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
