{"ID":2837751,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19584","arxiv_id":"2511.19584","title":"Learning Massively Multitask World Models for Continuous Control","abstract":"General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing a view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining followed by light RL) we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optionally image observations. We then present Newt, a language-conditioned multitask world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multitask performance and data-efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints.","short_abstract":"General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing a view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining fo...","url_abs":"https://arxiv.org/abs/2511.19584","url_pdf":"https://arxiv.org/pdf/2511.19584v2","authors":"[\"Nicklas Hansen\",\"Hao Su\",\"Xiaolong Wang\"]","published":"2025-11-24T18:57:19Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
