{"ID":2824806,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22560","arxiv_id":"2512.22560","title":"RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure","abstract":"Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads are highly heterogeneous, combining compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environment simulations. We argue that efficient agentic RL training requires disaggregated infrastructure to leverage specialized, best-fit hardware. However, naive disaggregation introduces substantial synchronization overhead and resource underutilization due to the complex dependencies between stages. We present RollArc, a distributed system designed to maximize throughput for multi-task agentic RL on disaggregated infrastructure. RollArc is built on three core principles: (1) hardware-affinity workload mapping, which routes compute-bound and bandwidth-bound tasks to best-fit GPU devices, (2) fine-grained asynchrony, which manages execution at the trajectory level to mitigate resource bubbles, and (3) statefulness-aware computation, which offloads stateless components (e.g., reward models) to serverless infrastructure for elastic scaling. Our results demonstrate that RollArc effectively improves training throughput and achieves 1.35-2.05\\(\\times\\) end-to-end training time reduction compared to monolithic and synchronous baselines. We also evaluate RollArc by training a hundreds-of-billions-parameter MoE model for the Qoder product on an Alibaba cluster with more than 3,000 GPUs, further demonstrating RollArc's scalability and robustness. 
The code is available at https://github.com/alibaba/ROLL.","short_abstract":"Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads are highly heterogeneous, combining compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environme...","url_abs":"https://arxiv.org/abs/2512.22560","url_pdf":"https://arxiv.org/pdf/2512.22560v1","authors":"[\"Wei Gao\",\"Yuheng Zhao\",\"Tianyuan Wu\",\"Shaopan Xiong\",\"Weixun Wang\",\"Dakai An\",\"Lunxi Cao\",\"Dilxat Muhtar\",\"Zichen Liu\",\"Haizhou Zhao\",\"Ju Huang\",\"Siran Yang\",\"Yongbin Li\",\"Wenbo Su\",\"Jiamang Wang\",\"Lin Qu\",\"Bo Zheng\",\"Wei Wang\"]","published":"2025-12-27T11:14:23Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":605614,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2824806,"paper_url":"https://arxiv.org/abs/2512.22560","paper_title":"RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure","repo_url":"https://github.com/alibaba/ROLL","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
