{"ID":2856327,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.11345","arxiv_id":"2510.11345","title":"Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony","abstract":"Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low resource utilization and limited scalability. We present ROLL Flash, a system that extends ROLL with native support for asynchronous RL post-training. ROLL Flash is built upon two core design principles: fine-grained parallelism and rollout-train decoupling. Guided by these principles, ROLL Flash provides flexible programming interfaces that enable a fully asynchronous training architecture and support efficient rollout mechanisms, including queue scheduling and environment-level asynchronous execution. Through comprehensive theoretical analysis and extensive experiments, we demonstrate that ROLL Flash significantly improves resource utilization and scalability over synchronous RL post-training. ROLL Flash achieves up to 2.24x speedup on RLVR tasks and 2.72x on agentic tasks, using the same GPU budget as synchronous baselines. Furthermore, we implement several popular off-policy algorithms and verify that asynchronous training can achieve performance on par with synchronous training.","short_abstract":"Synchronous Reinforcement Learning (RL) post-training has emerged as a crucial step for enhancing Large Language Models (LLMs) with diverse capabilities. However, many systems designed to accelerate RL post-training still suffer from low resource utilization and limited scalability. We present ROLL Flash, a system that...","url_abs":"https://arxiv.org/abs/2510.11345","url_pdf":"https://arxiv.org/pdf/2510.11345v1","authors":"[\"Han Lu\",\"Zichen Liu\",\"Shaopan Xiong\",\"Yancheng He\",\"Wei Gao\",\"Yanan Wu\",\"Weixun Wang\",\"Jiashun Liu\",\"Yang Li\",\"Haizhou Zhao\",\"Ju Huang\",\"Siran Yang\",\"Xiaoyang Li\",\"Yijia Luo\",\"Zihe Liu\",\"Ling Pan\",\"Junchi Yan\",\"Wei Wang\",\"Wenbo Su\",\"Jiamang Wang\",\"Lin Qu\",\"Bo Zheng\"]","published":"2025-10-13T12:41:27Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
