{"ID":2837417,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.18871","arxiv_id":"2511.18871","title":"Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning","abstract":"Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention for LLM post-training, yet training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training are co-located on the same devices, and their synchronous execution prevents concurrent inference and training. In this work, we revisit the strategy of separating inference and training deployment, and propose a periodically asynchronous framework that transforms synchronous RL training into an asynchronous producer-consumer pipeline. By synchronising model weights at the beginning of each training iteration and generating all rollouts from the same policy, the proposed framework remains inherently on-policy -- without any modification to standard RL algorithms -- thereby avoiding the off-policy bias introduced by existing asynchronous approaches. We further introduce a unified tri-model architecture and a shared-prompt attention mechanism to support efficient asynchronous execution and reduce redundant computation. Experiments on NPU platforms show approximately 2x throughput improvement from asynchronous execution, with additional gains from system-level optimisations, substantially outperforming mainstream RL frameworks in end-to-end throughput, with speedups of up to 3x on GPU platforms, further confirming cross-architecture generalisability while maintaining comparable accuracy. The proposed framework thus offers a practical, algorithm-agnostic solution for scalable RL post-training without sacrificing on-policy correctness. Code available at: https://github.com/janelu9/EasyLLM","short_abstract":"Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention for LLM post-training, yet training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training are co-located on the same devices, and their synchronous execution prevents concu...","url_abs":"https://arxiv.org/abs/2511.18871","url_pdf":"https://arxiv.org/pdf/2511.18871v7","authors":"[\"Jian Lu\"]","published":"2025-11-24T08:22:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":606690,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2837417,"paper_url":"https://arxiv.org/abs/2511.18871","paper_title":"Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning","repo_url":"https://github.com/janelu9/EasyLLM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}