{"ID":2840117,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14617","arxiv_id":"2511.14617","title":"Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning","abstract":"Reinforcement Learning (RL) has emerged as a critical technique for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase, which dominates end-to-end iteration time, suffers from substantial long-tail latency and poor resource utilization due to inherent workload imbalance. We present Seer, a novel context learning RL system that addresses these challenges through a key observation: requests sharing the same prompt exhibit strong similarities in output lengths and response patterns. Leveraging this insight, Seer introduces three coordinated techniques: (1) divided rollout for dynamic load balancing, (2) context-aware scheduling to mitigate long-tail request delays, and (3) adaptive grouped speculative decoding to accelerate generation. These mechanisms work in concert to markedly reduce long-tail latency and improve resource efficiency during rollout. Evaluations on production-grade RL workloads demonstrate that Seer achieves up to 2.04$\\times$ end-to-end rollout throughput improvement compared to the state-of-the-art synchronous RL systems, while notably reducing long-tail latency by 72-94%.","short_abstract":"Reinforcement Learning (RL) has emerged as a critical technique for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase, which dominates end-to-end iteration time, suffers from substantial long-tail latency and poor resource utilizati...","url_abs":"https://arxiv.org/abs/2511.14617","url_pdf":"https://arxiv.org/pdf/2511.14617v3","authors":"[\"Ruoyu Qin\",\"Weiran He\",\"Weixiao Huang\",\"Yangkun Zhang\",\"Yikai Zhao\",\"Bo Pang\",\"Xinran Xu\",\"Yingdi Shan\",\"Yongwei Wu\",\"Mingxing Zhang\"]","published":"2025-11-18T16:12:21Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
