{"ID":2862177,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01051","arxiv_id":"2510.01051","title":"GEM: A Gym for Agentic LLMs","abstract":"The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills via interacting with complex environments. To facilitate this transition we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput, and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating using GEM with five popular RL training frameworks. Along with this, we also provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which -- unlike GRPO -- is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apple-to-apple benchmarking of PPO, GRPO and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit besides a training environment. We hope this framework can help accelerate future agentic LLM research.","short_abstract":"The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills via interacting with complex environments. To facilitate this transition we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age...","url_abs":"https://arxiv.org/abs/2510.01051","url_pdf":"https://arxiv.org/pdf/2510.01051v2","authors":"[\"Zichen Liu\",\"Anya Sims\",\"Keyu Duan\",\"Changyu Chen\",\"Simon Yu\",\"Xiangxin Zhou\",\"Haotian Xu\",\"Shaopan Xiong\",\"Bo Liu\",\"Chenmien Tan\",\"Chuen Yang Beh\",\"Weixun Wang\",\"Hao Zhu\",\"Weiyan Shi\",\"Diyi Yang\",\"Michael Shieh\",\"Yee Whye Teh\",\"Wee Sun Lee\",\"Min Lin\"]","published":"2025-10-01T15:55:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
