{"ID":2872885,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.07414","arxiv_id":"2509.07414","title":"Language Self-Play For Data-Free Training","abstract":"Large language models (LLMs) have advanced rapidly in recent years, driven by scale, abundant high-quality training data, and reinforcement learning. Yet this progress faces a fundamental bottleneck: the need for ever more data from which models can continue to learn. In this work, we propose a reinforcement learning approach that removes this dependency by enabling models to improve without additional data. Our method leverages a game-theoretic framework of self-play, where a model's capabilities are cast as performance in a competitive game and stronger policies emerge by having the model play against itself, a process we call Language Self-Play (LSP). Experiments with Llama-3.2-3B-Instruct on instruction-following, mathematics, and coding benchmarks show that pretrained models can be effectively improved with self-play alone.","short_abstract":"Large language models (LLMs) have advanced rapidly in recent years, driven by scale, abundant high-quality training data, and reinforcement learning. Yet this progress faces a fundamental bottleneck: the need for ever more data from which models can continue to learn. In this work, we propose a reinforcement learning a...","url_abs":"https://arxiv.org/abs/2509.07414","url_pdf":"https://arxiv.org/pdf/2509.07414v3","authors":"[\"Jakub Grudzien Kuba\",\"Mengting Gu\",\"Qi Ma\",\"Yuandong Tian\",\"Vijai Mohan\",\"Jason Chen\"]","published":"2025-09-09T05:51:34Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.GT\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
