{"ID":2849258,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24684","arxiv_id":"2510.24684","title":"SPICE: Self-Play In Corpus Environments Improves Reasoning","abstract":"Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics, the Challenger creates an automatic curriculum at the frontier of the Reasoner's capability, while corpus grounding provides the rich, near-inexhaustible external signal necessary for sustained improvement. Unlike existing ungrounded self-play methods that offer more limited benefits, SPICE achieves consistent gains across mathematical (+8.9%) and general reasoning (+9.8%) benchmarks on multiple model families. Our analysis reveals how document grounding is a key ingredient in SPICE to continuously generate its own increasingly challenging goals and achieve them, enabling sustained self-improvement.","short_abstract":"Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner...","url_abs":"https://arxiv.org/abs/2510.24684","url_pdf":"https://arxiv.org/pdf/2510.24684v1","authors":"[\"Bo Liu\",\"Chuanyang Jin\",\"Seungone Kim\",\"Weizhe Yuan\",\"Wenting Zhao\",\"Ilia Kulikov\",\"Xian Li\",\"Sainbayar Sukhbaatar\",\"Jack Lanchantin\",\"Jason Weston\"]","published":"2025-10-28T17:46:16Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}