{"ID":2899332,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.01951","arxiv_id":"2507.01951","title":"Test-Time Scaling with Reflective Generative Model","abstract":"We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3-mini's performance via the new Reflective Generative Form. The new form focuses on high-quality reasoning trajectory selection and contains two novelties: 1) A unified interface for policy and process reward model: we share the backbone network and use task-specific heads for reasoning trajectory predicting and scoring respectively, introducing only 53M extra parameters for trajectory scoring. 2) Eliminating the reliance on process-level annotation: we provide a self-supervised process reward model, which can directly learn the high-quality reasoning trajectory selection from the outcome reward. Equipped with the reflective generative form, MetaStone-S1 is naturally suitable for test-time scaling, and we provide three reasoning effort modes (low, medium, and high) based on the controllable thinking length. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI o3-mini's series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.","short_abstract":"We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3-mini's performance via the new Reflective Generative Form. The new form focuses on high-quality reasoning trajectory selection and contains two novelties: 1) A unified interface for policy and process reward model: we share the bac...","url_abs":"https://arxiv.org/abs/2507.01951","url_pdf":"https://arxiv.org/pdf/2507.01951v2","authors":"[\"Zixiao Wang\",\"Yuxin Wang\",\"Xiaorui Wang\",\"Mengting Xing\",\"Jie Gao\",\"Jianjun Xu\",\"Guangcan Liu\",\"Chenhui Jin\",\"Zhuo Wang\",\"Shengzhuo Zhang\",\"Hongtao Xie\"]","published":"2025-07-02T17:58:01Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CL\"]","methods":"[]","has_code":false,"code_links":[{"ID":612473,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2899332,"paper_url":"https://arxiv.org/abs/2507.01951","paper_title":"Test-Time Scaling with Reflective Generative Model","repo_url":"https://github.com/MetaStone-AI/MetaStone-S1","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
