{"ID":2886468,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03613","arxiv_id":"2508.03613","title":"Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction","abstract":"We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models--including closed-source systems with publicly reported performance--under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.","short_abstract":"We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of...","url_abs":"https://arxiv.org/abs/2508.03613","url_pdf":"https://arxiv.org/pdf/2508.03613v1","authors":"[\"Yong Lin\",\"Shange Tang\",\"Bohan Lyu\",\"Ziran Yang\",\"Jui-Hui Chung\",\"Haoyu Zhao\",\"Lai Jiang\",\"Yihan Geng\",\"Jiawei Ge\",\"Jingruo Sun\",\"Jiayun Wu\",\"Jiri Gesi\",\"Ximing Lu\",\"David Acuna\",\"Kaiyu Yang\",\"Hongzhou Lin\",\"Yejin Choi\",\"Danqi Chen\",\"Sanjeev Arora\",\"Chi Jin\"]","published":"2025-08-05T16:28:22Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611311,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886468,"paper_url":"https://arxiv.org/abs/2508.03613","paper_title":"Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction","repo_url":"https://github.com/Goedel-LM/Goedel-Prover-V2","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
