{"ID":2830042,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.10187","arxiv_id":"2512.10187","title":"MINIF2F-DAFNY: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification","abstract":"LLMs excel at reasoning, but validating their steps remains challenging. Formal verification offers a solution through mechanically checkable proofs. Interactive theorem provers (ITPs) dominate mathematical reasoning but require detailed low-level proof steps, while auto-active verifiers offer automation but focus on software verification. Recent work has begun bridging this divide by evaluating LLMs for software verification in ITPs, but the complementary direction--LLMs for mathematical theorem proving in auto-active verifiers--remains unexplored. We present MINIF2F-DAFNY, the first translation of the widely-used mathematical benchmark miniF2F to an auto-active verifier: Dafny. We find that Dafny's automation alone solves 39-44% of problems with empty proofs, whereas many require substantial proof guidance in ITPs. For remaining problems, we evaluate 7 off-the-shelf LLMs, achieving 55.7% success with the best model (Claude Sonnet 4.5) using modest resources. These results demonstrate effective division of labor: LLMs provide high-level guidance while automation handles low-level details. Our benchmark can be found on GitHub at http://github.com/dafny-lang/miniF2F .","short_abstract":"LLMs excel at reasoning, but validating their steps remains challenging. Formal verification offers a solution through mechanically checkable proofs. Interactive theorem provers (ITPs) dominate mathematical reasoning but require detailed low-level proof steps, while auto-active verifiers offer automation but focus on s...","url_abs":"https://arxiv.org/abs/2512.10187","url_pdf":"https://arxiv.org/pdf/2512.10187v2","authors":"[\"Mantas Baksys\",\"Stefan Zetzsche\",\"Olivier Bouissou\",\"Sean B. Holden\"]","published":"2025-12-11T00:52:19Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605991,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2830042,"paper_url":"https://arxiv.org/abs/2512.10187","paper_title":"MINIF2F-DAFNY: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification","repo_url":"https://github.com/dafny-lang/miniF2F","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
