{"ID":2826792,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.18436","arxiv_id":"2512.18436","title":"VeruSAGE: A Study of Agent-Based Verification for Rust Systems","abstract":"Large language models (LLMs) have shown impressive capability to understand and develop code. However, their capability to rigorously reason about and prove code correctness remains in question. This paper offers a comprehensive study of LLMs' capability to develop correctness proofs for system software written in Rust. We curate a new system-verification benchmark suite, VeruSAGE-Bench, which consists of 849 proof tasks extracted from eight open-source Verus-verified Rust systems. Furthermore, we design different agent systems to match the strengths and weaknesses of different LLMs (o4-mini, GPT-5, Sonnet 4, and Sonnet 4.5). Our study shows that different tools and agent settings are needed to stimulate the system-verification capability of different types of LLMs. The best LLM-agent combination in our study completes over 80% of system-verification tasks in VeruSAGE-Bench. It also completes over 90% of a set of system proof tasks not part of VeruSAGE-Bench because they had not yet been finished by human experts. This result shows the great potential for LLM-assisted development of verified system software.","short_abstract":"Large language models (LLMs) have shown impressive capability to understand and develop code. However, their capability to rigorously reason about and prove code correctness remains in question. This paper offers a comprehensive study of LLMs' capability to develop correctness proofs for system software written in Rust...","url_abs":"https://arxiv.org/abs/2512.18436","url_pdf":"https://arxiv.org/pdf/2512.18436v2","authors":"[\"Chenyuan Yang\",\"Natalie Neamtu\",\"Chris Hawblitzel\",\"Jacob R. Lorch\",\"Shan Lu\"]","published":"2025-12-20T17:22:52Z","proceeding":"cs.OS","tasks":"[\"cs.OS\",\"cs.AI\",\"cs.FL\",\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
