{"ID":2895204,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09580","arxiv_id":"2507.09580","title":"AICrypto: Evaluating Cryptography Capabilities of Large Language Models","abstract":"We build \\textbf{AICrypto}, a comprehensive benchmark designed to evaluate the cryptography capabilities of large language models (LLMs). The benchmark comprises 135 multiple-choice questions, 150 capture-the-flag challenges, and 30 proof problems, covering a broad range of skills from knowledge memorization to vulnerability exploitation and formal reasoning. All tasks are carefully reviewed or constructed by cryptography experts to improve correctness and rigor. For each proof problem, we provide detailed scoring rubrics and reference solutions that enable automated grading, achieving high correlation with human expert evaluations. We introduce strong human expert performance baselines for comparison across all task types. Our evaluation of 17 leading LLMs reveals that state-of-the-art models match or even surpass human experts in memorizing cryptographic concepts, exploiting common vulnerabilities, and routine proofs. However, our analysis reveals that they still lack a deep understanding of abstract mathematical concepts and struggle with tasks that require multi-step reasoning and dynamic analysis. We hope this work could provide insights for future research on LLMs in cryptographic applications. Our code and dataset are available at https://github.com/wangyu-ovo/aicrypto-agent.","short_abstract":"We build \\textbf{AICrypto}, a comprehensive benchmark designed to evaluate the cryptography capabilities of large language models (LLMs). The benchmark comprises 135 multiple-choice questions, 150 capture-the-flag challenges, and 30 proof problems, covering a broad range of skills from knowledge memorization to vulnera...","url_abs":"https://arxiv.org/abs/2507.09580","url_pdf":"https://arxiv.org/pdf/2507.09580v6","authors":"[\"Yu Wang\",\"Yijian Liu\",\"Liheng Ji\",\"Han Luo\",\"Wenjie Li\",\"Xiaofei Zhou\",\"Chiyun Feng\",\"Puji Wang\",\"Yuhan Cao\",\"Geyuan Zhang\",\"Xiaojian Li\",\"Rongwu Xu\",\"Yilei Chen\",\"Tianxing He\"]","published":"2025-07-13T11:11:01Z","proceeding":"cs.CR","tasks":"[\"cs.CR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612167,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2895204,"paper_url":"https://arxiv.org/abs/2507.09580","paper_title":"AICrypto: Evaluating Cryptography Capabilities of Large Language Models","repo_url":"https://github.com/wangyu-ovo/aicrypto-agent","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
