{"ID":2880885,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14288","arxiv_id":"2508.14288","title":"Measuring LLM Code Generation Stability via Structural Entropy","abstract":"Assessing the stability of code generation from large language models (LLMs) is essential for judging their reliability in real-world development. We extend prior \"structural-entropy concepts\" to the program domain by pairing entropy with abstract syntax tree (AST) analysis. For any fixed prompt, we collect the multiset of depth-bounded subtrees of AST in each generated program and treat their relative frequencies as a probability distribution. We then measure stability in two complementary ways: (i) Jensen-Shannon divergence, a symmetric, bounded indicator of structural overlap, and (ii) a Structural Cross-Entropy ratio that highlights missing high-probability patterns. Both metrics admit structural-only and token-aware variants, enabling separate views on control-flow shape and identifier-level variability. Unlike pass@k, BLEU, or CodeBLEU, our metrics are reference-free, language-agnostic, and execution-independent. We benchmark several leading LLMs on standard code generation tasks, demonstrating that AST-driven structural entropy reveals nuances in model consistency and robustness. The method runs in O(n,d) time with no external tests, providing a lightweight addition to the code-generation evaluation toolkit.","short_abstract":"Assessing the stability of code generation from large language models (LLMs) is essential for judging their reliability in real-world development. We extend prior \"structural-entropy concepts\" to the program domain by pairing entropy with abstract syntax tree (AST) analysis. For any fixed prompt, we collect the multise...","url_abs":"https://arxiv.org/abs/2508.14288","url_pdf":"https://arxiv.org/pdf/2508.14288v1","authors":"[\"Yewei Song\",\"Tiezhu Sun\",\"Xunzhu Tang\",\"Prateek Rajput\",\"Tegawende F. Bissyande\",\"Jacques Klein\"]","published":"2025-08-19T22:07:12Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
