{"ID":2869050,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.20374","arxiv_id":"2509.20374","title":"CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics","abstract":"Large Language Models (LLMs) have demonstrated strong performance across general NLP tasks, but their utility in automating numerical experiments of complex physical systems -- a critical and labor-intensive component -- remains underexplored. As the major workhorse of computational science over the past decades, Computational Fluid Dynamics (CFD) offers a uniquely challenging testbed for evaluating the scientific capabilities of LLMs. We introduce CFDLLMBench, a benchmark suite comprising three complementary components -- CFDQuery, CFDCodeBench, and FoamBench -- designed to holistically evaluate LLM performance across three key competencies: graduate-level CFD knowledge, numerical and physical reasoning of CFD, and context-dependent implementation of CFD workflows. Grounded in real-world CFD practices, our benchmark combines a detailed task taxonomy with a rigorous evaluation framework to deliver reproducible results and quantify LLM performance across code executability, solution accuracy, and numerical convergence behavior. CFDLLMBench establishes a solid foundation for the development and evaluation of LLM-driven automation of numerical experiments for complex physical systems. Code and data are available at https://github.com/NREL-Theseus/cfdllmbench/.","short_abstract":"Large Language Models (LLMs) have demonstrated strong performance across general NLP tasks, but their utility in automating numerical experiments of complex physical systems -- a critical and labor-intensive component -- remains underexplored. 
As the major workhorse of computational science over the past decades, Comput...","url_abs":"https://arxiv.org/abs/2509.20374","url_pdf":"https://arxiv.org/pdf/2509.20374v3","authors":"[\"Nithin Somasekharan\",\"Ling Yue\",\"Yadi Cao\",\"Weichao Li\",\"Patrick Emami\",\"Pochinapeddi Sai Bhargav\",\"Anurag Acharya\",\"Xingyu Xie\",\"Shaowu Pan\"]","published":"2025-09-19T22:21:26Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609647,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2869050,"paper_url":"https://arxiv.org/abs/2509.20374","paper_title":"CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics","repo_url":"https://github.com/NREL-Theseus/cfdllmbench","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
