{"ID":2834964,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.01010","arxiv_id":"2512.01010","title":"Chain of Unit-Physics: A Primitive-Centric Approach to Scientific Code Synthesis","abstract":"Agentic large language models are proposed as autonomous code generators for scientific computing, yet their reliability in high-stakes problems remains unclear. Developing computational scientific software from natural-language queries remains challenging broadly due to (a) sparse representation of domain codes during training and (b) the limited feasibility of RLHF with a small expert community. To address these limitations, this work conceptualizes an inverse approach to code design, embodied in the Chain of Unit-Physics framework: a first-principles (or primitives)-centric, multi-agent system in which human expert knowledge is encoded as unit-physics tests that explicitly constrain code generation. The framework is evaluated on a nontrivial combustion task, used here as a representative benchmark for scientific problems with realistic physical constraints. Closed-weight systems and code-focused agentic variants fail to produce correct end-to-end solvers, despite tool and web access, exhibiting four recurrent error classes: interface (syntax/API) hallucinations, overconfident assumptions, numerical/physical incoherence, and configuration fragility. Open-weight models with chain-of-thought (CoT) decoding reduce interface errors but still yield incorrect solutions. On the benchmark task, the proposed framework converges within 5-6 iterations, matches the human-expert implementation (mean error of $3.1\\times10^{-3}$ %), with a $\\sim$33.4 % faster runtime and $\\sim$30 % more efficient memory usage at a cost comparable to mid-sized commercial APIs, yielding a practical template for physics-grounded scientific code generation.\nAs datasets and models evolve, zero-shot code accuracy will improve; however, the Chain of Unit-Physics framework goes further by embedding first-principles analysis that is foundational to scientific codes.","short_abstract":"Agentic large language models are proposed as autonomous code generators for scientific computing, yet their reliability in high-stakes problems remains unclear. Developing computational scientific software from natural-language queries remains challenging broadly due to (a) sparse representation of domain codes during...","url_abs":"https://arxiv.org/abs/2512.01010","url_pdf":"https://arxiv.org/pdf/2512.01010v1","authors":"[\"Vansh Sharma\",\"Venkat Raman\"]","published":"2025-11-30T18:16:50Z","proceeding":"cs.MA","tasks":"[\"cs.MA\",\"cs.AI\",\"cs.LG\",\"cs.SE\",\"physics.comp-ph\",\"physics.flu-dyn\"]","methods":"[\"Language Model\",\"RLHF\"]","has_code":false}