{"ID":2871649,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09936","arxiv_id":"2509.09936","title":"SciML Agents: Write the Solver, Not the Solution","abstract":"Recent work in scientific machine learning aims to tackle scientific tasks directly by predicting target values with neural networks (e.g., physics-informed neural networks, neural ODEs, neural operators, etc.), but attaining high accuracy and robustness has been challenging. We explore an alternative view: use LLMs to write code that leverages decades of numerical algorithms. This shifts the burden from learning a solution function to making domain-aware numerical choices. We ask whether LLMs can act as SciML agents that, given a natural-language ODE description, generate runnable code that is scientifically appropriate, selecting suitable solvers (stiff vs. non-stiff), and enforcing stability checks. There is currently no benchmark to measure this kind of capability for scientific computing tasks. As such, we first introduce two new datasets: a diagnostic dataset of adversarial \"misleading\" problems; and a large-scale benchmark of 1,000 diverse ODE tasks. The diagnostic set contains problems whose superficial appearance suggests stiffness, and that require algebraic simplification to demonstrate non-stiffness; and the large-scale benchmark spans stiff and non-stiff ODE regimes. We evaluate open- and closed-source LLM models along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants. Our evaluation measures both executability and numerical validity against reference solutions. We find that with sufficient context and guided prompts, newer instruction-following models achieve high accuracy on both criteria. In many cases, recent open-source systems perform strongly without fine-tuning, while older or smaller models still benefit from fine-tuning. Overall, our preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.","short_abstract":"Recent work in scientific machine learning aims to tackle scientific tasks directly by predicting target values with neural networks (e.g., physics-informed neural networks, neural ODEs, neural operators, etc.), but attaining high accuracy and robustness has been challenging. We explore an alternative view: use LLMs to...","url_abs":"https://arxiv.org/abs/2509.09936","url_pdf":"https://arxiv.org/pdf/2509.09936v1","authors":"[\"Saarth Gaonkar\",\"Xiang Zheng\",\"Haocheng Xi\",\"Rishabh Tiwari\",\"Kurt Keutzer\",\"Dmitriy Morozov\",\"Michael W. Mahoney\",\"Amir Gholami\"]","published":"2025-09-12T02:53:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.NA\"]","methods":"[\"Large Language Model\"]","has_code":false}
