{"ID":2864598,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23143","arxiv_id":"2509.23143","title":"MathBode: Measuring the Stability of LLM Reasoning using Frequency Response","abstract":"This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \\approx 1$, $φ\\approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.","short_abstract":"This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields in...","url_abs":"https://arxiv.org/abs/2509.23143","url_pdf":"https://arxiv.org/pdf/2509.23143v4","authors":"[\"Charles L. Wang\"]","published":"2025-09-27T06:06:36Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\",\"eess.SY\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
