{"ID":2898173,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04121","arxiv_id":"2507.04121","title":"Model selection for stochastic dynamics: a parsimonious and principled approach","abstract":"This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, $n_\\mathcal{B} \\ln(n_0/p)$, explicitly incorporates the size of the initial library of candidate parameters ($n_0$), the number of parameters in the considered model ($n_\\mathcal{B}$), and a significance threshold ($p$). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals ($Δt$) or measurement noise ($σ$), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-$Δt$ and PASTIS-$σ$, thus extending the applicability of the approach to imperfect experimental data.\nPASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.","short_abstract":"This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC...","url_abs":"https://arxiv.org/abs/2507.04121","url_pdf":"https://arxiv.org/pdf/2507.04121v1","authors":"[\"Andonis Gerardos\"]","published":"2025-07-05T18:15:26Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cond-mat.stat-mech\",\"cs.LG\",\"physics.comp-ph\"]","methods":"[]","has_code":false}
