{"ID":2853836,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.16060","arxiv_id":"2510.16060","title":"Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?","abstract":"The recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although foundation models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, despite the fact that calibration can be critical for many practical applications. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform a series of systematic evaluations assessing model calibration (i.e., over- or under-confidence), effects of varying prediction heads, and calibration under long-term autoregressive forecasting. We find that time series foundation models are consistently better calibrated than baseline models and tend not to be either systematically over- or under-confident, in contrast to the overconfidence often seen in other deep learning models.","short_abstract":"The recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although foundation models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, despite the fact that cali...","url_abs":"https://arxiv.org/abs/2510.16060","url_pdf":"https://arxiv.org/pdf/2510.16060v2","authors":"[\"Coen Adler\",\"Yuxin Chang\",\"Felix Draxler\",\"Samar Abdi\",\"Padhraic Smyth\"]","published":"2025-10-17T01:41:24Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"stat.ME\",\"stat.ML\"]","methods":"[]","has_code":false}