{"ID":2887125,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.02879","arxiv_id":"2508.02879","title":"CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data","abstract":"Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require a computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To allow for a sample-efficient pre-training of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCM) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs having different architectures and following different pre-training approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior. The source code is publicly available at https://github.com/ShifengXIE/CauKer.","short_abstract":"Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require a computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. 
To allow for a sa...","url_abs":"https://arxiv.org/abs/2508.02879","url_pdf":"https://arxiv.org/pdf/2508.02879v3","authors":"[\"Shifeng Xie\",\"Vasilii Feofanov\",\"Ambroise Odonnat\",\"Lei Zan\",\"Marius Alonso\",\"Jianfeng Zhang\",\"Themis Palpanas\",\"Lujia Pan\",\"Keli Zhang\",\"Ievgen Redko\"]","published":"2025-08-04T20:18:31Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":611397,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2887125,"paper_url":"https://arxiv.org/abs/2508.02879","paper_title":"CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data","repo_url":"https://github.com/ShifengXIE/CauKer","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
