{"ID":2864600,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23145","arxiv_id":"2509.23145","title":"TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts","abstract":"Transformer-based architectures dominate time series modeling by enabling global attention over all timestamps, yet their rigid 'one-size-fits-all' context aggregation fails to address two critical challenges in real-world data: (1) inherent lag effects, where the relevance of historical timestamps to a query varies dynamically; (2) anomalous segments, which introduce noisy signals that degrade forecasting accuracy. To resolve these problems, we propose the Temporal Mix of Experts (TMOE), a novel attention-level mechanism that reimagines key-value (K-V) pairs as local experts (each specialized in a distinct temporal context) and performs adaptive expert selection for each query via localized filtering of irrelevant timestamps. Complementing this local adaptation, a shared global expert preserves the Transformer's strength in capturing long-range dependencies. We then replace the vanilla attention mechanism in popular time-series Transformer frameworks (i.e., PatchTST and Timer) with TMOE, without extra structural modifications, yielding our specific version TimeExpert and general version TimeExpert-G. Extensive experiments on seven real-world long-term forecasting benchmarks demonstrate that TimeExpert and TimeExpert-G outperform state-of-the-art methods. Code is available at https://github.com/xwmaxwma/TimeExpert.","short_abstract":"Transformer-based architectures dominate time series modeling by enabling global attention over all timestamps, yet their rigid 'one-size-fits-all' context aggregation fails to address two critical challenges in real-world data: (1) inherent lag effects, where the relevance of historical timestamps to a query varies dy...","url_abs":"https://arxiv.org/abs/2509.23145","url_pdf":"https://arxiv.org/pdf/2509.23145v1","authors":"[\"Xiaowen Ma\",\"Shuning Ge\",\"Fan Yang\",\"Xiangyu Li\",\"Yun Chen\",\"Mengting Ma\",\"Wei Zhang\",\"Zhipeng Liu\"]","published":"2025-09-27T06:22:09Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":609175,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2864600,"paper_url":"https://arxiv.org/abs/2509.23145","paper_title":"TimeExpert: Boosting Long Time Series Forecasting with Temporal Mix of Experts","repo_url":"https://github.com/xwmaxwma/TimeExpert","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
