{"ID":2829685,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.11251","arxiv_id":"2512.11251","title":"Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language","abstract":"Time-series data is critical across many scientific and industrial domains, including environmental analysis, agriculture, transportation, and finance. However, mining insights from this data typically requires deep domain expertise, a process that is both time-consuming and labor-intensive. In this paper, we propose \\textbf{Insight Miner}, a large-scale multimodal model (LMM) designed to generate high-quality, comprehensive time-series descriptions enriched with domain-specific knowledge. To facilitate this, we introduce \\textbf{TS-Insights}\\footnote{Available at \\href{https://huggingface.co/datasets/zhykoties/time-series-language-alignment}{https://huggingface.co/datasets/zhykoties/time-series-language-alignment}.}, the first general-domain dataset for time series and language alignment. TS-Insights contains 100k time-series windows sampled from 20 forecasting datasets. We construct this dataset using a novel \\textbf{agentic workflow}, where we use statistical tools to extract features from raw time series before synthesizing them into coherent trend descriptions with GPT-4. Following instruction tuning on TS-Insights, Insight Miner outperforms state-of-the-art multimodal models, such as LLaVA \\citep{liu2023llava} and GPT-4, in generating time-series descriptions and insights. Our findings suggest a promising direction for leveraging LMMs in time series analysis, and serve as a foundational step toward enabling LLMs to interpret time series as a native input modality.","short_abstract":"Time-series data is critical across many scientific and industrial domains, including environmental analysis, agriculture, transportation, and finance. However, mining insights from this data typically requires deep domain expertise, a process that is both time-consuming and labor-intensive. In this paper, we propose \\...","url_abs":"https://arxiv.org/abs/2512.11251","url_pdf":"https://arxiv.org/pdf/2512.11251v1","authors":"[\"Yunkai Zhang\",\"Yawen Zhang\",\"Ming Zheng\",\"Kezhen Chen\",\"Chongyang Gao\",\"Ruian Ge\",\"Siyuan Teng\",\"Amine Jelloul\",\"Jinmeng Rao\",\"Xiaoyuan Guo\",\"Chiang-Wei Fang\",\"Zeyu Zheng\",\"Jie Yang\"]","published":"2025-12-12T03:18:00Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
