{"ID":2833737,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04324","arxiv_id":"2512.04324","title":"DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle","abstract":"Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analytical-ready tables and data analysis that converts those tables into decision-oriented insights. We introduce DAComp, a benchmark of 210 tasks that mirrors these complex workflows. Data engineering (DE) tasks require repository-level engineering on industrial schemas, including designing and building multi-stage SQL pipelines from scratch and evolving existing systems under evolving requirements. Data analysis (DA) tasks pose open-ended business problems that demand strategic planning, exploratory analysis through iterative coding, interpretation of intermediate results, and the synthesis of actionable recommendations. Engineering tasks are scored through execution-based, multi-metric evaluation. Open-ended tasks are assessed by a reliable, experimentally validated LLM-judge, which is guided by hierarchical, meticulously crafted rubrics. Our experiments reveal that even state-of-the-art agents falter on DAComp. Performance on DE tasks is particularly low, with success rates under 20%, exposing a critical bottleneck in holistic pipeline orchestration, not merely code generation. Scores on DA tasks also average below 40%, highlighting profound deficiencies in open-ended reasoning and demonstrating that engineering and analysis are distinct capabilities. By clearly diagnosing these limitations, DAComp provides a rigorous and realistic testbed to drive the development of truly capable autonomous data agents for enterprise settings. 
Our data and code are available at https://da-comp.github.io","short_abstract":"Real-world enterprise data intelligence workflows encompass data engineering that turns raw sources into analytical-ready tables and data analysis that converts those tables into decision-oriented insights. We introduce DAComp, a benchmark of 210 tasks that mirrors these complex workflows. Data engineering (DE) tasks re...","url_abs":"https://arxiv.org/abs/2512.04324","url_pdf":"https://arxiv.org/pdf/2512.04324v1","authors":"[\"Fangyu Lei\",\"Jinxiang Meng\",\"Yiming Huang\",\"Junjie Zhao\",\"Yitong Zhang\",\"Jianwen Luo\",\"Xin Zou\",\"Ruiyi Yang\",\"Wenbo Shi\",\"Yan Gao\",\"Shizhu He\",\"Zuo Wang\",\"Qian Liu\",\"Yang Wang\",\"Ke Wang\",\"Jun Zhao\",\"Kang Liu\"]","published":"2025-12-03T23:21:28Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"LoRA\"]","has_code":false}
