{"ID":2847482,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.27238","arxiv_id":"2510.27238","title":"DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries","abstract":"Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully demonstrate all three key capabilities required to support them effectively: (1) open-domain data collection, (2) structured data transformation, and (3) analytic reasoning. To overcome these limitations, we propose DRAMA, an end-to-end paradigm that answers users' analytic queries in natural language on large-scale open-domain data. DRAMA unifies data collection, transformation, and analysis as a single pipeline. To quantitatively evaluate system performance on tasks representative of DRAMA, we construct a benchmark, DRAMA-Bench, consisting of two categories of tasks: claim verification and question answering, each comprising 100 instances. These tasks are derived from real-world applications that have gained significant public attention and require the retrieval and analysis of open-domain data. We develop DRAMA-Bot, a multi-agent system designed following DRAMA. It comprises a data retriever that collects and transforms data by coordinating the execution of sub-agents, and a data analyzer that performs structured reasoning over the retrieved data. We evaluate DRAMA-Bot on DRAMA-Bench together with five state-of-the-art baseline agents. DRAMA-Bot achieves 86.5% task accuracy at a cost of $0.05, outperforming all baselines with up to 6.9 times the accuracy and less than 1/6 of the cost. DRAMA is publicly available at https://github.com/uiuc-kang-lab/drama.","short_abstract":"Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully demonstrate all three key capabilities required to support them effectively: (1) open-domain data collection, (2) structured data...","url_abs":"https://arxiv.org/abs/2510.27238","url_pdf":"https://arxiv.org/pdf/2510.27238v1","authors":"[\"Chuxuan Hu\",\"Maxwell Yang\",\"James Weiland\",\"Yeji Lim\",\"Suhas Palawala\",\"Daniel Kang\"]","published":"2025-10-31T07:00:21Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.AI\",\"cs.CL\",\"cs.IR\"]","methods":"[]","has_code":false,"code_links":[{"ID":607531,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2847482,"paper_url":"https://arxiv.org/abs/2510.27238","paper_title":"DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries","repo_url":"https://github.com/uiuc-kang-lab/drama","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
