{"ID":2899149,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.01627","arxiv_id":"2507.01627","title":"Chart Question Answering from Real-World Analytical Narratives","abstract":"We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.","short_abstract":"We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarki...","url_abs":"https://arxiv.org/abs/2507.01627","url_pdf":"https://arxiv.org/pdf/2507.01627v1","authors":"[\"Maeve Hutchinson\",\"Radu Jianu\",\"Aidan Slingsby\",\"Jo Wood\",\"Pranava Madhyastha\"]","published":"2025-07-02T11:58:04Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}