{"ID":2900929,"CreatedAt":"2026-06-01T05:51:17.9442275Z","UpdatedAt":"2026-06-01T06:23:29.641557848Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2605.31031","arxiv_id":"2605.31031","title":"GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning","abstract":"Relational reasoning lies at the heart of intelligence, but existing benchmarks are typically confined to formats such as grids or text. We introduce GraphARC, a benchmark for abstract reasoning on graph-structured data. GraphARC generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus (ARC). Each task requires inferring a transformation rule from a few input-output pairs and applying it to a new test graph, covering local, global, and hierarchical graph transformations. Unlike grid-based ARC, GraphARC instances can be generated at scale across diverse graph families and sizes, enabling systematic evaluation of generalization abilities. We evaluate state-of-the-art language models on GraphARC and observe clear limitations. Models can answer questions about graph properties but often fail to solve the full graph transformation task, revealing a comprehension-execution gap. Performance further degrades on larger instances, exposing scaling barriers. More broadly, by combining aspects of node classification, link prediction, and graph generation within a single framework, GraphARC provides a promising testbed for future graph foundation models.","short_abstract":"Relational reasoning lies at the heart of intelligence, but existing benchmarks are typically confined to formats such as grids or text. We introduce GraphARC, a benchmark for abstract reasoning on graph-structured data. GraphARC generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning...","url_abs":"https://arxiv.org/abs/2605.31031","url_pdf":"https://arxiv.org/pdf/2605.31031v1","authors":"[\"Saku Peltonen\",\"August Bøgh Rønberg\",\"Andreas Plesner\",\"Roger Wattenhofer\"]","published":"2026-05-29T09:03:30Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
