{"ID":2840199,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14761","arxiv_id":"2511.14761","title":"ARC Is a Vision Problem!","abstract":"The Abstraction and Reasoning Corpus (ARC) is designed to promote research on abstract reasoning, a fundamental aspect of human intelligence. Common approaches to ARC treat it as a language-oriented problem, addressed by large language models (LLMs) or recurrent reasoning models. However, although the puzzle-like tasks in ARC are inherently visual, existing research has rarely approached the problem from a vision-centric perspective. In this work, we formulate ARC within a vision paradigm, framing it as an image-to-image translation problem. To incorporate visual priors, we represent the inputs on a \"canvas\" that can be processed like natural images. It is then natural for us to apply standard vision architectures, such as a vanilla Vision Transformer (ViT), to perform image-to-image mapping. Our model is trained from scratch solely on ARC data and generalizes to unseen tasks through test-time training. Our framework, termed Vision ARC (VARC), achieves 60.4% accuracy on the ARC-1 benchmark, substantially outperforming existing methods that are also trained from scratch. Our results are competitive with those of leading LLMs and close the gap to average human performance.","short_abstract":"The Abstraction and Reasoning Corpus (ARC) is designed to promote research on abstract reasoning, a fundamental aspect of human intelligence. Common approaches to ARC treat it as a language-oriented problem, addressed by large language models (LLMs) or recurrent reasoning models. However, although the puzzle-like tasks...","url_abs":"https://arxiv.org/abs/2511.14761","url_pdf":"https://arxiv.org/pdf/2511.14761v1","authors":"[\"Keya Hu\",\"Ali Cy\",\"Linlu Qiu\",\"Xiaoman Delores Ding\",\"Runqian Wang\",\"Yeyin Eva Zhu\",\"Jacob Andreas\",\"Kaiming He\"]","published":"2025-11-18T18:59:49Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}
