{"ID":2888694,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.22457","arxiv_id":"2507.22457","title":"What is an \"Abstract Reasoner\"? Revisiting Experiments and Arguments about Large Language Models","abstract":"Recent work has argued that large language models (LLMs) are not \"abstract reasoners\", citing their poor zero-shot performance on a variety of challenging tasks as evidence. We revisit these experiments in order to add nuance to the claim. First, we show that while LLMs indeed perform poorly in a zero-shot setting, even tuning a small subset of parameters for input encoding can enable near-perfect performance. However, we also show that this finetuning does not necessarily transfer across datasets. We take this collection of empirical results as an invitation to (re-)open the discussion of what it means to be an \"abstract reasoner\", and why it matters whether LLMs fit the bill.","short_abstract":"Recent work has argued that large language models (LLMs) are not \"abstract reasoners\", citing their poor zero-shot performance on a variety of challenging tasks as evidence. We revisit these experiments in order to add nuance to the claim. First, we show that while LLMs indeed perform poorly in a zero-shot setting, eve...","url_abs":"https://arxiv.org/abs/2507.22457","url_pdf":"https://arxiv.org/pdf/2507.22457v1","authors":"[\"Tian Yun\",\"Chen Sun\",\"Ellie Pavlick\"]","published":"2025-07-30T08:04:19Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
