{"ID":2841616,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11773","arxiv_id":"2511.11773","title":"On the Measure of a Model: From Intelligence to Generality","abstract":"Benchmarks such as ARC, Raven-inspired tests, and the Blackbird Task are widely used to evaluate the intelligence of large language models (LLMs). Yet, the concept of intelligence remains elusive- lacking a stable definition and failing to predict performance on practical tasks such as question answering, summarization, or coding. Optimizing for such benchmarks risks misaligning evaluation with real-world utility. Our perspective is that evaluation should be grounded in generality rather than abstract notions of intelligence. We identify three assumptions that often underpin intelligence-focused evaluation: generality, stability, and realism. Through conceptual and formal analysis, we show that only generality withstands conceptual and empirical scrutiny. Intelligence is not what enables generality; generality is best understood as a multitask learning problem that directly links evaluation to measurable performance breadth and reliability. This perspective reframes how progress in AI should be assessed and proposes generality as a more stable foundation for evaluating capability across diverse and evolving tasks.","short_abstract":"Benchmarks such as ARC, Raven-inspired tests, and the Blackbird Task are widely used to evaluate the intelligence of large language models (LLMs). Yet, the concept of intelligence remains elusive- lacking a stable definition and failing to predict performance on practical tasks such as question answering, summarization...","url_abs":"https://arxiv.org/abs/2511.11773","url_pdf":"https://arxiv.org/pdf/2511.11773v1","authors":"[\"Ruchira Dhar\",\"Ninell Oldenburg\",\"Anders Soegaard\"]","published":"2025-11-14T09:46:48Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
