{"ID":2872716,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18122","arxiv_id":"2509.18122","title":"GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models","abstract":"We introduce \\textbf{GAUSS} (\\textbf{G}eneral \\textbf{A}ssessment of \\textbf{U}nderlying \\textbf{S}tructured \\textbf{S}kills in Mathematics), a benchmark that evaluates LLMs' mathematical abilities across twelve core skill dimensions, grouped into three domains: knowledge and understanding, problem solving and communication, and meta-skills and creativity. By categorizing problems according to cognitive skills and designing tasks that isolate specific abilities, GAUSS constructs comprehensive, fine-grained, and interpretable profiles of models' mathematical abilities. These profiles faithfully represent their underlying mathematical intelligence. To exemplify how to use the \\textsc{GAUSS} benchmark, we have derived the skill profile of \\textsc{GPT-5-thinking}, revealing its strengths and weaknesses as well as its differences relative to \\textsc{o4-mini-high}, thereby underscoring the value of multidimensional, skill-based evaluation.","short_abstract":"We introduce \\textbf{GAUSS} (\\textbf{G}eneral \\textbf{A}ssessment of \\textbf{U}nderlying \\textbf{S}tructured \\textbf{S}kills in Mathematics), a benchmark that evaluates LLMs' mathematical abilities across twelve core skill dimensions, grouped into three domains: knowledge and understanding, problem solving and communic...","url_abs":"https://arxiv.org/abs/2509.18122","url_pdf":"https://arxiv.org/pdf/2509.18122v1","authors":"[\"Yue Zhang\",\"Jiaxin Zhang\",\"Qiuyu Ren\",\"Tahsin Saffat\",\"Xiaoxuan Liu\",\"Zitong Yang\",\"Banghua Zhu\",\"Yi Ma\"]","published":"2025-09-10T17:35:38Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
