{"ID":2831037,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.08270","arxiv_id":"2512.08270","title":"Reasoning Models Ace the CFA Exams","abstract":"Previous research has reported that large language models (LLMs) demonstrate poor performance on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved strong results on graduate-level academic and professional examinations across various disciplines. In this paper, we evaluate state-of-the-art reasoning models on a set of mock CFA exams consisting of 980 questions across three Level I exams, two Level II exams, and three Level III exams. Using the same pass/fail criteria from prior studies, we find that most models clear all three levels. The models that pass, ordered by overall performance, are Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1. Specifically, Gemini 3.0 Pro achieves a record score of 97.6% on Level I. Performance is also strong on Level II, led by GPT-5 at 94.3%. On Level III, Gemini 2.5 Pro attains the highest score with 86.4% on multiple-choice questions while Gemini 3.0 Pro achieves 92.0% on constructed-response questions.","short_abstract":"Previous research has reported that large language models (LLMs) demonstrate poor performance on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved strong results on graduate-level academic and professional examinations across various disciplines. In this paper, we evaluate stat...","url_abs":"https://arxiv.org/abs/2512.08270","url_pdf":"https://arxiv.org/pdf/2512.08270v1","authors":"[\"Jaisal Patel\",\"Yunzhe Chen\",\"Kaiwen He\",\"Keyi Wang\",\"David Li\",\"Kairong Xiao\",\"Xiao-Yang Liu\"]","published":"2025-12-09T05:57:19Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"q-fin.GN\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}