{"ID":2862472,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25721","arxiv_id":"2509.25721","title":"The AI Productivity Index (APEX)","abstract":"We present an extended version of the AI Productivity Index (APEX-v1-extended), a benchmark for assessing whether frontier models are capable of performing economically valuable tasks in four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD). This technical report details the extensions to APEX-v1, including an increase in the held-out evaluation set from n = 50 to n = 100 cases per job (n = 400 total) and updates to the grading methodology. We present a new leaderboard, where GPT5 (Thinking = High) remains the top performing model with a score of 67.0%. APEX-v1-extended shows that frontier models still have substantial limitations when performing typical professional tasks. To support further research, we are open sourcing n = 25 non-benchmark example cases per role (n = 100 total) along with our evaluation harness.","short_abstract":"We present an extended version of the AI Productivity Index (APEX-v1-extended), a benchmark for assessing whether frontier models are capable of performing economically valuable tasks in four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD). This technical re...","url_abs":"https://arxiv.org/abs/2509.25721","url_pdf":"https://arxiv.org/pdf/2509.25721v6","authors":"[\"Bertie Vidgen\",\"Abby Fennelly\",\"Evan Pinnix\",\"Julien Benchek\",\"Daniyal Khan\",\"Zach Richards\",\"Austin Bridges\",\"Calix Huang\",\"Kanishka Sahu\",\"Abhishek Kottamasu\",\"Bo Ma\",\"Ben Hunsberger\",\"Isaac Robinson\",\"Akul Datta\",\"Chirag Mahapatra\",\"Dominic Barton\",\"Cass R. Sunstein\",\"Eric Topol\",\"Brendan Foody\",\"Osvald Nitski\"]","published":"2025-09-30T03:26:17Z","proceeding":"econ.GN","tasks":"[\"econ.GN\",\"cs.AI\",\"cs.CL\",\"cs.HC\"]","methods":"[]","has_code":false}
