{"ID":2879458,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16402","arxiv_id":"2508.16402","title":"AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions","abstract":"Competitive programming has emerged as a critical benchmark for evaluating the reasoning and coding capabilities of Large Language Models (LLMs). Despite impressive progress on existing benchmarks, we argue that current evaluations overstate model proficiency, masking a substantial gap between LLMs and elite human programmers. This gap arises from two key limitations: insufficient difficulty and scope of benchmark problems, and evaluation bias from low-quality test cases. To address these shortcomings, we present AetherCode, a new benchmark that draws problems from premier programming competitions such as IOI and ICPC, offering broader coverage and higher difficulty. AetherCode further incorporates comprehensive, expert-validated test suites built through a hybrid of automated generation and human curation, ensuring rigorous and reliable assessment. By combining challenging problem design with robust evaluation, AetherCode provides a more faithful measure of LLM capabilities and sets a new standard for future research in code reasoning.","short_abstract":"Competitive programming has emerged as a critical benchmark for evaluating the reasoning and coding capabilities of Large Language Models (LLMs). Despite impressive progress on existing benchmarks, we argue that current evaluations overstate model proficiency, masking a substantial gap between LLMs and elite human prog...","url_abs":"https://arxiv.org/abs/2508.16402","url_pdf":"https://arxiv.org/pdf/2508.16402v1","authors":"[\"Zihan Wang\",\"Jiaze Chen\",\"Zhicheng Liu\",\"Markus Mak\",\"Yidi Du\",\"Geonsik Moon\",\"Luoqi Xu\",\"Aaron Tua\",\"Kunshuo Peng\",\"Jiayi Lu\",\"Mingfei Xia\",\"Boqian Zou\",\"Chenyang Ran\",\"Guang Tian\",\"Shoutai Zhu\",\"Yeheng Duan\",\"Zhenghui Kang\",\"Zhenxing Lin\",\"Shangshu Li\",\"Qiang Luo\",\"Qingshen Long\",\"Zhiyong Chen\",\"Yihan Xiao\",\"Yurong Wu\",\"Daoguang Zan\",\"Yuyi Fu\",\"Mingxuan Wang\",\"Ming Ding\"]","published":"2025-08-22T14:04:55Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}