{"ID":2842720,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09139","arxiv_id":"2511.09139","title":"MACEval: A Multi-Agent Continual Evaluation Network for Large Models","abstract":"Hundreds of benchmarks dedicated to evaluating large models have been presented over the past few years. However, most of them remain closed-ended and are prone to overfitting due to the potential data contamination. Moreover, the increasing scale and scope of current benchmarks with transient metrics, as well as the heavily human-dependent curation procedure, pose significant challenges for timely maintenance and adaptation. In this paper, we introduce MACEval, a Multi-Agent Continual Evaluation network for dynamic evaluation of large models, and define new metrics to quantify performance longitudinally. MACEval employs an interactive and autonomous evaluation mode, utilizing role assignment, in-process data generation, and evaluation routing through a cascaded agent network. Extensive experiments on 23 large models demonstrate the effectiveness of MACEval, which also lightens the evaluation process and reduces a considerable amount of overhead. We hope that MACEval can broaden future directions of large model evaluation. Project page: https://github.com/zijianchen98/MACEval.","short_abstract":"Hundreds of benchmarks dedicated to evaluating large models have been presented over the past few years. However, most of them remain closed-ended and are prone to overfitting due to the potential data contamination. Moreover, the increasing scale and scope of current benchmarks with transient metrics, as well as the h...","url_abs":"https://arxiv.org/abs/2511.09139","url_pdf":"https://arxiv.org/pdf/2511.09139v2","authors":"[\"Zijian Chen\",\"Yuze Sun\",\"Yuan Tian\",\"Wenjun Zhang\",\"Guangtao Zhai\"]","published":"2025-11-12T09:26:24Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":607153,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2842720,"paper_url":"https://arxiv.org/abs/2511.09139","paper_title":"MACEval: A Multi-Agent Continual Evaluation Network for Large Models","repo_url":"https://github.com/zijianchen98/MACEval","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
