{"ID":2851123,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20439","arxiv_id":"2510.20439","title":"Explainable Benchmarking through the Lense of Concept Learning","abstract":"Evaluating competing systems in a comparable way, i.e., benchmarking them, is an undeniable pillar of the scientific method. However, system performance is often summarized via a small number of metrics. The analysis of the evaluation details and the derivation of insights for further development or use remains a tedious manual task with often biased results. Thus, this paper argues for a new type of benchmarking, which is dubbed explainable benchmarking. The aim of explainable benchmarking approaches is to automatically generate explanations for the performance of systems in a benchmark. We provide a first instantiation of this paradigm for knowledge-graph-based question answering systems. We compute explanations by using a novel concept learning approach developed for large knowledge graphs called PruneCEL. Our evaluation shows that PruneCEL outperforms state-of-the-art concept learners on the task of explainable benchmarking by up to 0.55 points F1 measure. A task-driven user study with 41 participants shows that in 80% of the cases, the majority of participants can accurately predict the behavior of a system based on our explanations. Our code and data are available at https://github.com/dice-group/PruneCEL/tree/K-cap2025","short_abstract":"Evaluating competing systems in a comparable way, i.e., benchmarking them, is an undeniable pillar of the scientific method. However, system performance is often summarized via a small number of metrics. The analysis of the evaluation details and the derivation of insights for further development or use remains a tedio...","url_abs":"https://arxiv.org/abs/2510.20439","url_pdf":"https://arxiv.org/pdf/2510.20439v1","authors":"[\"Quannian Zhang\",\"Michael Röder\",\"Nikit Srivastava\",\"N'Dah Jean Kouagou\",\"Axel-Cyrille Ngonga Ngomo\"]","published":"2025-10-23T11:20:20Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":607870,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2851123,"paper_url":"https://arxiv.org/abs/2510.20439","paper_title":"Explainable Benchmarking through the Lense of Concept Learning","repo_url":"https://github.com/dice-group/PruneCEL","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
