{"ID":2832186,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.06296","arxiv_id":"2512.06296","title":"How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion","abstract":"Knowledge graph completion (KGC) aims to predict missing facts from the observed KG. While a number of KGC models have been studied, the evaluation of KGC still remains underexplored. In this paper, we observe that existing metrics overlook two key perspectives for KGC evaluation: (A1) predictive sharpness -- the degree of strictness in evaluating an individual prediction, and (A2) popularity-bias robustness -- the ability to predict low-popularity entities. Toward reflecting both perspectives, we propose a novel evaluation framework (PROBE), which consists of a rank transformer (RT) estimating the score of each prediction based on a required level of predictive sharpness and a rank aggregator (RA) aggregating all the scores in a popularity-aware manner. Experiments on real-world KGs reveal that existing metrics tend to over- or under-estimate the accuracy of KGC models, whereas PROBE yields a comprehensive understanding of KGC models and reliable evaluation results.","short_abstract":"Knowledge graph completion (KGC) aims to predict missing facts from the observed KG. While a number of KGC models have been studied, the evaluation of KGC still remains underexplored. In this paper, we observe that existing metrics overlook two key perspectives for KGC evaluation: (A1) predictive sharpness -- the degree...","url_abs":"https://arxiv.org/abs/2512.06296","url_pdf":"https://arxiv.org/pdf/2512.06296v1","authors":"[\"Sooho Moon\",\"Yunyong Ko\"]","published":"2025-12-06T04:49:29Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}