{"ID":2887599,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01324","arxiv_id":"2508.01324","title":"Towards Evaluation for Real-World LLM Unlearning","abstract":"This paper analyzes the limitations of existing unlearning evaluation metrics in terms of practicality, exactness, and robustness in real-world LLM unlearning scenarios. To overcome these limitations, we propose a new metric called Distribution Correction-based Unlearning Evaluation (DCUE). It identifies core tokens and corrects distributional biases in their confidence scores using a validation set. The evaluation results are quantified using the Kolmogorov-Smirnov test. Experimental results demonstrate that DCUE overcomes the limitations of existing metrics, which also guides the design of more practical and reliable unlearning algorithms in the future.","short_abstract":"This paper analyzes the limitations of existing unlearning evaluation metrics in terms of practicality, exactness, and robustness in real-world LLM unlearning scenarios. To overcome these limitations, we propose a new metric called Distribution Correction-based Unlearning Evaluation (DCUE). It identifies core tokens an...","url_abs":"https://arxiv.org/abs/2508.01324","url_pdf":"https://arxiv.org/pdf/2508.01324v1","authors":"[\"Ke Miao\",\"Yuke Hu\",\"Xiaochen Li\",\"Wenjie Bao\",\"Zhihao Liu\",\"Zhan Qin\",\"Kui Ren\"]","published":"2025-08-02T11:32:41Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}