{"ID":2884680,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06194","arxiv_id":"2508.06194","title":"SceneJailEval: A Scenario-Adaptive Multi-Dimensional Framework for Jailbreak Evaluation","abstract":"Accurate jailbreak evaluation is critical for LLM red team testing and jailbreak research. Mainstream methods rely on binary classification (string matching, toxic text classifiers, and LLM-based methods), outputting only \"yes/no\" labels without quantifying harm severity. Emerged multi-dimensional frameworks (e.g., Security Violation, Relative Truthfulness and Informativeness) use unified evaluation standards across scenarios, leading to scenario-specific mismatches (e.g., \"Relative Truthfulness\" is irrelevant to \"hate speech\"), undermining evaluation accuracy. To address these, we propose SceneJailEval, with key contributions: (1) A pioneering scenario-adaptive multi-dimensional framework for jailbreak evaluation, overcoming the critical \"one-size-fits-all\" limitation of existing multi-dimensional methods, and boasting robust extensibility to seamlessly adapt to customized or emerging scenarios. (2) A novel 14-scenario dataset featuring rich jailbreak variants and regional cases, addressing the long-standing gap in high-quality, comprehensive benchmarks for scenario-adaptive evaluation. (3) SceneJailEval delivers state-of-the-art performance with an F1 score of 0.917 on our full-scenario dataset (+6% over SOTA) and 0.995 on JBB (+3% over SOTA), breaking through the accuracy bottleneck of existing evaluation methods in heterogeneous scenarios and solidifying its superiority.","short_abstract":"Accurate jailbreak evaluation is critical for LLM red team testing and jailbreak research. Mainstream methods rely on binary classification (string matching, toxic text classifiers, and LLM-based methods), outputting only \"yes/no\" labels without quantifying harm severity. Emerged multi-dimensional frameworks (e.g., Sec...","url_abs":"https://arxiv.org/abs/2508.06194","url_pdf":"https://arxiv.org/pdf/2508.06194v2","authors":"[\"Lai Jiang\",\"Yuekang Li\",\"Xiaohan Zhang\",\"Youtao Ding\",\"Li Pan\"]","published":"2025-08-08T10:19:21Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
