{"ID":2839886,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14195","arxiv_id":"2511.14195","title":"N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator","abstract":"Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and black-box output analysis. These approaches are not only costly but also suffer from feedback latency, making them unsuitable for agile diagnostics after training a new model. To address this, we propose N-GLARE (A Non-Generative, Latent Representation-Efficient LLM Safety Evaluator). N-GLARE operates entirely on the model's latent representations, bypassing the need for full text generation. It characterizes hidden layer dynamics by analyzing the APT (Angular-Probabilistic Trajectory) of latent representations and introducing the JSS (Jensen-Shannon Separability) metric. Experiments on over 40 models and 20 red teaming strategies demonstrate that the JSS metric exhibits high consistency with the safety rankings derived from Red Teaming. N-GLARE reproduces the discriminative trends of large-scale red-teaming tests at less than 1\\% of the token cost and the runtime cost, providing an efficient output-free evaluation proxy for real-time diagnostics.","short_abstract":"Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and black-box output analysis. These approaches are not only costly but also suffer from feedback latency, making them unsuitable for agile diagnostics after training a new model....","url_abs":"https://arxiv.org/abs/2511.14195","url_pdf":"https://arxiv.org/pdf/2511.14195v2","authors":"[\"Zheyu Lin\",\"Jirui Yang\",\"Yukui Qiu\",\"Hengqi Guo\",\"Yubing Bao\",\"Yao Guan\"]","published":"2025-11-18T07:03:58Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CR\"]","methods":"[\"Large Language Model\"]","has_code":false}
