{"ID":2854467,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15007","arxiv_id":"2510.15007","title":"Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective","abstract":"Large language models (LLMs) have achieved impressive results across a range of natural language processing tasks, but their potential to generate harmful content has raised serious safety concerns. Current toxicity detectors primarily rely on single-label benchmarks, which cannot adequately capture the inherently ambiguous and multi-dimensional nature of real-world toxic prompts. This limitation results in biased evaluations, including missed toxic detections and false positives, undermining the reliability of existing detectors. Additionally, gathering comprehensive multi-label annotations across fine-grained toxicity categories is prohibitively costly, further hindering effective evaluation and development. To tackle these issues, we introduce three novel multi-label benchmarks for toxicity detection: \\textbf{Q-A-MLL}, \\textbf{R-A-MLL}, and \\textbf{H-X-MLL}, derived from public toxicity datasets and annotated according to a detailed 15-category taxonomy. We further provide a theoretical proof that, on our released datasets, training with pseudo-labels yields better performance than directly learning from single-label supervision. In addition, we develop a pseudo-label-based toxicity detection method. Extensive experimental results show that our approach significantly surpasses advanced baselines, including GPT-4o and DeepSeek, thus enabling more accurate and reliable evaluation of multi-label toxicity in LLM-generated content.","short_abstract":"Large language models (LLMs) have achieved impressive results across a range of natural language processing tasks, but their potential to generate harmful content has raised serious safety concerns. \nCurrent toxicity detectors primarily rely on single-label benchmarks, which cannot adequately capture the inherently ambi...","url_abs":"https://arxiv.org/abs/2510.15007","url_pdf":"https://arxiv.org/pdf/2510.15007v1","authors":"[\"Zhiqiang Kou\",\"Junyang Chen\",\"Xin-Qiang Cai\",\"Ming-Kun Xie\",\"Biao Liu\",\"Changwei Wang\",\"Lei Feng\",\"Yuheng Jia\",\"Gang Niu\",\"Masashi Sugiyama\",\"Xin Geng\"]","published":"2025-10-16T06:50:33Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
