{"ID":2845312,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04316","arxiv_id":"2511.04316","title":"AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research","abstract":"The rapid expansion of research on Large Language Model (LLM) safety and robustness has produced a fragmented and oftentimes buggy ecosystem of implementations, datasets, and evaluation methods. This fragmentation makes reproducibility and comparability across studies challenging, hindering meaningful progress. To address these issues, we introduce AdversariaLLM, a toolbox for conducting LLM jailbreak robustness research. Its design centers on reproducibility, correctness, and extensibility. The framework implements twelve adversarial attack algorithms, integrates seven benchmark datasets spanning harmfulness, over-refusal, and utility evaluation, and provides access to a wide range of open-weight LLMs via Hugging Face. The implementation includes advanced features for comparability and reproducibility such as compute-resource tracking, deterministic results, and distributional evaluation techniques. AdversariaLLM also integrates judging through the companion package JudgeZoo, which can also be used independently. Together, these components aim to establish a robust foundation for transparent, comparable, and reproducible research in LLM safety.","short_abstract":"The rapid expansion of research on Large Language Model (LLM) safety and robustness has produced a fragmented and oftentimes buggy ecosystem of implementations, datasets, and evaluation methods. This fragmentation makes reproducibility and comparability across studies challenging, hindering meaningful progress. To addr...","url_abs":"https://arxiv.org/abs/2511.04316","url_pdf":"https://arxiv.org/pdf/2511.04316v1","authors":"[\"Tim Beyer\",\"Jonas Dornbusch\",\"Jakob Steimle\",\"Moritz Ladenburger\",\"Leo Schwinn\",\"Stephan Günnemann\"]","published":"2025-11-06T12:38:09Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
