{"ID":2847811,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.26037","arxiv_id":"2510.26037","title":"SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning","abstract":"The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for discovering vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic two-step process that starts with an agent definition and generates diverse seed test cases that cover various risk outcomes, tool-use trajectories, and risk sources. Then, it iteratively constructs and refines model-based adversarial attacks based on the execution trajectories of former attempts. To optimize the red-teaming cost, we present a model distillation approach that leverages structured forms of a teacher model's reasoning to train smaller models that are equally effective. Across diverse evaluation agent settings, our seed test case generation approach yields 2 -- 2.5x boost to the coverage of risk outcomes and tool-calling trajectories. Our distilled 8B red-teamer model improves attack success rate by 100%, surpassing the 671B Deepseek-R1 model. Our ablations and analyses validate the effectiveness of the iterative framework, structured reasoning, and the generalization of our red-teamer models.","short_abstract":"The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for discovering vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic two-step pr...","url_abs":"https://arxiv.org/abs/2510.26037","url_pdf":"https://arxiv.org/pdf/2510.26037v1","authors":"[\"Kaiwen Zhou\",\"Ahmed Elgohary\",\"A S M Iftekhar\",\"Amin Saied\"]","published":"2025-10-30T00:32:58Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
