{"ID":2857727,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09781","arxiv_id":"2510.09781","title":"Building a Foundational Guardrail for General Agentic Systems via Synthetic Data","abstract":"While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm, since certain risks can lead to severe consequences once carried out. However, existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. To address this challenge, we highlight three critical gaps in current research: data gap, model gap, and evaluation gap. To close the data gap, we introduce AuraGen, a controllable engine that (i) synthesizes benign trajectories, (ii) injects category-labeled risks with calibrated difficulty, and (iii) filters outputs via an automated reward model, producing large and reliable corpora for pre-execution safety. To close the guardian model gap, we propose a foundational guardrail Safiron, combining a cross-planner adapter with a compact guardian model. The adapter unifies different input formats, while Safiron flags risky cases, assigns risk types, and generates rationales; trained in two stages with a broadly explored data recipe, Safiron achieves robust transfer across settings. To close the evaluation gap, we release Pre-Exec Bench, a realistic benchmark covering diverse tools and branching trajectories, which measures detection, fine-grained categorization, explanation, and cross-planner generalization in human-verified scenarios. Extensive experiments demonstrate consistent gains of the proposed guardrail over strong baselines on Pre-Exec Bench, and ablations further distill actionable practices, providing a practical template for safer agentic systems.","short_abstract":"While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm, since certain risks can lead to severe consequences once carried out. However, existing guardrails mostly operate post-execution, which is difficult to scale and leaves li...","url_abs":"https://arxiv.org/abs/2510.09781","url_pdf":"https://arxiv.org/pdf/2510.09781v1","authors":"[\"Yue Huang\",\"Hang Hua\",\"Yujun Zhou\",\"Pengcheng Jing\",\"Manish Nagireddy\",\"Inkit Padhi\",\"Greta Dolcetti\",\"Zhangchen Xu\",\"Subhajit Chaudhury\",\"Ambrish Rawat\",\"Liubov Nedoshivina\",\"Pin-Yu Chen\",\"Prasanna Sattigeri\",\"Xiangliang Zhang\"]","published":"2025-10-10T18:42:32Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
