{"ID":2923505,"CreatedAt":"2026-06-02T04:05:25.881865328Z","UpdatedAt":"2026-06-04T17:36:40.748176825Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02540","arxiv_id":"2606.02540","title":"SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction","abstract":"Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. \nOur analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.","short_abstract":"Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills withi...","url_abs":"https://arxiv.org/abs/2606.02540","url_pdf":"https://arxiv.org/pdf/2606.02540v1","authors":"[\"Yuting Ning\",\"Zhehao Zhang\",\"Yash Kumar Lal\",\"Boyu Gou\",\"Junyi Li\",\"Weitong Ruan\",\"Chentao Ye\",\"Rahul Gupta\",\"Diyi Yang\",\"Yu Su\",\"Huan Sun\"]","published":"2026-06-01T17:45:39Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
