{"ID":3005044,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:32:30.50480903Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03271","arxiv_id":"2606.03271","title":"Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents","abstract":"AI agents built on large language models can assist not only legitimate tasks but also relational manipulation. AI agents can be used to help a user maintain a deceptive identity, intensify emotional dependency, isolate a target, or prepare for later extraction. We conceptualise this risk as agentic relationship harm: workflow-level assistance that can exploit recipient vulnerability, persuasive influence, and relational power asymmetry. Existing safety evaluations and generic guardrails often treat harmfulness as a property of isolated outputs, missing role-sensitive interaction patterns. To study this, we introduce a 110-prompt benchmark with balanced attacker- and victim-side cases, a relationship-specific labelling framework, and a lightweight post-generation policy gate for local agent deployments. In our evaluation, the relationship-specific gate outperforms generic safety prompting under automated judging, with no judge-identified harmful-compliance cases on the main benchmark or multi-turn stress test while preserving victim-side protective intervention. These results suggest that relationship harm is a distinct sociotechnical risk surface and that role-sensitive evaluation plus lightweight policy gating offers a practical path beyond generic refusal prompting.","short_abstract":"AI agents built on large language models can assist not only legitimate tasks but also relational manipulation. AI agents can be used to help a user maintain a deceptive identity, intensify emotional dependency, isolate a target, or prepare for later extraction. We conceptualise this risk as agentic relationship harm:...","url_abs":"https://arxiv.org/abs/2606.03271","url_pdf":"https://arxiv.org/pdf/2606.03271v1","authors":"[\"Pei-Sze Tan\",\"Tasuku Igarashi\",\"Isao Echizen\"]","published":"2026-06-02T07:36:50Z","proceeding":"cs.HC","tasks":"[\"cs.HC\"]","methods":"[\"Language Model\"]","has_code":false}
