{"ID":3005723,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-04T07:41:34.29888543Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02668","arxiv_id":"2606.02668","title":"What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents","abstract":"Coding agents gate consequential actions behind a human-in-the-loop approval dialog, but the dialog is narrated by the agent itself: the human approves a summary the agent writes. The Lies-in-the-Loop (LITL) attack shows that summary is forgeable, so a compromised agent can show a benign description while a different action runs. This paper names the missing property, Consent Integrity, by importing What You See Is What You Sign (WYSIWYS) and the trusted-path property into the agent approval channel: the action shown to the human must be rendered by a trusted mediator from the real action at the boundary, not the agent's narration, over a path the agent cannot spoof, and bound to the exact action that executes. Two twists distinguish it from classical WYSIWYS: the renderer is the adversary, and the boundary ground truth is a low-level event that must be decoded without trusting the agent. Since no decoder is complete, the realizable target is analyzer-relative: whatever the analyzer cannot classify is surfaced as uninspectable rather than silently approved. A prototype implements the analyzer, renderer, and bind-to-execution; total mediation and the trusted path are specified but assumed, not implemented. On GTFOBins, an independent corpus of 1330 trusted-tool abuses, the prototype silently passes 10.0% (every instance through a trusted tool); on tldr, 28,798 normal-usage commands, it marks 87.0% uninspectable. These two independent measurements bracket the design's central tension: the trust list that bounds silent passes is the same one that drives over-prompting, and a boundary-only mediator can move along that frontier but not escape it. The contribution is the property, the mechanism, and an honest position on that frontier, not a solved defense.","short_abstract":"Coding agents gate consequential actions behind a human-in-the-loop approval dialog, but the dialog is narrated by the agent itself: the human approves a summary the agent writes. The Lies-in-the-Loop (LITL) attack shows that summary is forgeable, so a compromised agent can show a benign description while a different a...","url_abs":"https://arxiv.org/abs/2606.02668","url_pdf":"https://arxiv.org/pdf/2606.02668v1","authors":"[\"Xiaoqi Weng\"]","published":"2026-06-01T11:08:17Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.HC\"]","methods":"[\"Large Language Model\"]","has_code":false}
