{"ID":2822829,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.04234","arxiv_id":"2601.04234","title":"Formal Analysis of AGI Decision-Theoretic Models and the Confrontation Question","abstract":"Artificial General Intelligence (AGI) may face a confrontation question: under what conditions would a rationally self-interested AGI choose to seize power or eliminate human control (a confrontation) rather than remain cooperative? We formalize this in a Markov decision process with a stochastic human-initiated shutdown event. Building on results on convergent instrumental incentives, we show that for almost all reward functions a misaligned agent has an incentive to avoid shutdown. We then derive closed-form thresholds for when confronting humans yields higher expected utility than compliant behavior, as a function of the discount factor $γ$, shutdown probability $p$, and confrontation cost $C$. For example, a far-sighted agent ($γ=0.99$) facing $p=0.01$ can have a strong takeover incentive unless $C$ is sufficiently large. We contrast this with aligned objectives that impose large negative utility for harming humans, which makes confrontation suboptimal. In a strategic 2-player model (human policymaker vs AGI), we prove that if the AGI's confrontation incentive satisfies $Δ\\ge 0$, no stable cooperative equilibrium exists: anticipating this, a rational human will shut down or preempt the system, leading to conflict. If $Δ\u003c 0$, peaceful coexistence can be an equilibrium. We discuss implications for reward design and oversight, extend the reasoning to multi-agent settings as conjectures, and note computational barriers to verifying $Δ\u003c 0$, citing complexity results for planning and decentralized decision problems. \nNumerical examples and a scenario table illustrate regimes where confrontation is likely versus avoidable.","short_abstract":"Artificial General Intelligence (AGI) may face a confrontation question: under what conditions would a rationally self-interested AGI choose to seize power or eliminate human control (a confrontation) rather than remain cooperative? We formalize this in a Markov decision process with a stochastic human-initiated shutdo...","url_abs":"https://arxiv.org/abs/2601.04234","url_pdf":"https://arxiv.org/pdf/2601.04234v1","authors":"[\"Denis Saklakov\"]","published":"2026-01-04T08:02:00Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[]","has_code":false}
