{"ID":2864511,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23023","arxiv_id":"2509.23023","title":"Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia","abstract":"Large language models are increasingly deployed in multi-agent settings whose outcomes hinge on social intelligence, motivating evaluations of their interactive capabilities; yet existing studies remain overwhelmingly empirical, leaving us without a theoretical understanding of how agent interactions determine collective outcomes. To address this, we introduce \\textit{Mini-Mafia}, a four-player simplification of the social deduction game Mafia in which a fixed night phase reduces the game to a single critical exchange among a mafioso, a detective, and a villager. In this setting, we show that the mafia win-rate $p$ is predicted by the analytical formula $\\text{logit}(p) = v \\times (m - d)$, where $m$, $d$, and $v$ represent the mafioso's deception, the detective's disclosure, and the villager's detection capabilities. We turn this analytical framework into the \\textit{Mini-Mafia Benchmark}, where Bayesian inference over gameplay data yields per-model estimates of the intrinsic parameters $m$, $d$, and $v$. For $I$ models, only $3I$ parameters suffice to predict the outcomes of all $I^3$ tournament combinations; and in 5-fold cross-validation the formula achieves a $76.6\\%$ Brier-score reduction over a random baseline. The benchmark also reveals counterintuitive results: Grok 3 Mini is the strongest detector and GPT-5 Mini the strongest discloser, both ahead of DeepSeek V3.1, Claude Opus 4, and Claude Sonnet 4; while Claude Sonnet 4 is the weakest detector, near random chance. Together, these results show that Mini-Mafia, a simple but nontrivial multi-agent system, admits an analytical description and serves as a principled benchmark for language model interactions.","short_abstract":"Large language models are increasingly deployed in multi-agent settings whose outcomes hinge on social intelligence, motivating evaluations of their interactive capabilities; yet existing studies remain overwhelmingly empirical, leaving us without a theoretical understanding of how agent interactions determine collecti...","url_abs":"https://arxiv.org/abs/2509.23023","url_pdf":"https://arxiv.org/pdf/2509.23023v3","authors":"[\"Davi Bastos Costa\",\"Renato Vicente\"]","published":"2025-09-27T00:40:19Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
