{"ID":3006451,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02646","arxiv_id":"2606.02646","title":"The Ringelmann Effect in Multi-Agent LLM Systems: A Scaling Law for Effective Team Size","abstract":"Inference-time multi-agent LLM scaling lacks a shared unit: counting nominal agents conflates cost with independent evidence. We derive a two-parameter scaling law $R(N) = N_\\text{eff}/N = 1/(1+c(N-1)N^{-β})$ where the regime exponent $β$ classifies any configuration into one of three asymptotic regimes -- hard-ceiling at $1/c$ ($β= 0$), sublinear at $N^β/c$ ($0 \u003c β\u003c 1$), or linear ($β\\ge 1$), and a mean-field theorem predicts that peer count $k$ and rounds $τ$ during agent debate enter the dynamics only through their product $kτ$. The law applies at two levels: answer diversity and correctness redundancy. Across 44 (model $\\times$ task $\\times$ condition) cells spanning peer debate, self-correction, random-noise placebo, self-consistency, three open-weight families (Qwen, Llama, Ministral) at scales from 7B to 32B with a frontier API check (Gemini), thinking models, heterogeneous teams, and sparse communication, the functional form fits every condition at $R^2 \u003e 0.99$; only $(c, β)$ shifts. On free-form math, dense peer influence collapses the answer-level regime from sublinear into hard-ceiling; correctness-level fits remain hard-ceiling throughout. Three findings have practical implications. \\emph{(i)}~Thirty dense debating agents produce no more answer diversity than one on MMLU-Hard. \\emph{(ii)}~A noise placebo tracks self-correction on free-form math and at $4\\times$ scale, so within homogeneous teams the gain commonly attributed to ``debate'' comes from re-evaluation, not peer content. 
\\emph{(iii)}~A single $N \\le 5$ pilot predicts the $N=30$ structural ceiling, and, within the configurations tested, only architectural diversity (heterogeneous teams) lowers $c$ and escapes the hard-ceiling regime; communication-mode interventions do not.","short_abstract":"Inference-time multi-agent LLM scaling lacks a shared unit: counting nominal agents conflates cost with independent evidence. We derive a two-parameter scaling law $R(N) = N_\\text{eff}/N = 1/(1+c(N-1)N^{-β})$ where the regime exponent $β$ classifies any configuration into one of three asymptotic regimes -- hard-ceiling...","url_abs":"https://arxiv.org/abs/2606.02646","url_pdf":"https://arxiv.org/pdf/2606.02646v1","authors":"[\"Blaž Bertalanič\",\"Carolina Fortuna\"]","published":"2026-05-31T16:19:54Z","proceeding":"physics.soc-ph","tasks":"[\"physics.soc-ph\",\"cs.AI\",\"cs.MA\"]","methods":"[\"Large Language Model\"]","has_code":false}
