{"ID":3053227,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-05T19:51:57.508130993Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04184","arxiv_id":"2606.04184","title":"GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs","abstract":"True general intelligence requires not only a model of the physical world but also a social world model: the capacity to infer how individual mental states interact and crystallize into group-level outcomes. Despite notable progress in individual-level Theory of Mind (ToM) reasoning, existing multimodal large language models fail at this broader task. Collective behavior emerges non-linearly from social tensions, conformity dynamics, and structural constraints, meaning it cannot be recovered by merely summing individual intentions. We present GroupToM-Bench, the first multimodal benchmark for group-level ToM, built around a causal chain spanning micro-level BDI states (belief, desire, intention), meso-level group tension and structural constraints, and macro-level outcome prediction and mechanistic attribution. To probe this full arc, we develop a seven-level cognitive audit framework. Experiments reveal a gap between current models and human baselines, highlighting a failure to process social structures and non-linear collective dynamics.","short_abstract":"True general intelligence requires not only a model of the physical world but also a social world model: the capacity to infer how individual mental states interact and crystallize into group-level outcomes. \nDespite notable progress in individual-level Theory of Mind (ToM) reasoning, existing multimodal large language...","url_abs":"https://arxiv.org/abs/2606.04184","url_pdf":"https://arxiv.org/pdf/2606.04184v1","authors":"[\"Weidong Tang\",\"Jierui Li\",\"Yueling Hou\",\"Zihan Mei\",\"Can Zhang\",\"Xinyan Wan\",\"Zhiyuan Liang\",\"Pengfei Zhou\",\"Yang You\",\"Wangbo Zhao\"]","published":"2026-06-02T20:06:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
