{"ID":2848340,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25110","arxiv_id":"2510.25110","title":"DEBATE: A Large-Scale Benchmark for Evaluating Opinion Dynamics in Role-Playing LLM Agents","abstract":"Accurately modeling opinion change through social interactions is crucial for understanding and mitigating polarization, misinformation, and societal conflict. Recent work simulates opinion dynamics with role-playing LLM agents (RPLAs), but multi-agent simulations often display unnatural group behavior, such as premature convergence, and lack empirical benchmarks for assessing alignment with real human group interactions. We introduce DEBATE, a large-scale benchmark for evaluating the authenticity of opinion dynamics in multi-agent RPLA simulations. DEBATE contains multi-round public messages and private Likert-scale beliefs from U.S.-based participants across 107 topics; the cleaned benchmark used in our experiments contains 2,788 participants in 697 groups, enabling evaluation at the utterance and group levels and supporting future individual-level analyses. We instantiate \"digital twin\" RPLAs with seven LLMs and evaluate across two settings: next-message prediction and full dynamics simulation, using stance-based opinion-dynamics metrics. In zero-shot settings, RPLA groups exhibit strong opinion convergence relative to human groups. On the held-out group split, supervised fine-tuning (SFT) for Llama-3.1-8B-Instruct improves auxiliary stance alignment and reduces group-level convergence error, though discrepancies in opinion change and belief updating remain. DEBATE enables rigorous benchmarking of simulated opinion dynamics and supports future research on aligning multi-agent RPLAs with realistic human interactions.","short_abstract":"Accurately modeling opinion change through social interactions is crucial for understanding and mitigating polarization, misinformation, and societal conflict. Recent work simulates opinion dynamics with role-playing LLM agents (RPLAs), but multi-agent simulations often display unnatural group behavior, such as prematu...","url_abs":"https://arxiv.org/abs/2510.25110","url_pdf":"https://arxiv.org/pdf/2510.25110v6","authors":"[\"Yun-Shiuan Chuang\",\"Ruixuan Tu\",\"Chengtao Dai\",\"You Li\",\"Smit Vasani\",\"Binwei Yao\",\"Michael Henry Tessler\",\"Sijia Yang\",\"Dhavan Shah\",\"Robert Hawkins\",\"Junjie Hu\",\"Timothy T. Rogers\"]","published":"2025-10-29T02:21:10Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}