{"ID":2832943,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04691","arxiv_id":"2512.04691","title":"Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective","abstract":"Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these systems have shown promise in enhancing capabilities and enabling complex tasks, they also pose significant ethical challenges. This position paper outlines a research agenda aimed at ensuring the ethical behavior of multi-agent systems of LLMs (MALMs) from the perspective of mechanistic interpretability. We identify three key research challenges: (i) developing comprehensive evaluation frameworks to assess ethical behavior at individual, interactional, and systemic levels; (ii) elucidating the internal mechanisms that give rise to emergent behaviors through mechanistic interpretability; and (iii) implementing targeted parameter-efficient alignment techniques to steer MALMs towards ethical behaviors without compromising their performance.","short_abstract":"Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these systems have shown promise in enhancing capabilities and enabling complex tasks, they also pose significant ethical challenges. This po...","url_abs":"https://arxiv.org/abs/2512.04691","url_pdf":"https://arxiv.org/pdf/2512.04691v1","authors":"[\"Jae Hee Lee\",\"Anne Lauscher\",\"Stefano V. Albrecht\"]","published":"2025-12-04T11:41:44Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.MA\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
