{"ID":2826552,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.18733","arxiv_id":"2512.18733","title":"Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection","abstract":"Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting malicious agents has become a critical security concern. Although existing graph anomaly detection (GAD)-based defenses can identify anomalous agents, they mainly rely on coarse sentence-level information and overlook fine-grained lexical cues, leading to suboptimal performance. Moreover, the lack of interpretability in these methods limits their reliability and real-world applicability. To address these limitations, we propose XG-Guard, an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS. To incorporate both coarse and fine-grained textual information for anomalous agent identification, we utilize a bi-level agent encoder to jointly model the sentence- and token-level representations of each agent. A theme-based anomaly detector further captures the evolving discussion focus in MAS dialogues, while a bi-level score fusion mechanism quantifies token-level contributions for explanation. Extensive experiments across diverse MAS topologies and attack scenarios demonstrate robust detection performance and strong interpretability of XG-Guard.","short_abstract":"Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting malicious agents has become a critical security concern. Although existing graph anomaly detection (GAD)-based defenses ca...","url_abs":"https://arxiv.org/abs/2512.18733","url_pdf":"https://arxiv.org/pdf/2512.18733v1","authors":"[\"Junjun Pan\",\"Yixin Liu\",\"Rui Miao\",\"Kaize Ding\",\"Yu Zheng\",\"Quoc Viet Hung Nguyen\",\"Alan Wee-Chung Liew\",\"Shirui Pan\"]","published":"2025-12-21T13:46:36Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\",\"cs.MA\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
