{"ID":2883181,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.08800","arxiv_id":"2508.08800","title":"Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints","abstract":"We study robustness to agent malfunctions in cooperative multi-agent reinforcement learning (MARL), a failure mode that is critical in practice yet underexplored in existing theory. We introduce MARTA, a plug-and-play robustness layer that augments standard MARL algorithms with a Switcher-Adversary mechanism which selectively induces malfunctions in performance-critical states. This formulation defines a fault-switching $(N+2)$-player Markov game in which the Switcher chooses when and which agent fails, and the Adversary controls the resulting faulty behaviour via random or worst-case policies. We develop a Q-learning-type scheme and show that the associated Bellman operator is a contraction, yielding existence and uniqueness of the minimax value and convergence to a Markov perfect equilibrium. MARTA integrates seamlessly with MARL algorithms without architectural modification and consistently improves robustness across Traffic Junction (TJ), Level-Based Foraging (LBF), MPE SimpleTag, and SMAC (v2). In these domains, MARTA achieves large gains in final performance of up to 116.7\\% in SMAC, 21.4\\% in MPE SimpleTag, and 44.6\\% in LBF, while significantly reducing failure rates under train-test mismatched fault regimes. These results establish MARTA as a theoretically grounded and practically deployable mechanism for fault-tolerant MARL.","short_abstract":"We study robustness to agent malfunctions in cooperative multi-agent reinforcement learning (MARL), a failure mode that is critical in practice yet underexplored in existing theory. 
We introduce MARTA, a plug-and-play robustness layer that augments standard MARL algorithms with a Switcher-Adversary mechanism which sele...","url_abs":"https://arxiv.org/abs/2508.08800","url_pdf":"https://arxiv.org/pdf/2508.08800v2","authors":"[\"David Mguni\",\"Yaqi Sun\",\"Haojun Chen\",\"Wanrong Yang\",\"Amir Darabi\",\"Larry Olanrewaju Orimoloye\",\"Yaodong Yang\"]","published":"2025-08-12T09:57:05Z","proceeding":"cs.MA","tasks":"[\"cs.MA\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false}