{"ID":2873875,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.06235","arxiv_id":"2509.06235","title":"PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments","abstract":"LLM-based agents have shown promise in various cooperative and strategic reasoning tasks, but their effectiveness in competitive multi-agent environments remains underexplored. To address this gap, we introduce PillagerBench, a novel framework for evaluating multi-agent systems in real-time competitive team-vs-team scenarios in Minecraft. It provides an extensible API, multi-round testing, and rule-based built-in opponents for fair, reproducible comparisons. We also propose TactiCrafter, an LLM-based multi-agent system that facilitates teamwork through human-readable tactics, learns causal dependencies, and adapts to opponent strategies. Our evaluation demonstrates that TactiCrafter outperforms baseline approaches and showcases adaptive learning through self-play. Additionally, we analyze its learning process and strategic evolution over multiple game episodes. To encourage further research, we have open-sourced PillagerBench, fostering advancements in multi-agent AI for competitive environments.","short_abstract":"LLM-based agents have shown promise in various cooperative and strategic reasoning tasks, but their effectiveness in competitive multi-agent environments remains underexplored. To address this gap, we introduce PillagerBench, a novel framework for evaluating multi-agent systems in real-time competitive team-vs-team sce...","url_abs":"https://arxiv.org/abs/2509.06235","url_pdf":"https://arxiv.org/pdf/2509.06235v1","authors":"[\"Olivier Schipper\",\"Yudi Zhang\",\"Yali Du\",\"Mykola Pechenizkiy\",\"Meng Fang\"]","published":"2025-09-07T22:51:12Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.MA\"]","methods":"[\"Large Language Model\"]","has_code":false}