{"ID":2887144,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.02912","arxiv_id":"2508.02912","title":"Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models","abstract":"Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), which uses the agent's own policy to simulate future states. A Message Generation Network (MGN) then compresses this plan into a message. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems, while scaling environmental complexity. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.","short_abstract":"Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models...","url_abs":"https://arxiv.org/abs/2508.02912","url_pdf":"https://arxiv.org/pdf/2508.02912v4","authors":"[\"Brennen A. Hill\",\"Mant Koh En Wei\",\"Thangavel Jishnuanandh\"]","published":"2025-08-04T21:29:07Z","proceeding":"cs.MA","tasks":"[\"cs.MA\",\"cs.AI\",\"cs.LG\",\"eess.SY\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
