{"ID":2846550,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.01310","arxiv_id":"2511.01310","title":"From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models","abstract":"Learning cooperative multi-agent policies directly from high-dimensional, multimodal sensory inputs like pixels and audio (from pixels) is notoriously sample-inefficient. Model-free Multi-Agent Reinforcement Learning (MARL) algorithms struggle with the joint challenge of representation learning, partial observability, and credit assignment. To address this, we propose a novel framework based on a shared, generative Multimodal World Model (MWM). Our MWM is trained to learn a compressed latent representation of the environment's dynamics by fusing distributed, multimodal observations from all agents using a scalable attention-based mechanism. Subsequently, we leverage this learned MWM as a fast, \"imagined\" simulator to train cooperative MARL policies (e.g., MAPPO) entirely within its latent space, decoupling representation learning from policy learning. We introduce a new set of challenging multimodal, multi-agent benchmarks built on a 3D physics simulator. Our experiments demonstrate that our MWM-MARL framework achieves orders-of-magnitude greater sample efficiency compared to state-of-the-art model-free MARL baselines. We further show that our proposed multimodal fusion is essential for task success in environments with sensory asymmetry and that our architecture provides superior robustness to sensor-dropout, a critical feature for real-world deployment.","short_abstract":"Learning cooperative multi-agent policies directly from high-dimensional, multimodal sensory inputs like pixels and audio (from pixels) is notoriously sample-inefficient. Model-free Multi-Agent Reinforcement Learning (MARL) algorithms struggle with the joint challenge of representation learning, partial observability,...","url_abs":"https://arxiv.org/abs/2511.01310","url_pdf":"https://arxiv.org/pdf/2511.01310v2","authors":"[\"Sureyya Akin\",\"Kavita Srivastava\",\"Prateek B. Kapoor\",\"Pradeep G. Sethi\",\"Sunita Q. Patel\",\"Rahu Srivastava\"]","published":"2025-11-03T07:44:56Z","proceeding":"cs.MA","tasks":"[\"cs.MA\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
