{"ID":2832826,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04463","arxiv_id":"2512.04463","title":"MARL Warehouse Robots","abstract":"We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent learning approaches (achieving 3.25 mean return vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning -- particularly extended epsilon annealing (5M+ steps) for sparse reward discovery. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/","short_abstract":"We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent...","url_abs":"https://arxiv.org/abs/2512.04463","url_pdf":"https://arxiv.org/pdf/2512.04463v2","authors":"[\"Price Allman\",\"Lian Thang\",\"Dre Simmons\",\"Salmon Riaz\"]","published":"2025-12-04T05:11:36Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false}