{"ID":2849914,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22477","arxiv_id":"2510.22477","title":"Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization","abstract":"To combat the prohibitive communication costs of \"free-for-all\" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like \"strategic silence,\" our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.","short_abstract":"To combat the prohibitive communication costs of \"free-for-all\" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO)...","url_abs":"https://arxiv.org/abs/2510.22477","url_pdf":"https://arxiv.org/pdf/2510.22477v1","authors":"[\"Yijia Fan\",\"Jusheng Zhang\",\"Jing Yang\",\"Keze Wang\"]","published":"2025-10-26T01:27:13Z","proceeding":"cs.MA","tasks":"[\"cs.MA\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}