{"ID":2834183,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03211","arxiv_id":"2512.03211","title":"A Multi-Agent, Policy-Gradient approach to Network Routing","abstract":"Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.","short_abstract":"Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of networ...","url_abs":"https://arxiv.org/abs/2512.03211","url_pdf":"https://arxiv.org/pdf/2512.03211v1","authors":"[\"Nigel Tao\",\"Jonathan Baxter\",\"Lex Weaver\"]","published":"2025-12-02T20:31:01Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.NI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}