{"ID":2843045,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09792","arxiv_id":"2511.09792","title":"Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning","abstract":"Value decomposition is a central approach in multi-agent reinforcement learning (MARL), enabling centralized training with decentralized execution by factorizing the global value function into local values. To ensure individual-global-max (IGM) consistency, existing methods either enforce monotonicity constraints, which limit expressive power, or adopt softer surrogates at the cost of algorithmic complexity. In this work, we present a dynamical systems analysis of non-monotonic value decomposition, modeling learning dynamics as continuous-time gradient flow. We prove that, under approximately greedy exploration, all zero-loss equilibria violating IGM consistency are unstable saddle points, while only IGM-consistent solutions are stable attractors of the learning dynamics. Extensive experiments on both synthetic matrix games and challenging MARL benchmarks demonstrate that unconstrained, non-monotonic factorization reliably recovers IGM-optimal solutions and consistently outperforms monotonic baselines. Additionally, we investigate the influence of temporal-difference targets and exploration strategies, providing actionable insights for the design of future value-based MARL algorithms.","short_abstract":"Value decomposition is a central approach in multi-agent reinforcement learning (MARL), enabling centralized training with decentralized execution by factorizing the global value function into local values. To ensure individual-global-max (IGM) consistency, existing methods either enforce monotonicity constraints, whic...","url_abs":"https://arxiv.org/abs/2511.09792","url_pdf":"https://arxiv.org/pdf/2511.09792v1","authors":"[\"Tianmeng Hu\",\"Yongzheng Cui\",\"Rui Tang\",\"Biao Luo\",\"Ke Li\"]","published":"2025-11-12T22:49:35Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.MA\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
