{"ID":2867935,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18371","arxiv_id":"2509.18371","title":"Policy Gradient with Self-Attention for Model-Free Distributed Nonlinear Multi-Agent Games","abstract":"Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost functions that generate them. We propose a policy gradient approach to learn distributed policies that follow the communication structure in multi-team games, with multiple agents per team. Our formulation is inspired by the structure of distributed policies in linear quadratic games, which take the form of time-varying linear feedback gains. In the nonlinear case, we model the policies as nonlinear feedback gains, parameterized by self-attention layers to account for the time-varying multi-agent communication topology. We demonstrate that our distributed policy gradient approach achieves strong performance in several settings, including distributed linear and nonlinear regulation, and simulated and real multi-robot pursuit-and-evasion games.","short_abstract":"Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost funct...","url_abs":"https://arxiv.org/abs/2509.18371","url_pdf":"https://arxiv.org/pdf/2509.18371v1","authors":"[\"Eduardo Sebastián\",\"Maitrayee Keskar\",\"Eeman Iqbal\",\"Eduardo Montijano\",\"Carlos Sagüés\",\"Nikolay Atanasov\"]","published":"2025-09-22T19:52:16Z","proceeding":"eess.SY","tasks":"[\"eess.SY\",\"cs.MA\",\"cs.RO\"]","methods":"[]","has_code":false}