{"ID":2840502,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13186","arxiv_id":"2511.13186","title":"DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play","abstract":"Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behaviors in continuous decision spaces remains challenging. Ensuring adaptability and generalization in self-play settings is critical for achieving competitive performance in dynamic multi-agent environments. These challenges often cause methods to converge slowly or fail to converge at all to a Nash equilibrium, making agents vulnerable to strategic exploitation by unseen opponents. To address these challenges, we propose DiffFP, a fictitious play (FP) framework that estimates the best response to unseen opponents while learning a robust and multimodal behavioral policy. Specifically, we approximate the best response using a diffusion policy that leverages generative modeling to learn adaptive and diverse strategies. Through empirical evaluation, we demonstrate that the proposed FP framework converges towards $ε$-Nash equilibria in continuous- space zero-sum games. We validate our method on complex multi-agent environments, including racing and multi-particle zero-sum games. Simulation results show that the learned policies are robust against diverse opponents and outperform baseline reinforcement learning policies. Our approach achieves up to 3$\\times$ faster convergence and 30$\\times$ higher success rates on average against RL-based baselines, demonstrating its robustness to opponent strategies and stability across training iterations","short_abstract":"Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behaviors in continuous decision spaces remains challenging. Ensuring adaptability and generalization in self-play settings is critical...","url_abs":"https://arxiv.org/abs/2511.13186","url_pdf":"https://arxiv.org/pdf/2511.13186v1","authors":"[\"Akash Karthikeyan\",\"Yash Vardhan Pant\"]","published":"2025-11-17T09:48:29Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"eess.SY\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\"]","has_code":false}
