{"ID":2861156,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03508","arxiv_id":"2510.03508","title":"D2 Actor Critic: Diffusion Actor Meets Distributional Critic","abstract":"We introduce D2AC, a new model-free reinforcement learning (RL) algorithm designed to train expressive diffusion policies online effectively. At its core is a policy improvement objective that avoids the high variance of typical policy gradients and the complexity of backpropagation through time. This stable learning process is critically enabled by our second contribution: a robust distributional critic, which we design through a fusion of distributional RL and clipped double Q-learning. The resulting algorithm is highly effective, achieving state-of-the-art performance on a benchmark of eighteen hard RL tasks, including Humanoid, Dog, and Shadow Hand domains, spanning both dense-reward and goal-conditioned RL scenarios. Beyond standard benchmarks, we also evaluate a biologically motivated predator-prey task to examine the behavioral robustness and generalization capacity of our approach. Code: https://github.com/d2ac-actor-critic/d2ac-public","short_abstract":"We introduce D2AC, a new model-free reinforcement learning (RL) algorithm designed to train expressive diffusion policies online effectively. At its core is a policy improvement objective that avoids the high variance of typical policy gradients and the complexity of backpropagation through time. This stable learning p...","url_abs":"https://arxiv.org/abs/2510.03508","url_pdf":"https://arxiv.org/pdf/2510.03508v3","authors":"[\"Lunjun Zhang\",\"Shuo Han\",\"Hanrui Lyu\",\"Bradly C Stadie\"]","published":"2025-10-03T20:47:24Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":608795,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2861156,"paper_url":"https://arxiv.org/abs/2510.03508","paper_title":"D2 Actor Critic: Diffusion Actor Meets Distributional Critic","repo_url":"https://github.com/d2ac-actor-critic/d2ac-public","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
