{"ID":2886490,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03645","arxiv_id":"2508.03645","title":"DiWA: Diffusion Policy Adaptation with World Models","abstract":"Fine-tuning diffusion policies with reinforcement learning (RL) presents significant challenges. The long denoising sequence for each action prediction impedes effective reward propagation. Moreover, standard RL methods require millions of real-world interactions, posing a major bottleneck for practical fine-tuning. Although prior work frames the denoising process in diffusion policies as a Markov Decision Process to enable RL-based updates, its strong dependence on environment interaction remains highly inefficient. To bridge this gap, we introduce DiWA, a novel framework that leverages a world model for fine-tuning diffusion-based robotic skills entirely offline with reinforcement learning. Unlike model-free approaches that require millions of environment interactions to fine-tune a repertoire of robot skills, DiWA achieves effective adaptation using a world model trained once on a few hundred thousand offline play interactions. This results in dramatically improved sample efficiency, making the approach significantly more practical and safer for real-world robot learning. On the challenging CALVIN benchmark, DiWA improves performance across eight tasks using only offline adaptation, while requiring orders of magnitude fewer physical interactions than model-free baselines. To our knowledge, this is the first demonstration of fine-tuning diffusion policies for real-world robotic skills using an offline world model. We make the code publicly available at https://diwa.cs.uni-freiburg.de.","short_abstract":"Fine-tuning diffusion policies with reinforcement learning (RL) presents significant challenges. The long denoising sequence for each action prediction impedes effective reward propagation. Moreover, standard RL methods require millions of real-world interactions, posing a major bottleneck for practical fine-tuning. Al...","url_abs":"https://arxiv.org/abs/2508.03645","url_pdf":"https://arxiv.org/pdf/2508.03645v1","authors":"[\"Akshay L Chandra\",\"Iman Nematollahi\",\"Chenguang Huang\",\"Tim Welschehold\",\"Wolfram Burgard\",\"Abhinav Valada\"]","published":"2025-08-05T16:55:50Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\"]","project_urls":"[\"https://diwa.cs.uni-freiburg.de\"]","has_code":false,"code_links":[{"ID":611314,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886490,"paper_url":"https://arxiv.org/abs/2508.03645","paper_title":"DiWA: Diffusion Policy Adaptation with World Models","repo_url":"https://github.com/robot-learning-freiburg/CURB-SG","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0},{"ID":611315,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886490,"paper_url":"https://arxiv.org/abs/2508.03645","paper_title":"DiWA: Diffusion Policy Adaptation with World Models","repo_url":"https://github.com/acl21/diwa","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}