{"ID":2921672,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01151","arxiv_id":"2606.01151","title":"Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies","abstract":"Behavior cloning with high-capacity generative policies achieves strong imitation performance, but is often limited by demonstration coverage and distribution shift. Direct reinforcement learning fine-tuning can improve performance, but updating large action decoders is frequently unstable and sample inefficient. We propose Lagrangian Perturbation Diffusion Steering (LP-DS), a lightweight adaptation method that improves a frozen generative policy by learning a compact noise-space perturbation before decoding. LP-DS optimizes this perturbation with a Lagrangian trust-region objective, improving downstream value while constraining deviation from the latent prior. Across RoboMimic manipulation, OpenAI Gym locomotion, and Adroit dexterous manipulation benchmarks, LP-DS improves sample efficiency, success, and return while maintaining higher action-space entropy than unconstrained noise-space steering, with return improvements of up to 25% over prior baselines. Additional evaluations with flow-matching backbones, a large vision-language-action model, and physical Franka deployment show that LP-DS is not limited to compact diffusion policies or simulated benchmarks. Project page: https://sites.google.com/view/lp-ds/home.","short_abstract":"Behavior cloning with high-capacity generative policies achieves strong imitation performance, but is often limited by demonstration coverage and distribution shift. Direct reinforcement learning fine-tuning can improve performance, but updating large action decoders is frequently unstable and sample inefficient. \nWe pr...","url_abs":"https://arxiv.org/abs/2606.01151","url_pdf":"https://arxiv.org/pdf/2606.01151v1","authors":"[\"Hikmet Simsir\",\"Ozgur S. Oguz\"]","published":"2026-05-31T10:40:28Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\"]","project_urls":"[\"https://sites.google.com/view/lp-ds/home\"]","has_code":false,"code_links":[{"ID":612592,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T02:42:49.606572591Z","DeletedAt":null,"paper_id":2921672,"paper_url":"https://arxiv.org/abs/2606.01151","paper_title":"Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies","repo_url":"https://github.com/google/safevalues","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0},{"ID":612593,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T02:42:49.606572591Z","DeletedAt":null,"paper_id":2921672,"paper_url":"https://arxiv.org/abs/2606.01151","paper_title":"Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies","repo_url":"https://github.com/MarcToussaint/robotic","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
