{"ID":2832880,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04559","arxiv_id":"2512.04559","title":"Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function","abstract":"Diffusion models excel at generating high-likelihood samples but often require alignment with downstream objectives. Existing fine-tuning methods for diffusion models significantly suffer from reward over-optimization, resulting in high-reward but unnatural samples and degraded diversity. To mitigate over-optimization, we propose Soft Q-based Diffusion Finetuning (SQDF), a novel KL-regularized RL method for diffusion alignment that applies a reparameterized policy gradient of a training-free, differentiable estimation of the soft Q-function. SQDF is further enhanced with three innovations: a discount factor for proper credit assignment in the denoising process, the integration of consistency models to refine Q-function estimates, and the use of an off-policy replay buffer to improve mode coverage and manage the reward-diversity trade-off. Our experiments demonstrate that SQDF achieves superior target rewards while preserving diversity in text-to-image alignment. Furthermore, in online black-box optimization, SQDF attains high sample efficiency while maintaining naturalness and diversity. Our code is available at https://github.com/Shin-woocheol/SQDF.","short_abstract":"Diffusion models excel at generating high-likelihood samples but often require alignment with downstream objectives. Existing fine-tuning methods for diffusion models significantly suffer from reward over-optimization, resulting in high-reward but unnatural samples and degraded diversity. To mitigate over-optimization,...","url_abs":"https://arxiv.org/abs/2512.04559","url_pdf":"https://arxiv.org/pdf/2512.04559v3","authors":"[\"Hyeongyu Kang\",\"Jaewoo Lee\",\"Woocheol Shin\",\"Kiyoung Om\",\"Jinkyoo Park\"]","published":"2025-12-04T08:21:52Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":606284,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2832880,"paper_url":"https://arxiv.org/abs/2512.04559","paper_title":"Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function","repo_url":"https://github.com/Shin-woocheol/SQDF","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}