{"ID":2874704,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04372","arxiv_id":"2509.04372","title":"Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology","abstract":"In this note, we reflect on several fundamental connections among widely used post-training techniques. We clarify some intimate connections and equivalences between reinforcement learning with human feedback, reinforcement learning with internal feedback, and test-time scaling (particularly soft best-of-$N$ sampling), while also illuminating intrinsic links between diffusion guidance and test-time scaling. Additionally, we introduce a resampling approach for alignment and reward-directed diffusion models, sidestepping the need for explicit reinforcement learning techniques.","short_abstract":"In this note, we reflect on several fundamental connections among widely used post-training techniques. We clarify some intimate connections and equivalences between reinforcement learning with human feedback, reinforcement learning with internal feedback, and test-time scaling (particularly soft best-of-$N$ sampling),...","url_abs":"https://arxiv.org/abs/2509.04372","url_pdf":"https://arxiv.org/pdf/2509.04372v1","authors":"[\"Yuchen Jiao\",\"Yuxin Chen\",\"Gen Li\"]","published":"2025-09-04T16:29:38Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.GL\",\"cs.LG\",\"math.ST\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\"]","has_code":false}
