{"ID":2860728,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02654","arxiv_id":"2510.02654","title":"Smart-GRPO: Smartly Sampling Noise for Efficient RL of Flow-Matching Models","abstract":"Recent advancements in flow-matching have enabled high-quality text-to-image generation. However, the deterministic nature of flow-matching models makes them poorly suited for reinforcement learning, a key tool for improving image quality and human alignment. Prior work has introduced stochasticity by perturbing latents with random noise, but such perturbations are inefficient and unstable. We propose Smart-GRPO, the first method to optimize noise perturbations for reinforcement learning in flow-matching models. Smart-GRPO employs an iterative search strategy that decodes candidate perturbations, evaluates them with a reward function, and refines the noise distribution toward higher-reward regions. Experiments demonstrate that Smart-GRPO improves both reward optimization and visual quality compared to baseline methods. Our results suggest a practical path toward reinforcement learning in flow-matching frameworks, bridging the gap between efficient training and human-aligned generation.","short_abstract":"Recent advancements in flow-matching have enabled high-quality text-to-image generation. However, the deterministic nature of flow-matching models makes them poorly suited for reinforcement learning, a key tool for improving image quality and human alignment. Prior work has introduced stochasticity by perturbing latent...","url_abs":"https://arxiv.org/abs/2510.02654","url_pdf":"https://arxiv.org/pdf/2510.02654v1","authors":"[\"Benjamin Yu\",\"Jackie Liu\",\"Justin Cui\"]","published":"2025-10-03T01:13:17Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
