{"ID":2828006,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.15347","arxiv_id":"2512.15347","title":"Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models","abstract":"Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibitive computational costs. In this work, we investigate the trade-off through empirical studies, yielding two key observations. First, we discover the reward clustering phenomenon in which many trajectories collapse toward the group-mean reward, offering limited optimization value. Second, we design a heuristic strategy named Optimal Variance Filtering (OVF), and verify that a high-variance subset of trajectories, selected by OVF can outperform the larger, unfiltered group. However, this static, post-sampling OVF approach still necessitates critical computational overhead, as it performs unnecessary sampling for trajectories that are ultimately discarded. To resolve this, we propose Pro-GRPO (Proactive GRPO), a novel dynamic framework that integrates latent feature-based trajectory pruning into the sampling process. Through the early termination of reward-clustered trajectories, Pro-GRPO reduces computational overhead. Leveraging its efficiency, Pro-GRPO employs an \"Expand-and-Prune\" strategy. This strategy first expands the size of initial sampling group to maximize trajectory diversity, then it applies multi-step OVF to the latents, avoiding prohibitive computational costs. Extensive experiments on both diffusion-based and flow-based models demonstrate the generality and effectiveness of our Pro-GRPO framework.","short_abstract":"Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibitive computational costs. In this work, we investigate the trade-off through empirical studies, yielding two key observations. Firs...","url_abs":"https://arxiv.org/abs/2512.15347","url_pdf":"https://arxiv.org/pdf/2512.15347v1","authors":"[\"Shiran Ge\",\"Chenyi Huang\",\"Yuang Ai\",\"Qihang Fan\",\"Huaibo Huang\",\"Ran He\"]","published":"2025-12-17T11:44:34Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}