{"ID":3083853,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T05:32:54.120957816Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05800","arxiv_id":"2606.05800","title":"SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter","abstract":"Reinforcement learning with verifiable rewards (RLVR) often adopts GRPO-style group-relative updates, sampling multiple rollouts per prompt to construct normalized learning signals. However, merely increasing the number of rollouts does not reliably strengthen learning: under GRPO-style group normalization, per-rollout policy-gradient features can concentrate into a low-rank, signed geometry, causing substantial cancellation during aggregation and weakening the effective update. We address this failure mode with SALT, a Subspace-Adaptive geometry pLug-in componenT that uses sample-wise gradient geometry to reweight the coefficients of group-relative updates. SALT estimates a dominant shared subspace from the mini-batch Gram geometry, decomposes group-relative coefficients into shared and residual channels, and adaptively amplifies the residual channel when signed cancellation is severe. Across diverse reasoning-oriented RLVR benchmarks and model scales, SALT improves effective update geometry and performance without modifying the reward model or the rollout sampling procedure.","short_abstract":"Reinforcement learning with verifiable rewards (RLVR) often adopts GRPO-style group-relative updates, sampling multiple rollouts per prompt to construct normalized learning signals. However, merely increasing the number of rollouts does not reliably strengthen learning: under GRPO-style group normalization, per-rollout...","url_abs":"https://arxiv.org/abs/2606.05800","url_pdf":"https://arxiv.org/pdf/2606.05800v1","authors":"[\"Powei Chang\",\"Jinpeng Zhang\",\"Chaoqun Sun\",\"MiniWell Tsao\",\"Lianrui Li\",\"Jianxiang Xiang\",\"Chenyu Wang\",\"Yukang Gao\",\"Dongying Kong\"]","published":"2026-06-04T07:29:43Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
