{"ID":2849539,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23142","arxiv_id":"2510.23142","title":"Rethinking GSPO: The Perplexity-Entropy Equivalence","abstract":"We provide a new perspective on GSPO's length-normalized importance ratios by establishing their connection to information-theoretic quantities. We show that GSPO's sequence-level weight $s(θ) = (π_θ/π_{θ_{\\text{old}}})^{1/|y|}$ can be equivalently expressed as the inverse perplexity ratio $\\text{PPL}_{θ_{\\text{old}}}/\\text{PPL}_θ$ and as the exponential cross-entropy change $\\exp(ΔH)$. While the perplexity-entropy relationship follows from standard definitions, this observation provides a useful lens for understanding GSPO: the algorithm weights policy gradient updates by perplexity ratios, offering an information-theoretic interpretation of the importance weights. This perspective helps explain GSPO's empirical properties, including log-domain variance reduction through geometric averaging and stability in training mixture-of-experts models. We validate the mathematical equivalences and variance predictions through controlled experiments on mathematical reasoning tasks.","short_abstract":"We provide a new perspective on GSPO's length-normalized importance ratios by establishing their connection to information-theoretic quantities. We show that GSPO's sequence-level weight $s(θ) = (π_θ/π_{θ_{\\text{old}}})^{1/|y|}$ can be equivalently expressed as the inverse perplexity ratio $\\text{PPL}_{θ_{\\text{old}}}/...","url_abs":"https://arxiv.org/abs/2510.23142","url_pdf":"https://arxiv.org/pdf/2510.23142v1","authors":"[\"Chi Liu\"]","published":"2025-10-27T09:19:10Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[]","has_code":false}
