{"ID":2866156,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21500","arxiv_id":"2509.21500","title":"Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training","abstract":"Reinforcement fine-tuning (RFT) often suffers from reward over-optimization, where a policy model hacks the reward signals to achieve high scores while producing low-quality outputs. Our theoretical analysis shows that the key lies in reward misspecification at the high-reward tail: the inability to reliably distinguish Excellent responses from merely Great ones. This motivates us to focus on the high-reward region. However, such tail examples are scarce under the base LLM. While off-policy exemplars (e.g. from stronger models or rewrites) are easier to obtain, naively training on them yields a misspecified reward for the policy we aim to align. To address this, we study rubric-based rewards. By design, rubrics can leverage off-policy examples while remaining insensitive to their artifacts. To elicit rubrics that capture the high-reward tail, we highlight the importance of distinguishing among great and diverse responses, and introduce a workflow to implement this idea. We empirically demonstrate that rubric-based rewards substantially mitigate reward over-optimization and deliver effective LLM post-training improvements.","short_abstract":"Reinforcement fine-tuning (RFT) often suffers from reward over-optimization, where a policy model hacks the reward signals to achieve high scores while producing low-quality outputs. Our theoretical analysis shows that the key lies in reward misspecification at the high-reward tail: the inability to reliably distinguis...","url_abs":"https://arxiv.org/abs/2509.21500","url_pdf":"https://arxiv.org/pdf/2509.21500v3","authors":"[\"Junkai Zhang\",\"Zihao Wang\",\"Lin Gui\",\"Swarnashree Mysore Sathyendra\",\"Jaehwan Jeong\",\"Victor Veitch\",\"Wei Wang\",\"Yunzhong He\",\"Bing Liu\",\"Lifeng Jin\"]","published":"2025-09-25T19:57:39Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
