{"ID":2889925,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.20252","arxiv_id":"2507.20252","title":"Post-Completion Learning for Language Models","abstract":"Current language model training paradigms typically terminate learning upon reaching the end-of-sequence (\u003ceos\u003e) token, overlooking the potential learning opportunities in the post-completion space. We propose Post-Completion Learning (PCL), a novel training framework that systematically utilizes the sequence space after model output completion, to enhance both the reasoning and self-evaluation abilities. PCL enables models to continue generating self-assessments and reward predictions during training, while maintaining efficient inference by stopping at the completion point. To fully utilize this post-completion space, we design a white-box reinforcement learning method: let the model evaluate the output content according to the reward rules, then calculate and align the score with the reward functions for supervision. We implement dual-track SFT to optimize both reasoning and evaluation capabilities, and mixed it with RL training to achieve multi-objective hybrid optimization. Experimental results on different datasets and models demonstrate consistent improvements over traditional SFT and RL methods. Our method provides a new technical path for language model training that enhances output quality while preserving deployment efficiency.","short_abstract":"Current language model training paradigms typically terminate learning upon reaching the end-of-sequence (\u003ceos\u003e) token, overlooking the potential learning opportunities in the post-completion space. We propose Post-Completion Learning (PCL), a novel training framework that systematically utilizes the sequence space aft...","url_abs":"https://arxiv.org/abs/2507.20252","url_pdf":"https://arxiv.org/pdf/2507.20252v3","authors":"[\"Xiang Fei\",\"Siqi Wang\",\"Shu Wei\",\"Yuxiang Nie\",\"Wei Shi\",\"Hao Feng\",\"Chao Feng\",\"Can Huang\"]","published":"2025-07-27T12:47:26Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}
