{"ID":2921217,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-04T00:54:56.190393508Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01635","arxiv_id":"2606.01635","title":"AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training","abstract":"Token selection is pivotal for effective LLM post-training. However, existing methods mostly rely on local heuristics and rarely formulate token selection as a principled valuation of individual response tokens. We introduce $\\textbf{AlphaToken}$, a response token valuation framework that decouples valuation into $\\textbf{adaptation}$ (promoting target-task learning) and $\\textbf{stability}$ (preserving pre-trained capabilities), and makes each objective $\\textbf{path-aware}$ by combining the direct-path signal from local token gradients with the downstream causal-path signal in autoregressive generation. Since retention data are typically unavailable, AlphaToken approximates stability via a $\\textbf{Fisher-drift proxy}$ anchored at the pre-trained reference model. For efficient computation, we extend Ghost Dot-Product to token-level valuation. AlphaToken masks low-value response tokens during fine-tuning and preference optimization, concentrating training signals on more valuable positions. Experiments show that AlphaToken improves post-training performance and mitigates catastrophic forgetting.","short_abstract":"Token selection is pivotal for effective LLM post-training. However, existing methods mostly rely on local heuristics and rarely formulate token selection as a principled valuation of individual response tokens. We introduce $\\textbf{AlphaToken}$, a response token valuation framework that decouples valuation into $\\tex...","url_abs":"https://arxiv.org/abs/2606.01635","url_pdf":"https://arxiv.org/pdf/2606.01635v1","authors":"[\"Liu Qing\",\"Ou Wu\",\"Yi Du\"]","published":"2026-06-01T03:40:35Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
