{"ID":3004922,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-04T19:14:31.964469513Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03080","arxiv_id":"2606.03080","title":"Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding","abstract":"Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training, a self-supervised framework grounded in the Learning Using Privileged Information (LUPI) paradigm. The framework employs a dual-view architecture in which a single model generates both a causal Student distribution and a future-conditioned Teacher distribution. The training objective augments standard language modeling with a regret loss that minimizes the KL divergence from teacher to student, transferring future-aware signals to the causal representations. We investigate two teacher configurations on the OLMoE-1B-7B architecture:LocalRegret, which extends attention by one future token, andGlobalRegret, which conditions on bidirectional context with the target position masked. Experiments on nine downstream tasks following 4 billion tokens of training demonstrate that both configurations consistently outperform the baseline. On average,GlobalRegret andLocalRegret achieve 33.9% and 32.2% accuracy respectively, surpassing the baseline's 30.2%. Most notably,GlobalRegret improves BoolQ performance by 18.1 percentage points (61.0% vs 42.9%). The framework introduces no additional parameters and requires only one extra inference-mode forward pass per training step.","short_abstract":"Causal language models factorize sequence probabilities using only preceding context, leaving future information unexploited during training despite its availability in the training data. This paper introduces Regret Pre-training, a self-supervised framework grounded in the Learning Using Privileged Information (LUPI)...","url_abs":"https://arxiv.org/abs/2606.03080","url_pdf":"https://arxiv.org/pdf/2606.03080v1","authors":"[\"Mingkuan Zhao\",\"Xiayu Sun\",\"Wentao Hu\",\"Suquan Chen\",\"Jiaxuan Li\",\"Xiaoyan Zhu\",\"Xin Lai\",\"Jiayin Wang\"]","published":"2026-06-02T03:11:39Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}