{"ID":2846909,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.02130","arxiv_id":"2511.02130","title":"Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning","abstract":"We propose Re-FORC, an adaptive reward prediction method that, given a context, enables prediction of the expected future rewards as a function of the number of future thinking tokens. Re-FORC trains a lightweight adapter on reasoning models, demonstrating improved prediction with longer reasoning and larger models. Re-FORC enables: 1) early stopping of unpromising reasoning chains, reducing compute by 26% while maintaining accuracy, 2) optimized model and thinking length selection that achieves 4% higher accuracy at equal compute and 55% less compute at equal accuracy compared to the largest model, 3) adaptive test-time scaling, which increases accuracy by 11% in high compute regime, and 7% in low compute regime. Re-FORC allows dynamic reasoning with length control via cost-per-token thresholds while estimating computation time upfront.","short_abstract":"We propose Re-FORC, an adaptive reward prediction method that, given a context, enables prediction of the expected future rewards as a function of the number of future thinking tokens. Re-FORC trains a lightweight adapter on reasoning models, demonstrating improved prediction with longer reasoning and larger models. Re...","url_abs":"https://arxiv.org/abs/2511.02130","url_pdf":"https://arxiv.org/pdf/2511.02130v1","authors":"[\"Renos Zabounidis\",\"Aditya Golatkar\",\"Michael Kleinman\",\"Alessandro Achille\",\"Wei Xia\",\"Stefano Soatto\"]","published":"2025-11-03T23:47:49Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false}
