{"ID":2845936,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.03808","arxiv_id":"2511.03808","title":"Optimizing Reasoning Efficiency through Prompt Difficulty Prediction","abstract":"Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment and matches s1.1-32B's performance while using significantly less compute. Our results demonstrate that difficulty-aware routing is effective for cost-efficient deployment of reasoning models.","short_abstract":"Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B,...","url_abs":"https://arxiv.org/abs/2511.03808","url_pdf":"https://arxiv.org/pdf/2511.03808v1","authors":"[\"Bo Zhao\",\"Berkcan Kapusuzoglu\",\"Kartik Balasubramaniam\",\"Sambit Sahu\",\"Supriyo Chakraborty\",\"Genta Indra Winata\"]","published":"2025-11-05T19:14:53Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
