{"ID":2861067,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03199","arxiv_id":"2510.03199","title":"Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling","abstract":"LLM inference often generates a batch of candidates for a prompt and selects one via strategies like majority voting or Best-of- N (BoN). For difficult tasks, this single-shot selection often underperforms. Consequently, evaluations commonly report Pass@$k$: the agent may submit up to $k$ responses, and only the best of them is used when computing regret. Motivated by this, we study inference scaling in the more general Pass@$k$ inference setting, and prove that neither majority voting nor BoN exhibits the desirable scaling with $k$ and the sampling budget $N$. Combining the advantages of majority voting and BoN, we propose a new inference strategy called Best-of-Majority (BoM), with a pivotal step that restricts the candidates to the responses with high frequency in the $N$ samples before selecting the top-$k$ rewards. We prove that when the sampling budget is $N=\\tildeΩ(C^*)$, the regret of BoM is $O(ε_{\\mathrm{opt}}+\\sqrt{ε_{\\mathrm{RM}}^2C^*/k})$, where $C^*$ is the coverage coefficient, $ε_{\\mathrm{RM}}$ is the estimation error of the reward model, and $ε_{\\mathrm{opt}}$ is the estimation error of reward at the optimal response. We further establish a matching lower bound, certifying that our algorithm is minimax optimal. Beyond optimality, BoM has a key advantage: unlike majority voting and BoN, its performance does not degrade when increasing $N$. Experimental results of inference on math problems show BoM outperforming both majority voting and BoN.","short_abstract":"LLM inference often generates a batch of candidates for a prompt and selects one via strategies like majority voting or Best-of- N (BoN). For difficult tasks, this single-shot selection often underperforms. Consequently, evaluations commonly report Pass@$k$: the agent may submit up to $k$ responses, and only the best o...","url_abs":"https://arxiv.org/abs/2510.03199","url_pdf":"https://arxiv.org/pdf/2510.03199v1","authors":"[\"Qiwei Di\",\"Kaixuan Ji\",\"Xuheng Li\",\"Heyang Zhao\",\"Quanquan Gu\"]","published":"2025-10-03T17:35:45Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[\"Large Language Model\"]","has_code":false}
