{"ID":2851589,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19348","arxiv_id":"2510.19348","title":"A Markov Decision Process for Variable Selection in Branch \u0026 Bound","abstract":"Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B\u0026B). A key factor influencing the performance of B\u0026B solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B\u0026B setting to learn optimal branching policies, through Markov Decision Processes (MDP) inspired formulations, and ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B\u0026B, allowing to leverage a broad range of RL algorithms for the purpose of learning optimal B\\\u0026B heuristics. Computational experiments validate our model empirically, as our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.","short_abstract":"Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B\u0026B). A key factor influencing the performance of B\u0026B solvers is the variable selection heuristic governing branching decisions. Recent contribut...","url_abs":"https://arxiv.org/abs/2510.19348","url_pdf":"https://arxiv.org/pdf/2510.19348v1","authors":"[\"Paul Strang\",\"Zacharie Alès\",\"Côme Bissuel\",\"Olivier Juan\",\"Safia Kedad-Sidhoum\",\"Emmanuel Rachelson\"]","published":"2025-10-22T08:15:32Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}