{"ID":2850890,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22069","arxiv_id":"2510.22069","title":"Neural Index Policies for Restless Multi-Action Bandits with Heterogeneous Budgets","abstract":"Restless multi-armed bandits (RMABs) provide a scalable framework for sequential decision-making under uncertainty, but classical formulations assume binary actions and a single global budget. Real-world settings, such as healthcare, often involve multiple interventions with heterogeneous costs and constraints, where such assumptions break down. We introduce a Neural Index Policy (NIP) for multi-action RMABs with heterogeneous budget constraints. Our approach learns to assign budget-aware indices to arm--action pairs using a neural network, and converts them into feasible allocations via a differentiable knapsack layer formulated as an entropy-regularized optimal transport (OT) problem. The resulting model unifies index prediction and constrained optimization in a single end-to-end differentiable framework, enabling gradient-based training directly on decision quality. The network is optimized to align its induced occupancy measure with the theoretical upper bound from a linear programming relaxation, bridging asymptotic RMAB theory with practical learning. Empirically, NIP achieves near-optimal performance within 5% of the oracle occupancy-measure policy while strictly enforcing heterogeneous budgets and scaling to hundreds of arms. This work establishes a general, theoretically grounded, and scalable framework for learning index-based policies in complex resource-constrained environments.","short_abstract":"Restless multi-armed bandits (RMABs) provide a scalable framework for sequential decision-making under uncertainty, but classical formulations assume binary actions and a single global budget. Real-world settings, such as healthcare, often involve multiple interventions with heterogeneous costs and constraints, where s...","url_abs":"https://arxiv.org/abs/2510.22069","url_pdf":"https://arxiv.org/pdf/2510.22069v1","authors":"[\"Himadri S. Pandey\",\"Kai Wang\",\"Gian-Gabriel P. Garcia\"]","published":"2025-10-24T23:08:36Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[]","has_code":false}
