{"ID":2823644,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.24669","arxiv_id":"2512.24669","title":"Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity","abstract":"Contextual bandits are a central framework for sequential decision-making, with applications ranging from recommendation systems to clinical trials. While nonparametric methods can flexibly model complex reward structures, they suffer from the curse of dimensionality. We address this challenge using a single-index model, which projects high-dimensional covariates onto a one-dimensional subspace while preserving nonparametric flexibility. We first develop a nonasymptotic theory for offline single-index regression for each arm, combining maximum rank correlation for index estimation with local polynomial regression. Building on this foundation, we propose a single-index bandit algorithm and establish its convergence rate. We further derive a matching lower bound, showing that the algorithm achieves minimax-optimal regret independent of the ambient dimension $d$, thereby overcoming the curse of dimensionality. We also establish an impossibility result for adaptation: without additional assumptions, no policy can adapt to unknown smoothness levels. Under a standard self-similarity condition, however, we construct a policy that remains minimax-optimal while automatically adapting to the unknown smoothness. Finally, as the dimension $d$ increases, our algorithm continues to achieve minimax-optimal regret, revealing a phase transition that characterizes the fundamental limits of single-index bandit learning.","short_abstract":"Contextual bandits are a central framework for sequential decision-making, with applications ranging from recommendation systems to clinical trials. While nonparametric methods can flexibly model complex reward structures, they suffer from the curse of dimensionality. We address this challenge using a single-index mode...","url_abs":"https://arxiv.org/abs/2512.24669","url_pdf":"https://arxiv.org/pdf/2512.24669v1","authors":"[\"Wanteng Ma\",\"T. Tony Cai\"]","published":"2025-12-31T06:48:58Z","proceeding":"math.ST","tasks":"[\"math.ST\"]","methods":"[]","has_code":false}
