{"ID":2867201,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19098","arxiv_id":"2509.19098","title":"Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning","abstract":"We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given N'_k i.i.d. samples from each source distribution nu'_k, and the true target distributions nu_k lie within a known distance bound d_k(nu_k, nu'_k) \u003c= L_k. In this framework, we first derive a problem-dependent asymptotic lower bound on cumulative regret that extends the classical Lai-Robbins result to incorporate the transfer parameters (d_k, L_k, N'_k). We then propose KL-UCB-Transfer, a simple index policy that matches this new bound in the Gaussian case. Finally, we validate our approach via simulations, showing that KL-UCB-Transfer significantly outperforms the no-prior baseline when source and target distributions are sufficiently close.","short_abstract":"We study the non-contextual multi-armed bandit problem in a transfer learning setting: before any pulls, the learner is given N'_k i.i.d. samples from each source distribution nu'_k, and the true target distributions nu_k lie within a known distance bound d_k(nu_k, nu'_k) \u003c= L_k. In this framework, we first derive a pr...","url_abs":"https://arxiv.org/abs/2509.19098","url_pdf":"https://arxiv.org/pdf/2509.19098v1","authors":"[\"Adrien Prevost\",\"Timothee Mathieu\",\"Odalric-Ambrym Maillard\"]","published":"2025-09-23T14:47:42Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.ST\"]","methods":"[]","has_code":false}
