{"ID":2852695,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.17390","arxiv_id":"2510.17390","title":"Exploration via Feature Perturbation in Contextual Bandits","abstract":"We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs, instead of randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves $\\tilde{\\mathcal{O}}(d\\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, while avoiding the $\\tilde{\\mathcal{O}}(d^{3/2}\\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is both computationally efficient and naturally extends to non-parametric or neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also unifies strong practical performance with the near-optimal regret guarantees.","short_abstract":"We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into feature inputs, instead of randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves $\\tilde{\\mathcal{O}}(d\\sqrt{T})$ worst-case regret bound for...","url_abs":"https://arxiv.org/abs/2510.17390","url_pdf":"https://arxiv.org/pdf/2510.17390v2","authors":"[\"Seouh-won Yi\",\"Min-hwan Oh\"]","published":"2025-10-20T10:32:48Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[\"LoRA\"]","has_code":false}
