{"ID":2885915,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.04654","arxiv_id":"2508.04654","title":"Non-Stationary Bandit Convex Optimization: An Optimal Algorithm with Two-Point Feedback","abstract":"This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to non-Euclidean settings. Let $T$ be the total number of iterations and $\\mathcal{P}_{T,p}$ the path variation with respect to the $\\ell_p$-norm. In Euclidean space, our algorithm matches the optimal regret bound $\\mathcal{O}(\\sqrt{dT(1+\\mathcal{P}_{T,2})})$, improving upon \\citet{zhao2021bandit} by a factor of $\\mathcal{O}(\\sqrt{d})$. Beyond Euclidean settings, our algorithm achieves an upper bound of $\\mathcal{O}(\\sqrt{d\\log(d)T\\log(T)(1 + \\mathcal{P}_{T,1})})$ on the simplex, which is nearly optimal up to log factors. For the cross-polytope, the bound reduces to $\\mathcal{O}(\\sqrt{d\\log(d)T(1+\\mathcal{P}_{T,p})})$ for some $p = 1 + 1/\\log(d)$.","short_abstract":"This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to non-Euclidean settings. Let $T$ be the total number of iterations and $\\mathcal{P}_{T,p}...","url_abs":"https://arxiv.org/abs/2508.04654","url_pdf":"https://arxiv.org/pdf/2508.04654v3","authors":"[\"Chang He\",\"Bo Jiang\",\"Shuzhong Zhang\"]","published":"2025-08-06T17:18:47Z","proceeding":"math.OC","tasks":"[\"math.OC\"]","methods":"[]","has_code":false}
