{"ID":3004707,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:43:53.432517148Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03851","arxiv_id":"2606.03851","title":"Two-Action Apple Tasting with Switching Costs","abstract":"We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the revealing action gives reward $0$ and reveals the hidden value $x_t\\in[-1,1]$ of the blind action; the blind action gives reward $x_t$ but reveals nothing. The learner pays one unit whenever they switches actions, and regret is measured against the best fixed action in hindsight. General feedback-graph algorithms with switching costs give $\\widetilde O(T^{2/3})$ regret guarantees for this problem. The two-action apple-tasting graph was the natural candidate for the missing $Ω(T^{2/3})$ obstruction in the switching-cost classification: such a lower bound would have transferred to a large family of still-unclassified feedback graphs. We prove that this obstruction is not there: the oblivious minimax expected regret for this problem satisfies \\[ \\frac{1}{2\\sqrt3}\\cdot\\sqrt T \\le R_T^\\star \\le 2\\sqrt{3}\\cdot \\sqrt{T}. \\]","short_abstract":"We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the revealing action gives reward $0$ and reveals the hidden value $x_t\\in[-1,1]$ of the blind acti...","url_abs":"https://arxiv.org/abs/2606.03851","url_pdf":"https://arxiv.org/pdf/2606.03851v1","authors":"[\"Tommaso Cesari\",\"Roberto Colomboni\"]","published":"2026-06-02T16:28:45Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
