{"ID":2859835,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.04728","arxiv_id":"2510.04728","title":"EVaR-Optimal Arm Identification in Bandits","abstract":"We study the fixed-confidence best arm identification (BAI) problem within the multi-armed bandit (MAB) framework under the Entropic Value-at-Risk (EVaR) criterion. Our analysis considers a nonparametric setting, allowing for general reward distributions bounded in [0,1]. This formulation addresses the critical need for risk-averse decision-making in high-stakes environments, such as finance, moving beyond simple expected value optimization. We propose a $δ$-correct, Track-and-Stop based algorithm and derive a corresponding lower bound on the expected sample complexity, which we prove is asymptotically matched. The implementation of our algorithm and the characterization of the lower bound both require solving a complex convex optimization problem and a related, simpler non-convex one.","short_abstract":"We study the fixed-confidence best arm identification (BAI) problem within the multi-armed bandit (MAB) framework under the Entropic Value-at-Risk (EVaR) criterion. Our analysis considers a nonparametric setting, allowing for general reward distributions bounded in [0,1]. This formulation addresses the critical need fo...","url_abs":"https://arxiv.org/abs/2510.04728","url_pdf":"https://arxiv.org/pdf/2510.04728v1","authors":"[\"Mehrasa Ahmadipour\",\"Aurélien Garivier\"]","published":"2025-10-06T11:49:56Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
