{"ID":2825709,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20096","arxiv_id":"2512.20096","title":"Information-directed sampling for bandits: a primer","abstract":"The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics that balance immediate regret against information gain. We focus on the tractable environment of two-state Bernoulli bandits as a minimal model to rigorously compare heuristic strategies against the optimal policy. We extend the IDS framework to the discounted infinite-horizon setting by introducing a modified information measure and a tuning parameter to modulate the decision-making behavior. We examine two specific problem classes: symmetric bandits and the scenario involving one fair coin. In the symmetric case we show that IDS achieves bounded cumulative regret, whereas in the one-fair-coin scenario the IDS policy yields a regret that scales logarithmically with the horizon, in agreement with classical asymptotic lower bounds. This work serves as a pedagogical synthesis, aiming to bridge concepts from reinforcement learning and information theory for an audience of statistical physicists.","short_abstract":"The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directed Sampling (IDS) policies, a class of heuristics that balance immediate regret against information gain. We focus on the tractable...","url_abs":"https://arxiv.org/abs/2512.20096","url_pdf":"https://arxiv.org/pdf/2512.20096v1","authors":"[\"Annika Hirling\",\"Giorgio Nicoletti\",\"Antonio Celani\"]","published":"2025-12-23T06:49:33Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.IT\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
