{"ID":2892097,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.15290","arxiv_id":"2507.15290","title":"Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown","abstract":"Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases toward high-reward models, and it achieves the asymptotically minimax-optimal regret in the linear setting when posteriors are exact. However, its performance with approximate posteriors -- common in large-scale or neural problems -- has not been benchmarked. We provide the first systematic study of FG-TS and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. To evaluate their robustness, we compare performance across settings with exact posteriors (linear and logistic bandits) to approximate regimes produced by fast but coarse stochastic-gradient samplers. Ablations over preconditioning, bonus scale, and prior strength reveal a trade-off: larger bonuses help when posterior samples are accurate, but hurt when sampling noise dominates. FG-TS generally outperforms vanilla TS in linear and logistic bandits, but tends to be weaker in neural bandits. Nevertheless, because FG-TS and its variants are competitive and easy-to-use, we recommend them as baselines in modern contextual-bandit benchmarks. Finally, we provide source code for all our experiments in https://github.com/SarahLiaw/ctx-bandits-mcmc-showdown.","short_abstract":"Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases toward high-reward m...","url_abs":"https://arxiv.org/abs/2507.15290","url_pdf":"https://arxiv.org/pdf/2507.15290v3","authors":"[\"Emile Anand\",\"Sarah Liaw\"]","published":"2025-07-21T06:42:56Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"LoRA\"]","has_code":false,"code_links":[{"ID":611955,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2892097,"paper_url":"https://arxiv.org/abs/2507.15290","paper_title":"Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown","repo_url":"https://github.com/SarahLiaw/ctx-bandits-mcmc-showdown","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}