{"ID":2845374,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04454","arxiv_id":"2511.04454","title":"Fitting Reinforcement Learning Model to Behavioral Data under Bandits","abstract":"We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal decision making behavior. We provide a generic mathematical optimization problem formulation for the fitting problem of a wide range of RL models that appear frequently in scientific research applications. We then provide a detailed theoretical analysis of its convexity properties. Based on the theoretical results, we introduce a novel solution method for the fitting problem of RL models based on convex relaxation and optimization. Our method is then evaluated in several simulated and real-world bandit environments to compare with some benchmark methods that appear in the literature. Numerical results indicate that our method achieves comparable performance to the state-of-the-art, while significantly reducing computation time. We also provide an open-source Python package for our proposed method to empower researchers to apply it in the analysis of their datasets directly, without prior knowledge of convex optimization.","short_abstract":"We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal decision making behavior. We provide a generic mathematical optimization problem form...","url_abs":"https://arxiv.org/abs/2511.04454","url_pdf":"https://arxiv.org/pdf/2511.04454v2","authors":"[\"Hao Zhu\",\"Jasper Hoffmann\",\"Baohe Zhang\",\"Joschka Boedecker\"]","published":"2025-11-06T15:24:40Z","proceeding":"cs.CE","tasks":"[\"cs.CE\",\"cs.LG\",\"math.OC\",\"q-bio.NC\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
