{"ID":2851015,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20264","arxiv_id":"2510.20264","title":"Optimistic Task Inference for Behavior Foundation Models","abstract":"Behavior Foundation Models (BFMs) are capable of retrieving high-performing policy for any reward function specified directly at test-time, commonly referred to as zero-shot reinforcement learning (RL). While this is a very efficient process in terms of compute, it can be less so in terms of data: as a standard assumption, BFMs require computing rewards over a non-negligible inference dataset, assuming either access to a functional form of rewards, or significant labeling efforts. To alleviate these limitations, we tackle the problem of task inference purely through interaction with the environment at test-time. We propose OpTI-BFM, an optimistic decision criterion that directly models uncertainty over reward functions and guides BFMs in data collection for task inference. Formally, we provide a regret bound for well-trained BFMs through a direct connection to upper-confidence algorithms for linear bandits. Empirically, we evaluate OpTI-BFM on established zero-shot benchmarks, and observe that it enables successor-features-based BFMs to identify and optimize an unseen reward function in a handful of episodes with minimal compute overhead. Code is available at https://github.com/ThomasRupf/opti-bfm.","short_abstract":"Behavior Foundation Models (BFMs) are capable of retrieving high-performing policy for any reward function specified directly at test-time, commonly referred to as zero-shot reinforcement learning (RL). While this is a very efficient process in terms of compute, it can be less so in terms of data: as a standard assumpt...","url_abs":"https://arxiv.org/abs/2510.20264","url_pdf":"https://arxiv.org/pdf/2510.20264v2","authors":"[\"Thomas Rupf\",\"Marco Bagatella\",\"Marin Vlastelica\",\"Andreas Krause\"]","published":"2025-10-23T06:36:18Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":607858,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2851015,"paper_url":"https://arxiv.org/abs/2510.20264","paper_title":"Optimistic Task Inference for Behavior Foundation Models","repo_url":"https://github.com/ThomasRupf/opti-bfm","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
