{"ID":2853939,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15382","arxiv_id":"2510.15382","title":"Towards Robust Zero-Shot Reinforcement Learning","abstract":"The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representations, ultimately resulting in suboptimal performance. To address these issues, we propose Behavior-REgularizEd Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality. BREEZE introduces behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm. Additionally, BREEZE extracts the policy using a task-conditioned diffusion model, enabling the generation of high-quality and multimodal action distributions in zero-shot RL settings. Moreover, BREEZE employs expressive attention-based architectures for representation modeling to capture the complex relationships between environmental dynamics. Extensive experiments on ExORL and D4RL Kitchen demonstrate that BREEZE achieves the best or near-the-best performance while exhibiting superior robustness compared to prior offline zero-shot RL methods.\nThe official implementation is available at: https://github.com/Whiterrrrr/BREEZE.","short_abstract":"The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically...","url_abs":"https://arxiv.org/abs/2510.15382","url_pdf":"https://arxiv.org/pdf/2510.15382v2","authors":"[\"Kexin Zheng\",\"Lauriane Teyssier\",\"Yinan Zheng\",\"Yu Luo\",\"Xianyuan Zhan\"]","published":"2025-10-17T07:33:19Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":608101,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2853939,"paper_url":"https://arxiv.org/abs/2510.15382","paper_title":"Towards Robust Zero-Shot Reinforcement Learning","repo_url":"https://github.com/Whiterrrrr/BREEZE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
