{"ID":2895610,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.08387","arxiv_id":"2507.08387","title":"Online Pre-Training for Offline-to-Online Reinforcement Learning","abstract":"Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.","short_abstract":"Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning d...","url_abs":"https://arxiv.org/abs/2507.08387","url_pdf":"https://arxiv.org/pdf/2507.08387v1","authors":"[\"Yongjae Shin\",\"Jeonghye Kim\",\"Whiyoung Jung\",\"Sunghoon Hong\",\"Deunsol Yoon\",\"Youngsoo Jang\",\"Geonhyeong Kim\",\"Jongseong Chae\",\"Youngchul Sung\",\"Kanghoon Lee\",\"Woohyung Lim\"]","published":"2025-07-11T08:00:12Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
