{"ID":2896255,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07969","arxiv_id":"2507.07969","title":"Reinforcement Learning with Action Chunking","abstract":"We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.","short_abstract":"We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. \nEffec...","url_abs":"https://arxiv.org/abs/2507.07969","url_pdf":"https://arxiv.org/pdf/2507.07969v4","authors":"[\"Qiyang Li\",\"Zhiyuan Zhou\",\"Sergey Levine\"]","published":"2025-07-10T17:48:03Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.RO\",\"stat.ML\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
