{"ID":2887602,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01329","arxiv_id":"2508.01329","title":"Is Exploration or Optimization the Problem for Deep Reinforcement Learning?","abstract":"In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have shown that training a deep learning policy under the changing state and action distribution leads to sub-optimal performance, or even collapse. This naturally raises a concern: even if the community creates improved exploration algorithms or reward objectives, will those improvements fall on the \\textit{deaf ears} of optimization difficulties? This work proposes a new \\textit{practical} sub-optimality estimator to determine optimization limitations of deep reinforcement learning algorithms. Through experiments across environments and RL algorithms, it is shown that the best experience generated is 2-3$\\times$ better than the policies' learned performance. This large difference indicates that deep RL methods only exploit half of the good experience they generate.","short_abstract":"In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have shown that training a deep learning policy under the changing state and action distribution leads to sub-optimal performance...","url_abs":"https://arxiv.org/abs/2508.01329","url_pdf":"https://arxiv.org/pdf/2508.01329v1","authors":"[\"Glen Berseth\"]","published":"2025-08-02T11:40:26Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}