{"ID":2868841,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15981","arxiv_id":"2509.15981","title":"Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations","abstract":"In reinforcement learning with sparse rewards, demonstrations can accelerate learning, but determining when to imitate them remains challenging. We propose Smooth Policy Regularisation from Demonstrations (SPReD), a framework that addresses the fundamental question: when should an agent imitate a demonstration versus follow its own policy? SPReD uses ensemble methods to explicitly model Q-value distributions for both demonstration and policy actions, quantifying uncertainty for comparisons. We develop two complementary uncertainty-aware methods: a probabilistic approach estimating the likelihood of demonstration superiority, and an advantage-based approach scaling imitation by statistical significance. Unlike prevailing methods (e.g. Q-filter) that make binary imitation decisions, SPReD applies continuous, uncertainty-proportional regularisation weights, reducing gradient variance during training. Despite its computational simplicity, SPReD achieves remarkable gains in experiments across eight robotics tasks, outperforming existing approaches by up to a factor of 14 in complex tasks while maintaining robustness to demonstration quality and quantity. Our code is available at https://github.com/YujieZhu7/SPReD.","short_abstract":"In reinforcement learning with sparse rewards, demonstrations can accelerate learning, but determining when to imitate them remains challenging. \nWe propose Smooth Policy Regularisation from Demonstrations (SPReD), a framework that addresses the fundamental question: when should an agent imitate a demonstration versus f...","url_abs":"https://arxiv.org/abs/2509.15981","url_pdf":"https://arxiv.org/pdf/2509.15981v2","authors":"[\"Yujie Zhu\",\"Charles A. Hepburn\",\"Matthew Thorpe\",\"Giovanni Montana\"]","published":"2025-09-19T13:47:20Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.RO\",\"stat.ML\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":609629,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2868841,"paper_url":"https://arxiv.org/abs/2509.15981","paper_title":"Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations","repo_url":"https://github.com/YujieZhu7/SPReD","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}