{"ID":2868416,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16650","arxiv_id":"2509.16650","title":"Safe and Near-Optimal Control with Online Dynamics Learning","abstract":"Achieving both optimality and safety under unknown system dynamics is a central challenge in real-world deployment of agents. To address this, we introduce a notion of maximum safe dynamics learning, where sufficient exploration is performed within the space of safe policies. Our method executes $\\textit{pessimistically}$ safe policies while $\\textit{optimistically}$ exploring informative states and, despite not reaching them due to model uncertainty, ensures continuous online learning of dynamics. The framework achieves first-of-its-kind results: learning the dynamics model sufficiently $-$ up to an arbitrary small tolerance (subject to noise) $-$ in a finite time, while ensuring provably safe operation throughout with high probability and without requiring resets. Building on this, we propose an algorithm to maximize rewards while learning the dynamics $\\textit{only to the extent needed}$ to achieve close-to-optimal performance. Unlike typical reinforcement learning (RL) methods, our approach operates online in a non-episodic setting and ensures safety throughout the learning process. We demonstrate the effectiveness of our approach in challenging domains such as autonomous car racing and drone navigation under aerodynamic effects $-$ scenarios where safety is critical and accurate modeling is difficult.","short_abstract":"Achieving both optimality and safety under unknown system dynamics is a central challenge in real-world deployment of agents. To address this, we introduce a notion of maximum safe dynamics learning, where sufficient exploration is performed within the space of safe policies. \nOur method executes $\\textit{pessimisticall...","url_abs":"https://arxiv.org/abs/2509.16650","url_pdf":"https://arxiv.org/pdf/2509.16650v2","authors":"[\"Manish Prajapat\",\"Johannes Köhler\",\"Melanie N. Zeilinger\",\"Andreas Krause\"]","published":"2025-09-20T11:55:24Z","proceeding":"eess.SY","tasks":"[\"eess.SY\",\"cs.LG\",\"cs.RO\",\"math.DS\",\"math.OC\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
