{"ID":2922165,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T17:44:34.312992241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00840","arxiv_id":"2606.00840","title":"Certificate-Guided Evaluation of Reinforcement Learning Generalization","abstract":"This work presents a logic-driven framework to evaluate the performance of reinforcement learning (RL) algorithms in their ability to generalize to unseen tasks. Our framework defines a family of inductive reach-avoid tasks, characterized by structural similarities in task dynamics, enabling evaluation of generalization capabilities. We introduce a neural certificate function that validates trajectories generated by RL algorithms by enforcing key conditions, thereby serving as a litmus test for RL generalization. We empirically demonstrate our method's capability in certifying generalization for several state-of-the-art generalizable RL algorithms on challenging continuous environments. Our results show that a lower percentage of certificate function violations correlates with a higher number of test tasks successfully solved, highlighting the effectiveness of our framework in evaluating and distinguishing generalization capabilities of RL algorithms. This work provides a principled approach for benchmarking RL generalization.","short_abstract":"This work presents a logic-driven framework to evaluate the performance of reinforcement learning (RL) algorithms in their ability to generalize to unseen tasks. Our framework defines a family of inductive reach-avoid tasks, characterized by structural similarities in task dynamics, enabling evaluation of generalizatio...","url_abs":"https://arxiv.org/abs/2606.00840","url_pdf":"https://arxiv.org/pdf/2606.00840v1","authors":"[\"Vignesh Subramanian\",\"Đorđe Žikelić\",\"Suguman Bansal\"]","published":"2026-05-30T18:31:57Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}